Open andresgomezfrr opened 9 years ago
Thanks for the detailed instructions. We will try to reproduce the issue and get back to you.
Any update?
Can you try increasing the network timeout, as suggested by the exception? The default is 4000 ms, so I would recommend setting it to 10000 ms to give it enough time to deal with 40% packet loss.
If that does not help, we will need to take a look at the thread dumps from each node.
Also, please make sure that you are running version 6.6.2.
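For the thread dumps requested above, the usual route is running `jstack <pid>` against each node. When attaching to the process is awkward, a dump can also be taken from inside the JVM. The following is only a minimal sketch using the standard `java.lang.management` API; the class name `ThreadDumps` and the method `capture` are hypothetical names chosen for illustration and are not part of GridGain.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper (not part of GridGain): renders all live threads of the current JVM as text.
final class ThreadDumps {
    private ThreadDumps() {}

    static String capture() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        // Request lock details only where the JVM supports them,
        // otherwise dumpAllThreads would throw UnsupportedOperationException.
        ThreadInfo[] infos = bean.dumpAllThreads(
                bean.isObjectMonitorUsageSupported(),
                bean.isSynchronizerUsageSupported());

        StringBuilder out = new StringBuilder();
        for (ThreadInfo info : infos) {
            // Header line: thread name and state, e.g. "main" RUNNABLE
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState()).append('\n');
            // Full stack, one frame per line.
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }
}
```

Calling `ThreadDumps.capture()` from a watchdog thread once the client appears hung, and writing the result to a file, gives roughly the information a `jstack` dump would, though without the deadlock summary that `jstack` prints.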
Hi all,
I have found a problem that occurs when GridGain throws a network timeout exception, and I can reproduce it with the following steps:
I built a sample GridGain client that puts and gets random K/V objects on a grid cache. My example stores an object in the cache and queries that object 100 milliseconds later.
The example's source is available on this gist: https://gist.github.com/andresgomez92/f3bf78682acaecc8cde6
When the client is running, you can see something like this:
While my client is running, I enable the packet loss simulation using this command:
I know that 40% packet loss is rather high, but that isn't the problem. When you enable the packet loss, you can see the client getting slower, and if you wait a few minutes you get this exception:
When this happens, my client and the Java example hang. If I then disable packet loss using this command:
My GridGain node works fine: I can check my K/V objects using ggvisorcmd.sh, and if I stop my node I can see that my GridGain client detects it, like this:
But my GridGain client can't write or query K/V objects again; it stays hung.
I think that when GridGain throws org.gridgain.grid.cache.GridCacheAtomicUpdateTimeoutException, the client should return null, as if it had not found the specific key, and then continue working normally.
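Until the client behaves that way, the requested "null on timeout" behaviour can be approximated on the caller's side. This is a minimal sketch under stated assumptions, not GridGain's own mechanism: `TimeoutGuard` and `callOrNull` are hypothetical names, and the code uses only `java.util.concurrent`. It runs the cache call on a separate thread, waits a bounded time, and maps a timeout or a failure (for example an exception thrown by the cache call itself) to null.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical caller-side helper (not part of GridGain).
final class TimeoutGuard {
    private TimeoutGuard() {}

    /**
     * Runs op on the given executor and returns its result, or null if op
     * does not finish within timeoutMs, throws, or the caller is interrupted.
     */
    static <V> V callOrNull(ExecutorService executor, Callable<V> op, long timeoutMs) {
        Future<V> future = executor.submit(op);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Give up on this call; interrupt the worker so it can be reused if it honours interrupts.
            future.cancel(true);
            return null;
        } catch (ExecutionException e) {
            // op itself threw (e.g. a timeout exception from the cache); treat it like a miss.
            return null;
        } catch (InterruptedException e) {
            // Preserve the caller's interrupt status and abandon the call.
            Thread.currentThread().interrupt();
            future.cancel(true);
            return null;
        }
    }
}
```

With a hypothetical `cache` object, usage would look like `Object value = TimeoutGuard.callOrNull(pool, () -> cache.get(key), 500);`. One caveat: a null result then no longer distinguishes "key absent" from "grid unreachable", and a worker thread that ignores interruption stays blocked, so this keeps the calling loop alive but does not by itself fix the underlying hang.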