vert-x3 / vertx-ignite

Apache License 2.0

Ignite cache is closed when node is out of topology or network failures #43

Closed: bdarwin closed this issue 6 years ago

bdarwin commented 7 years ago

When there is a network issue and a node can't reach another node for quite some time, I see the error below and the cache is closed.

2017-02-22 04:14:58.922 vert.x-worker-thread-1 INFO ignite.IgniteClusterManager - javax.cache.CacheException: class org.apache.ignite.IgniteCheckedException: Failed to wait for topology update, cache (or node) is stopping.

Once the cache is closed, the node is effectively dead: it can't get the Vert.x subs maps and can no longer communicate, so I have to restart the node.

Is there something obvious I am missing here, or is this how it works?

I found the open tickets below. They are not exactly what I am looking for, but they sound similar.

https://issues.apache.org/jira/plugins/servlet/mobile#issue/IGNITE-2766
https://issues.apache.org/jira/plugins/servlet/mobile#issue/IGNITE-3616

This is easily reproducible by blocking any event bus thread for, say, a few minutes.
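To make "blocking an event bus thread" concrete, here is a minimal, stdlib-only Java sketch. It uses a single-threaded `ExecutorService` as a hypothetical stand-in for one Vert.x event-loop thread; `BlockedLoopDemo` and `delayBehindBlockedHandler` are illustrative names, not part of Vert.x or vertx-ignite. It only shows that every task queued behind a blocking handler has to wait for it. It does not reproduce the Ignite cache closure itself, which would need a real clustered Vert.x setup.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class BlockedLoopDemo {

    // Returns how long (in ms) a trivial task waited when it was queued
    // behind a handler that blocks the single thread for blockMillis.
    static long delayBehindBlockedHandler(long blockMillis) throws Exception {
        // Hypothetical stand-in for one Vert.x event-loop thread.
        ExecutorService loop = Executors.newSingleThreadExecutor();
        try {
            long submitted = System.nanoTime();
            // The "blocking" handler: sleeps instead of returning quickly.
            loop.submit(() -> {
                try {
                    Thread.sleep(blockMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            // The next piece of work (e.g. the next message) records when it actually ran.
            Callable<Long> nextMessage = System::nanoTime;
            Future<Long> next = loop.submit(nextMessage);
            long startedAt = next.get();
            return TimeUnit.NANOSECONDS.toMillis(startedAt - submitted);
        } finally {
            loop.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("next task waited ~" + delayBehindBlockedHandler(2_000) + " ms");
    }
}
```

A real reproducer would instead block inside an event bus consumer on a clustered Vert.x instance configured with `IgniteClusterManager` and then watch the other nodes' logs.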

agura commented 7 years ago

I don't understand how the closed cache and the network issue are related. Could you please provide more details (logs, steps to reproduce)? Ideally, a reproducer.

bdarwin commented 7 years ago

It so happened that when there was a network outage, I saw this message in one of the node logs and the cache was closed. Maybe it's not related to the network at all. But what does this exception mean? Why is the node stopping if it can't get a topology update? I will provide a sample soon.

agura commented 7 years ago

I think the node was stopped due to network segmentation. You can enable DEBUG-level logging for the org.apache.ignite.spi.discovery.tcp package and grep for the SEGMENTED pattern.
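Both steps can be sketched in Java. This is a minimal sketch, assuming Ignite logs through `java.util.logging` (Ignite's `JavaLogger`), where `FINE` plays the role of DEBUG. If the application routes Ignite logs through log4j, logback, or SLF4J instead, set the `org.apache.ignite.spi.discovery.tcp` level to DEBUG in that framework's configuration and use only the scanning part. `SegmentationCheck`, `enableDiscoveryDebug`, and `segmentedLines` are hypothetical names; the scan is a plain substring match, equivalent to `grep SEGMENTED`.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.logging.ConsoleHandler;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SegmentationCheck {

    // Package named in the comment above. Assumption: Ignite logs via java.util.logging.
    static final String DISCOVERY_PACKAGE = "org.apache.ignite.spi.discovery.tcp";

    // Strong reference: java.util.logging holds loggers weakly, so a level set
    // on an unreferenced logger could be lost after garbage collection.
    private static final Logger DISCOVERY_LOGGER = Logger.getLogger(DISCOVERY_PACKAGE);

    // Turn on FINE (DEBUG-equivalent) output for the discovery package.
    static void enableDiscoveryDebug() {
        ConsoleHandler handler = new ConsoleHandler();
        handler.setLevel(Level.FINE);               // ConsoleHandler defaults to INFO
        DISCOVERY_LOGGER.setLevel(Level.FINE);
        DISCOVERY_LOGGER.addHandler(handler);
        DISCOVERY_LOGGER.setUseParentHandlers(false); // avoid printing records twice
    }

    // Equivalent of `grep SEGMENTED <log>`: returns every line containing the pattern.
    static List<String> segmentedLines(Path log) throws IOException {
        try (Stream<String> lines = Files.lines(log, StandardCharsets.UTF_8)) {
            return lines.filter(line -> line.contains("SEGMENTED"))
                        .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java SegmentationCheck <ignite-log-file>");
            return;
        }
        List<String> hits = segmentedLines(Paths.get(args[0]));
        hits.forEach(System.out::println);
        System.out.println(hits.isEmpty()
                ? "no SEGMENTED events found"
                : hits.size() + " SEGMENTED line(s)");
    }
}
```

After reproducing the failure, run it against each node's log, e.g. `java SegmentationCheck node1.log`. A hit on the node whose cache was closed would support the segmentation explanation.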