vert-x3 / vertx-hazelcast

Hazelcast Cluster Manager for Vert.x

Hazelcast cluster implementation tries to send events to dead nodes #13

Closed: SergeyMagid closed this issue 7 years ago

SergeyMagid commented 8 years ago

Hi All,

I have a problem with the Hazelcast cluster manager implementation.

After I kill several nodes in the cluster, Vert.x still tries to send events to the dead nodes. To fix this I have to restart all the nodes in the cluster.

My reproducer, with a description of how to reproduce the problem: https://github.com/SergeyMagid/vertx-reproducer It uses vertx-hazelcast:3.1.0-SNAPSHOT; with 3.0.0 I have the same problem. I get the problem with both the AWS and multicast join configs.

I also tried to write a JUnit test, but after I kill several nodes the dead nodes still receive events. Maybe that is because the kill is simulated? It works as expected if I kill only one node. https://github.com/SergeyMagid/vertx-hazelcast/blob/master/src/test/java/io/vertx/test/core/HazelcastClusteredEventbusTest.java

I also found that this callback https://github.com/vert-x3/vertx-hazelcast/blob/master/src/main/java/io/vertx/spi/cluster/hazelcast/impl/HazelcastAsyncMultiMap.java#L145 is not called by Hazelcast, so the cache can become corrupted. The ClusterManager callback memberRemoved https://github.com/vert-x3/vertx-hazelcast/blob/master/src/main/java/io/vertx/spi/cluster/hazelcast/HazelcastClusterManager.java#L225, however, is called every time.
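
For readers trying to follow the two callbacks being compared above, here is a minimal, simplified sketch of the plain Hazelcast 3.x listener APIs involved. It is an illustration only, not the actual vertx-hazelcast source, and the multimap name __vertx.subs (the event bus subscriptions map) is assumed:

import com.hazelcast.core.EntryAdapter;
import com.hazelcast.core.EntryEvent;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.MembershipAdapter;
import com.hazelcast.core.MembershipEvent;
import com.hazelcast.core.MultiMap;

public class ListenerSketch {
  public static void main(String[] args) {
    HazelcastInstance hz = Hazelcast.newHazelcastInstance();

    // Multimap entry listener: the cluster manager relies on callbacks like this
    // to keep its local cache of event bus subscriptions in sync.
    MultiMap<String, String> subs = hz.getMultiMap("__vertx.subs");
    subs.addEntryListener(new EntryAdapter<String, String>() {
      @Override
      public void entryRemoved(EntryEvent<String, String> event) {
        System.out.println("subs entry removed for key: " + event.getKey());
      }
    }, false);

    // Cluster membership listener: memberRemoved is reported above to fire every time,
    // unlike the multimap entryRemoved callback.
    hz.getCluster().addMembershipListener(new MembershipAdapter() {
      @Override
      public void memberRemoved(MembershipEvent event) {
        System.out.println("member removed: " + event.getMember().getUuid());
      }
    });
  }
}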

vietj commented 8 years ago

going to investigate, will let you know

SergeyMagid commented 8 years ago

Hi

Maybe this can help you. I added some logs inside HAManager and was surprised. I added the logs here: https://github.com/eclipse/vert.x/blob/master/src/main/java/io/vertx/core/impl/HAManager.java#L147 to check whether clusterMap contains the correct data. It held the correct data for all of my live nodes.

But when HAManager.this.nodeLeft(leftNodeID) is called, that UUID is sometimes missing from the map.

Does that mean that if I run several services on one node (i.e. one virtual machine) they connect to one master Hazelcast instance? Or does HZ not merge the map to the other Hazelcast instances immediately?

Should I use a separate container for each Vert.x service?

vietj commented 8 years ago

you are right, the problem is that there are not enough nodes to back up the cluster topology, and this results in an incorrect topology on your node when 2 nodes out of 3 are killed.

here is what happens: you have 3 nodes, and each node's part of the clusterMap is backed up by a single node. So the main node backs up only one of the other two nodes. When you kill those two nodes, the main node only has info about one of them.

you can solve this by increasing the number of backups in cluster-default.xml for the __vertx.haInfo distributed map, for instance with 3 nodes:

<map name="__vertx.haInfo">
  <backup-count>2</backup-count>
</map>

that should solve the problem.
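
For reference, roughly the same setting can be applied programmatically by passing a Hazelcast Config to the cluster manager. This is only a sketch for Vert.x / vertx-hazelcast 3.x; note that a programmatic Config replaces cluster.xml, so network/join settings would need to be supplied there as well:

import com.hazelcast.config.Config;
import io.vertx.core.Vertx;
import io.vertx.core.VertxOptions;
import io.vertx.spi.cluster.hazelcast.HazelcastClusterManager;

public class BackupCountExample {
  public static void main(String[] args) {
    // Same effect as the cluster-default.xml snippet above:
    // keep 2 backups of the __vertx.haInfo map (3 copies in total).
    Config hazelcastConfig = new Config();
    hazelcastConfig.getMapConfig("__vertx.haInfo").setBackupCount(2);

    HazelcastClusterManager mgr = new HazelcastClusterManager(hazelcastConfig);
    VertxOptions options = new VertxOptions().setClusterManager(mgr);

    Vertx.clusteredVertx(options, res -> {
      if (res.succeeded()) {
        Vertx vertx = res.result();
        // deploy verticles here
      } else {
        res.cause().printStackTrace();
      }
    });
  }
}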

SergeyMagid commented 8 years ago

thanks,

I set backup-count to 2; it did not help. But when I set it to 3, everything started working as expected. I will check on a demo server with more than 7 services whether this change helps.

SergeyMagid commented 8 years ago

oops... it reproduced again, even with backup-count = 3...

To clarify: when I run several Vert.x services on one node, that is not fail-safe, right? If I use three VMs, each with several Vert.x services inside, will that help?

vietj commented 8 years ago

I will check what happens with backup=3

purplefox commented 8 years ago

Any updates on this?

vincentwuhrlin commented 8 years ago

Hello,

I don't know if it's the same bug, but it's a real blocker for going to production with Vert.x...

We use the EventBus for microservices, and when an instance powers off, messages are still sent to the dead instance. This is very problematic in a Kubernetes architecture where you scale instances up and down.
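
For context, the pattern being described is roughly this: each consumer instance registers an event bus handler on a shared address and the publisher sends messages to that address. A minimal sketch with a made-up address name ("greetings"):

import io.vertx.core.AbstractVerticle;

public class GreetingConsumer extends AbstractVerticle {
  @Override
  public void start() {
    // Each scaled-up instance registers a consumer on the same address.
    // The clustered event bus round-robins send() among registered consumers,
    // which is why stale registrations for dead instances cause delivery failures.
    vertx.eventBus().consumer("greetings", msg -> msg.reply("hello from " + deploymentID()));
  }
}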

Informations

Configuration

Procedure

Results

Questions

Thank you very much for your help!

purplefox commented 8 years ago

Please can you post your project somewhere? (GitHub is preferable). It is much easier to debug that way.

Also... your zip is missing your server, and you seem to be programmatically overriding the Hazelcast config in your Client.java.

vincentwuhrlin commented 8 years ago

Hello,

thank you for your response, I have created a GitHub repository for my test. If you need any specific logs or test cases, please tell me. This bug is my priority ;)!

Sincerely, Vincent

cescoffier commented 8 years ago

We would need the server code too.

vincentwuhrlin commented 8 years ago

There is no specific server code; the class Client.java sends and consumes messages on an EventBus channel (I called this class Client because it connects to a Hazelcast cluster).

purplefox commented 8 years ago

If there is no server, can you explain what you mean by "Execute multiple server instances (3 or 4)"?

purplefox commented 8 years ago

Also can you be a bit more specific about exactly how many instances to run and exactly what to kill and in what order?

vincentwuhrlin commented 8 years ago

A "server instance" is an instance of the official Hazelcast distribution, simply started with the official startup script (server.bat / server.sh). I will modify my post to be more precise.

purplefox commented 8 years ago

I don't understand what relevance standalone HZ instances have. Vert.x doesn't require any: each Vert.x instance has an HZ node built in.

vincentwuhrlin commented 8 years ago

I know, but I have the same bug with the built-in HZ node. I can make a test for that if you want.

However:

purplefox commented 8 years ago

That's not the way Vert.x works. Vert.x does not require any other HZ nodes.

purplefox commented 8 years ago

If you would like to see how to deploy clustered Vert.x nodes in a k8s cluster take a look here:

https://github.com/fabric8io/ipaas-quickstarts/pull/1098

SergeyMagid commented 8 years ago

As I understand it, vincentwuhrlin uses one external Hazelcast instead of the one embedded in each Vert.x instance. (As a result this should be more stable in this situation than using the embedded one, because when several Vert.x services die, the map in HZ gets corrupted in 90% of cases.)

But the problem is the same. If several nodes die at the same moment, the map in Hazelcast gets corrupted, i.e. Hazelcast can still contain a row like 'MyDiedVerticle:45531'.

PS: We started to use Redis for messaging between Vert.x services.

vincentwuhrlin commented 8 years ago

Thanks purplefox for your link, I will study it.

But I have run the test with the embedded node and got the same result. I will work on that today and come back.

vincentwuhrlin commented 8 years ago

So Redis seems to be more stable for cluster messaging?

purplefox commented 8 years ago

If the problem occurs without the standalone HZ servers then why include them in the reproducer?

It always makes sense to create the simplest possible reproducer as it's much easier to understand.

purplefox commented 8 years ago

It also appears you are overriding the HZ config programmatically in your client class and not using the Vert.x cluster.xml (this may cause problems).

vincentwuhrlin commented 8 years ago

OK, so how do I use cluster.xml in a dynamic Kubernetes environment? Programmatic configuration is mandatory, isn't it? (no multicast please)

I used standalone HZ servers because it's a best practice from Hazelcast. Now that I have your confirmation that this architecture is not supported by Vert.x (I haven't read anything about that in the official documentation), I will switch to embedded and test.

purplefox commented 8 years ago

There is also a general point (which I have discussed a few times on the google group).

If you have a distributed system and your data is backed up to N nodes, then you can't kill more than N - 1 nodes concurrently, otherwise you risk data loss. This is not a Vert.x or Hazelcast issue, it's just the laws of physics (and mathematics) :)

So you may need to increase the number of backups in the HZ config if you want to cope with concurrent failures (for example, with a backup-count of 2 the data lives on 3 nodes, so up to 2 nodes can fail at the same time). Of course there is a trade-off here between reliability and performance.

purplefox commented 8 years ago

Take a look here:

https://github.com/purplefox/ipaas-quickstarts/tree/master/quickstart/vertx/eventbus

Just put your cluster.xml in src/main/resources - it will end up in the fatjar which goes in the docker container.

You don't need to programmatically configure the cluster.
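
For reference, starting such a clustered node is then just the following (a minimal sketch for Vert.x 3.x, with vertx-hazelcast on the classpath):

import io.vertx.core.Vertx;
import io.vertx.core.VertxOptions;

public class ClusteredNode {
  public static void main(String[] args) {
    // cluster.xml from src/main/resources ends up on the classpath of the fatjar
    // and is picked up automatically by the Hazelcast cluster manager.
    Vertx.clusteredVertx(new VertxOptions(), res -> {
      if (res.succeeded()) {
        Vertx vertx = res.result();
        // deploy verticles / register event bus consumers here
      } else {
        res.cause().printStackTrace();
      }
    });
  }
}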

SergeyMagid commented 8 years ago

Sorry, but will this work if I have several Docker hosts without configuring a single network between them?

purplefox commented 8 years ago

The nodes are visible throughout the k8s cluster.

hsch commented 8 years ago

I think I'm running into the same problem as @vincentwuhrlin. I've built a very simple demo application with instructions to reproduce my (this?) problem: https://github.com/hsch/vertx-hazelcast-issue-13

@purplefox, I would appreciate it if you could have a look and see if that helps to move this defect forward. I'd like to understand if I'm missing something or if there is a serious problem with vert.x/hazelcast.

purplefox commented 8 years ago

@hsch Please see my comment from 15 Jan starting "There is also a general point..."

hsch commented 8 years ago

@purplefox, sorry for making you repeat yourself. You're saying "you can't kill more than N - 1 nodes", right?

Two questions about that:

  1. Does that mean that the cluster won't ever be able to recover from such a situation, even if it goes back to its original size? I'll accept the laws of math, but it would still be pretty sad. :)
  2. In my scenario I have N = 4 nodes (1 publisher, 3 consumers), all in one cluster, right? So shouldn't that mean that I "can't kill more than 3 nodes"? I'm killing only 2 consumers, i.e. fewer than 3. Isn't that in the acceptable range? Even if you meant "backed up to N nodes", my backup should have been on 3 nodes, making it okay to kill 2.

purplefox commented 8 years ago

Yes, if the data is replicated on to N nodes then you can't kill more than N - 1 nodes concurrently as there is a possibility that the nodes you kill are the ones that have the data you need. That's just a general principle of distributed systems.

The default number of backups in the Hazelcast config is 1, which means the data is on 2 nodes (the original and the backup).

As far as I can tell, you haven't changed the HZ config, so N = 2, and in your test you are killing two nodes at the same time, so it seems like expected behaviour to get some data loss here.

So I don't see an issue here currently, but if you still have issues after increasing the number of HZ backups then this needs more investigation.

hsch commented 8 years ago

Appreciate your patience!

So N = original + #backups. I created a cluster.xml based on [1] with backup-count = 2, see [2]. I was not able to confirm from the logs (even with DEBUG) that that setting was actually applied, but logging itself switched from JUL to SLF4J, so I'm confident that the XML is in the right place.

I think I should be allowed to kill N - 1 = 3 - 1 = 2 nodes simultaneously now. Well, I can confirm that it is totally fine to slowly scale consumers up to 3 and down to 0 even (leaving only the publisher), scale up again, and everything works as expected.

However, killing 2 consumers at about the same time still causes delivery failures in the publisher.

[1] https://github.com/vert-x3/vertx-hazelcast/blob/master/src/main/asciidoc/java/index.adoc
[2] hsch/vertx-hazelcast-issue-13@f36053d155a32f1c4d06851ea9d7c0ac7a16434d

hsch commented 8 years ago

Update: I found Hazelcast's JMX beans and identified three more entities with a backup-count setting: vertx.*, vertx.haInfo, and default (a multi-map). I've set all their backup counts to 2 as well (and this time I was able to confirm that the settings are applied, thanks to JMX), but the problem still occurs. :confused:

[3] hsch/vertx-hazelcast-issue-13@4351395b00243dbc3a73549fde4c337a7a1be059
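
A quick way to sanity-check such settings without JMX is to load the cluster.xml from the classpath, the same file the cluster manager picks up by default, and print the effective backup counts. A sketch; the entity names are taken from the comment above:

import com.hazelcast.config.ClasspathXmlConfig;
import com.hazelcast.config.Config;

public class BackupCountCheck {
  public static void main(String[] args) {
    // Load cluster.xml from the classpath and print the backup counts
    // that will actually be applied to the maps under discussion.
    Config config = new ClasspathXmlConfig("cluster.xml");
    System.out.println("__vertx.haInfo map backup-count: "
        + config.getMapConfig("__vertx.haInfo").getBackupCount());
    System.out.println("default multimap backup-count: "
        + config.getMultiMapConfig("default").getBackupCount());
  }
}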

purplefox commented 8 years ago

Thanks. This needs investigation, but it is possibly a Hazelcast issue. Does this occur with the latest version of HZ? I know there were bugs in earlier versions related to losing data when multiple nodes were shut down concurrently.

hsch commented 8 years ago

It did not help to update to Hazelcast 3.6.1.

Also, the HZ state at least seems to be okay when inspected via JMX: /com.hazelcast/HazelcastInstance/_hzInstance_1_dev.Members contains 2 remaining entries (1 publisher, 1 consumer) after the other 2 consumers have been terminated.

pontushellgren commented 7 years ago

I've also experienced this same problem when killing multiple nodes at the same time using kill -9.

I used the reproducer by hsch, only changing it to use vert.x 3.3.2, and had no trouble reproducing this on Mac OS X. After adding lots of logging to HAManager and other classes, I think/hope I have a fairly good understanding of what causes the issue.

This is an example scenario I see happening:

  1. Four nodes are started, the publisher and three consumers: N1, N2, N3, N4.
  2. N3 and N4 are killed (with kill -9).
  3. HAManager.nodeLeft() is called on N1, saying N3 has left.
  4. checkRemoveSubs (L274) removes the lost addresses (event consumers) for N3 from the Hazelcast subs map.
  5. checkFailover (L275) removes node id N3 from the Hazelcast clusterMap.
  6. checkFailover (L284) then also removes the other lost node id (N4) from the clusterMap.
  7. HAManager.nodeLeft() is now called on N1, saying N4 has left.
  8. checkRemoveSubs (L274) now does NOT remove the lost addresses (consumers) for N4, because N4 is no longer in the clusterMap (see step 6).

This means the consumers are left dangling in the subs map until the whole cluster is brought down.

I made a simple change to the HAManager class so that step 6 above also removes the event consumers. With this change the reproducer no longer fails.

diff --git a/src/main/java/io/vertx/core/impl/HAManager.java b/src/main/java/io/vertx/core/impl/HAManager.java
index 1e5f7ae..d8bf002 100644
--- a/src/main/java/io/vertx/core/impl/HAManager.java
+++ b/src/main/java/io/vertx/core/impl/HAManager.java
@@ -281,6 +281,8 @@ public class HAManager {

       for (Map.Entry<String, String> entry: clusterMap.entrySet()) {
         if (!leftNodeID.equals(entry.getKey()) && !nodes.contains(entry.getKey())) {
+          JsonObject haInfo = new JsonObject(entry.getValue());
+          checkRemoveSubs(entry.getKey(), haInfo);
           checkFailover(entry.getKey(), new JsonObject(entry.getValue()));
         }
       }

polipodi commented 7 years ago

I still experience the same issue even with this fix. I've suggested a different way of handling the dead nodes; can someone have a look at it? The pull request is here: https://github.com/eclipse/vert.x/pull/1594 Thanks

tsegismont commented 7 years ago

Fixed by eclipse/vert.x#1848 and #59