zeromq / zyre

Zyre - an open-source framework for proximity-based peer-to-peer applications
Mozilla Public License 2.0

Problem: In gossip mode, when a peer enters the network after another peer has exited and come back on the same endpoint (with another UUID), the new node receives an endless loop of "ENTER" and "EXIT" events #697

Closed: roumieu closed this 3 years ago

roumieu commented 3 years ago

Solution: PUBLISH zyre nodes to gossip with their endpoints, not their UUIDs, as tuple keys so that tuples are stored correctly in the gossip tables.

When zyre nodes are stored with their UUIDs as tuple keys and a node re-enters the network, the other nodes end up with two tuples in their gossip tables: they keep the old tuple because its key (the old UUID) differs from the new one. A node that joins afterwards then tries to discover two peers with different UUIDs but the same endpoint, which is both incorrect and impossible.

When zyre nodes are stored with their endpoints as tuple keys, the other nodes keep only the last node that entered the network on a given endpoint. A new node entering the network then tries to discover only that last node for each endpoint, which is the one currently alive.

In addition, publishing gossip tuples with an endpoint as the key matches how tuples are stored in the czmq zgossip tests.

bluca commented 3 years ago

@roumieu please fix your workflow for future PRs: do not do recursive merges; rebase your branches instead. This is how the master branch looks after this merge:

*   29ce5ca E - (5 minutes ago) Merge pull request #697 from roumieu/master - Kevin Sapper (HEAD -> master, upstream/master)
|\  
| * 570eae1 N - (20 hours ago) In gossip discovery mode, PUBLISH zyre node to gossip table with its endpoint as tuple key and no more its uuid so that when a node start on same endpoint as a previous one that is gone (with a new uuid), gossip tuples are stored correctly - chloe
| *   a05067a N - (20 hours ago) Merge branch 'master' of https://github.com/zeromq/zyre - chloe
| |\  
| |/  
|/|   
* |   b6ab7ce E - (21 hours ago) Merge pull request #696 from Mathsoum/master - Kevin Sapper
|\ \  
| * | c6a85fb N - (25 hours ago) Problem: Election do not reset when a peers leave while the election is in progress Solution: Start a new election when a peer leaves, even is one is still in progress - Mathieu Soum
* | |   b26891d E - (24 hours ago) Merge pull request #695 from Mathsoum/sync_project_xml - Luca Boccassi
|\ \ \  
| |/ /  
|/| |   
| * | 184a733 N - (25 hours ago) Problem: out of date with project.xml Solution: regenerate - Mathieu Soum
|/ /  
| * f8eb7a9 N - (3 weeks ago) Fix test to UNPUBLISH a tuple if we are in gossip mode - chloe
|/  
*   1e1f6ee E - (8 weeks ago) Merge pull request #694 from bluca/regen - 6R4N

That is wrong. You can run `git config --global pull.rebase true` to set the right default.

@sappo if you see recursive merges as commits in a PR, please do a squash-merge.