helium / plumtree

Epidemic Broadcast Trees
Apache License 2.0

Leave cluster issue #39

Open larshesel opened 7 years ago

I have a two-node plumtree cluster, p1@127.0.0.1 and p2@127.0.0.1. When I execute plumtree_peer_service:leave([]) on p1, that Erlang node terminates as expected. Afterwards, however, I still see the following in the logs of p2@127.0.0.1:

08:06:57.917 [debug] started plumtree_metadata_manager exchange with 'p1@127.0.0.1' (<0.370.0>)
08:07:07.918 [debug] started plumtree_metadata_manager exchange with 'p1@127.0.0.1' (<0.376.0>)
08:07:17.919 [debug] started plumtree_metadata_manager exchange with 'p1@127.0.0.1' (<0.381.0>)
08:07:27.920 [debug] started plumtree_metadata_manager exchange with 'p1@127.0.0.1' (<0.386.0>)
08:07:37.921 [debug] started plumtree_metadata_manager exchange with 'p1@127.0.0.1' (<0.391.0>)
08:07:47.922 [debug] started plumtree_metadata_manager exchange with 'p1@127.0.0.1' (<0.396.0>)

So it seems some state is not properly cleaned up. I believe the problem is in the update callback of plumtree_broadcast: https://github.com/helium/plumtree/blob/master/src/plumtree_broadcast.erl#L278. It appears to set the all_members state field to CurrentMembers only when new members have joined the cluster, not when members have been removed, which I think is a bug. As a result, all_members keeps a reference to p1@127.0.0.1 until either p1 comes back or some other node joins.
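
To make the suspected behaviour concrete, here is a minimal standalone sketch. It is not the actual plumtree_broadcast code: the module name members_sketch and the functions update_buggy/2, update_fixed/2 and demo/0 are hypothetical, and the member set is modelled as a plain ordset rather than a field of the real state record. It only illustrates the difference between refreshing the member set on joins alone and refreshing it on joins or leaves.

```erlang
-module(members_sketch).
-export([update_buggy/2, update_fixed/2, demo/0]).

%% Hypothetical model of the behaviour described above: the stored
%% member set is replaced only when at least one node has joined.
update_buggy(CurrentMembers, AllMembers) ->
    New = ordsets:subtract(CurrentMembers, AllMembers),
    case ordsets:size(New) > 0 of
        true  -> CurrentMembers;
        false -> AllMembers          % a removed node stays in the set
    end.

%% Possible fix (an assumption, not the project's code): replace the
%% stored member set when nodes have joined or left.
update_fixed(CurrentMembers, AllMembers) ->
    New     = ordsets:subtract(CurrentMembers, AllMembers),
    Removed = ordsets:subtract(AllMembers, CurrentMembers),
    case ordsets:size(New) > 0 orelse ordsets:size(Removed) > 0 of
        true  -> CurrentMembers;
        false -> AllMembers
    end.

%% Replays the scenario from this report: p1 leaves a two-node cluster.
demo() ->
    Before = ordsets:from_list(['p1@127.0.0.1', 'p2@127.0.0.1']),
    After  = ordsets:from_list(['p2@127.0.0.1']),
    {update_buggy(After, Before), update_fixed(After, Before)}.
```

Calling members_sketch:demo() returns a pair in which the first set still contains 'p1@127.0.0.1' and the second contains only 'p2@127.0.0.1', which matches the stale exchange attempts seen in the log above.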

If this analysis seems correct, I'll be happy to make a PR for this.