weaveworks / weave

Simple, resilient multi-host containers networking and more.
https://www.weave.works
Apache License 2.0

Decommissioned node can hijack IPAM pool tokens #3441

Open berlic opened 6 years ago

berlic commented 6 years ago

What you expected to happen?

A host that was removed from the cluster with rmpeer + forget should not be able to hijack its former IPAM tokens back from the cluster.

What happened?

The decommissioned host takes its former tokens back from the cluster, leaving the IPAM status inconsistent indefinitely.
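
For context, the per-node IPAM view referred to in the steps below can be inspected with standard weave commands; a minimal sketch (the exact output and field names depend on the weave version):

```sh
# On each node: show which peer currently owns which part of the IPAM range
weave status ipam

# Full JSON status report, which also includes the ring entries and their versions
weave report
```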

How to reproduce it?

  1. Create a Weave Net cluster of 5 nodes (A, B, C, D, E).
  2. Create several containers in the weave network on every node.
  3. Stop any node, let's say B.
  4. Execute rmpeer B on node C and forget B on nodes A, C, D, E (commands sketched after this list).
  5. Check that the IPAM token of node B is now owned by node C and has had its Version incremented.
  6. Check that the IPAM table is consistent on every node, with node C as the owner of the IPAM token in question.
  7. Start node B and start containers in the weave network (the more containers you have in the weave network, the quicker the hijack happens). Each container start increases the Version of the IPAM token.
  8. As soon as the Version of the IPAM token on resurrected node B becomes greater than the Version of this token on node C, all other nodes in the cluster (A, D, E) update their IPAM tables with node B as the owner of the IPAM token in question.
  9. The IPAM table is now inconsistent forever, because node C will never release its own claim on the token.
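
For reference, a minimal command sketch of steps 3–7 above, assuming the peers are plain Docker hosts reachable by the hostnames A–E and that containers are attached through the legacy "weave" Docker network (the hostnames and the test container are placeholders):

```sh
# On node B: stop the peer that is being decommissioned (step 3)
weave stop

# On node C: take over B's share of the IPAM range (step 4)
weave rmpeer B

# On nodes A, C, D and E: drop B from the list of known peers (step 4)
weave forget B

# Later, back on node B: bring the peer up again and start containers (step 7);
# each container start bumps the Version of B's old IPAM token
weave launch A C D E
docker run -d --net=weave busybox sleep 3600
```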

Anything else we need to know?

weave 2.3.0, used as "legacy" plugin without swarm.

murali-reddy commented 6 years ago

@berlic thanks for providing detailed steps to reproduce the issue. There were some critical IPAM-related issues that got fixed in both 2.4.0 and 2.5.0. I will try to reproduce this and see if it is still an issue.

berlic commented 4 years ago

This change https://github.com/weaveworks/weave/commit/873781ab580773c17c573383bd77fd709207a3e6 makes this issue less likely to happen, but it is still possible.