Closed: mennopruijssers closed this 8 years ago
Previously based on a "special" branch (master merged with #260 and #259); based on master now.
Manual test:
# create 25 node cluster
./scripts/tick-cluster.js -n 25 --interpreter node main.js
# validate no partitions:
ringpop-admin partitions $IP:3010
08:59:18.496 Checksum # Nodes # Alive # Suspect # Faulty Sample Host
140365291 25 25 0 0 $IP:3010
# set up the firewall using the make_partition script in ringpop-common:
sudo lsof -Pnni | ./make_partition 3000-3010 3011-3024 | sudo pfctl -emf -
# reset firewall after a while:
sudo pfctl -f /etc/pf.conf
# validate two partitions:
ringpop-admin partitions $IP:3010
09:01:09.012 Checksum # Nodes # Alive # Suspect # Faulty Sample Host
228740776 14 14 0 11 $IP:3011
883647694 11 11 0 14 $IP:3010
# run heal:
ringpop-admin heal $IP:3000
09:02:11.273 Executed heal to 1 targets
09:02:11.273 - $IP:3011
09:02:12.285 Executed heal to 1 targets
09:02:12.285 - $IP:3022
09:02:13.286 No known partitions left
# validate no partitions:
ringpop-admin partitions $IP:3010
09:02:14.495 Checksum # Nodes # Alive # Suspect # Faulty Sample Host
1366565648 25 25 0 0 $IP:3010
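The "validate" steps in the manual test could be scripted rather than checked by eye. The following is a minimal sketch, not part of this PR: `count_partitions` is a hypothetical helper, and it assumes the output format shown in the samples above, where each partition row starts with a numeric checksum and has six whitespace-separated fields, while the header line starts with a timestamp.

```shell
#!/bin/sh
# Hypothetical helper (not part of ringpop-admin): count the partition rows
# in `ringpop-admin partitions` output read from stdin.
# Assumption: a partition row has exactly six fields and begins with a
# numeric checksum; the timestamped header line does not match and is skipped.
count_partitions() {
    awk 'NF == 6 && $1 ~ /^[0-9]+$/ { n++ } END { print n + 0 }'
}

# Possible usage, with $IP set as in the manual test:
#   ringpop-admin partitions "$IP:3010" | count_partitions
```

A result of 1 would correspond to a healed cluster, and 2 to the partitioned state seen after the firewall rules were applied.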
Please make sure to run integration tests in travis-ci too. Appending --enable partition-healing to the it-tests.js command should do the trick here: https://github.com/uber/ringpop-node/blob/dc57cbef6a8b377daa9d24f43b25fdc20be6a5c4/test/run-shared-integration-tests#L81
LGTM
LGTM, but make sure tests pass :)
This PR adds partition healing to ringpop-node. This initial PR only supports healing that is triggered manually. Periodic partition healing will be added in a subsequent PR.