uber-node / ringpop-node

Scalable, fault-tolerant application-layer sharding for Node.js applications
http://uber.github.io/ringpop/
MIT License

Partition healing #264

Closed: mennopruijssers closed this 8 years ago

mennopruijssers commented 8 years ago

This PR adds partition healing to ringpop-node. This initial PR only supports healing that is triggered manually. Periodic partition healing will be added in a subsequent PR.

mennopruijssers commented 8 years ago

Previously based on a "special" branch (master merged with #260 and #259); now based on master.

mennopruijssers commented 8 years ago

Manual test:

# create 25 node cluster
./scripts/tick-cluster.js -n 25 --interpreter node main.js

# validate no partitions:
ringpop-admin partitions $IP:3010
 08:59:18.496   Checksum    # Nodes   # Alive   # Suspect   # Faulty   Sample Host
                140365291   25        25        0           0          $IP:3010

# set up firewall using the make_partition script in ringpop-common:
sudo lsof -Pnni | ./make_partition 3000-3010 3011-3024  | sudo pfctl -emf -

# reset firewall after a while:
sudo pfctl -f /etc/pf.conf

# validate two partitions:
ringpop-admin partitions $IP:3010
 09:01:09.012   Checksum    # Nodes   # Alive   # Suspect   # Faulty   Sample Host
                228740776   14        14        0           11         $IP:3011
                883647694   11        11        0           14         $IP:3010

# run heal:
ringpop-admin heal $IP:3000
09:02:11.273 Executed heal to 1 targets
09:02:11.273  - $IP:3011
09:02:12.285 Executed heal to 1 targets
09:02:12.285  - $IP:3022
09:02:13.286 No known partitions left

# validate no partitions:
ringpop-admin partitions $IP:3010
 09:02:14.495   Checksum     # Nodes   # Alive   # Suspect   # Faulty   Sample Host
                1366565648   25        25        0           0          $IP:3010
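
The "validate" steps above come down to counting the rows that ringpop-admin partitions prints: one row means a single consistent cluster, and more than one means the cluster is split. A minimal sketch of that check follows, assuming the table layout shown in this test (five integer columns followed by a sample host). parsePartitions is a hypothetical helper written for illustration; it is not part of ringpop-admin, and the real output format may differ.

```javascript
// Hypothetical helper (not part of ringpop-admin): parse the text printed by
// `ringpop-admin partitions`. Assumes every partition is one row of five
// integers followed by a sample host, as in the output shown above.
function parsePartitions(output) {
  const rowPattern = /(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\S+)\s*$/;
  const partitions = [];
  for (const line of output.split('\n')) {
    const match = rowPattern.exec(line);
    if (!match) continue; // header and blank lines do not match
    partitions.push({
      checksum: Number(match[1]),
      nodes: Number(match[2]),
      alive: Number(match[3]),
      suspect: Number(match[4]),
      faulty: Number(match[5]),
      sampleHost: match[6],
    });
  }
  return partitions;
}

// Usage with the two-partition output from the test above.
const splitOutput = [
  ' 09:01:09.012   Checksum    # Nodes   # Alive   # Suspect   # Faulty   Sample Host',
  '                228740776   14        14        0           11         $IP:3011',
  '                883647694   11        11        0           14         $IP:3010',
].join('\n');

const partitions = parsePartitions(splitOutput);
console.log(partitions.length); // 2, so the cluster is still partitioned
```

After a successful heal the same check should yield a single row whose faulty count is 0.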

motiejus commented 8 years ago

Please make sure the integration tests also run in Travis CI. Appending --enable partition-healing to the it-tests.js command should do the trick: https://github.com/uber/ringpop-node/blob/dc57cbef6a8b377daa9d24f43b25fdc20be6a5c4/test/run-shared-integration-tests#L81

motiejus commented 8 years ago

LGTM

thanodnl commented 8 years ago

LGTM, but make sure tests pass :)