relab / hotstuff

MIT License
172 stars 53 forks source link

Reproduce safety error in Fast-HotStuff with Twins #55

Closed johningve closed 2 years ago

johningve commented 2 years ago

This PR adds a test case showing that our Twins implementation can detect the safety violation in Fast-HotStuff that was found in the Twins paper.

The main problem I encountered is this:

Because our view synchronizer requires a timeout certificate created from a quorum of timeout messages, All nodes become stuck because they cannot gather a quorum of timeout messages:

At round 5, node B timeouts. It is prevented from sending timeout messages to the other nodes due to the firewall of round 5. At round 6, nodes A and D timeout. They are prevented from communicating with node B due to the firewall of round 6. At round 7, node C timeouts. It is prevented from communicating with the other nodes due to the firewall of round 7.

Several changes were made to the Twins executor to get it working:

The test scenario results in the following commits, where node 2 disagrees with all other nodes at height 4:

fhsbug_test.go:123: Node r1n1 commits: 
                 Proposer: 1, View: 1, Hash: TNPUwB
                 Proposer: 1, View: 2, Hash: /kQdhx
                 Proposer: 1, View: 3, Hash: 9GFYkx
                 Proposer: 1, View: 6, Hash: PgtGrf

    fhsbug_test.go:123: Node r2n2 commits: 
                 Proposer: 1, View: 1, Hash: TNPUwB
                 Proposer: 1, View: 2, Hash: /kQdhx
                 Proposer: 1, View: 3, Hash: 9GFYkx
                 Proposer: 1, View: 4, Hash: aRlvqx

    fhsbug_test.go:123: Node r3n3 commits: 
                 Proposer: 1, View: 1, Hash: TNPUwB
                 Proposer: 1, View: 2, Hash: /kQdhx
                 Proposer: 1, View: 3, Hash: 9GFYkx
                 Proposer: 1, View: 6, Hash: PgtGrf

    fhsbug_test.go:123: Node r4n4 commits: 
                 Proposer: 1, View: 1, Hash: TNPUwB
                 Proposer: 1, View: 2, Hash: /kQdhx
                 Proposer: 1, View: 3, Hash: 9GFYkx
                 Proposer: 1, View: 6, Hash: PgtGrf
codecov-commenter commented 2 years ago

Codecov Report

Merging #55 (bde5359) into master (3e72795) will increase coverage by 1.25%. The diff coverage is 91.47%.

@@            Coverage Diff             @@
##           master      #55      +/-   ##
==========================================
+ Coverage   53.72%   54.97%   +1.25%     
==========================================
  Files          70       70              
  Lines        6362     6437      +75     
==========================================
+ Hits         3418     3539     +121     
+ Misses       2670     2626      -44     
+ Partials      274      272       -2     
Flag Coverage Δ
unittests 54.97% <91.47%> (+1.25%) :arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
consensus/modules.go 95.69% <ø> (+2.15%) :arrow_up:
crypto/ecdsa/ecdsa.go 60.52% <ø> (-1.46%) :arrow_down:
internal/cli/twins.go 6.91% <16.66%> (+0.15%) :arrow_up:
consensus/consensus.go 68.09% <50.00%> (+4.62%) :arrow_up:
twins/network.go 87.72% <95.52%> (+1.93%) :arrow_up:
consensus/events.go 100.00% <100.00%> (ø)
consensus/types.go 79.45% <100.00%> (+12.02%) :arrow_up:
synchronizer/synchronizer.go 85.82% <100.00%> (+2.64%) :arrow_up:
twins/generator.go 93.13% <100.00%> (+1.19%) :arrow_up:
twins/scenario.go 96.77% <100.00%> (+5.52%) :arrow_up:
... and 16 more

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 3e72795...bde5359. Read the comment docs.