paritytech / substrate

Substrate: The platform for blockchain innovators
Apache License 2.0
8.39k stars 2.65k forks source link

Finalized should be before best #14308

Closed davxy closed 1 year ago

davxy commented 1 year ago

In current implementation of apply_finality_with_block_hash we explicitly check the finalized < best condition only if there is more than one leaf (in this case because finalized may be in a different branch than last best.

If this is the case then we explicitly set the newly finalized block as the best.

(as the comment says) This condition "weak check" is due to the assumption that the best is always after the finalized because consensus algorithms typically guarantees this.

But this is not always the case.

For example, in cumulus parachain code we are maintaining a cache of parablocks that (according to the relay-chain) are finalized. All these blocks are explicitly finalized via the Finalized::finalize_block without worrying about updating the best first here


This PR adds a defensive check that enforces the condition and thus handles correctly finalizations that are applied without updating best first.


This PR is necessary for THIS cumulus PR to work correctly

davxy commented 1 year ago

This seems like a tricky corner case; do we have a test which verifies this behavior? If not can we add one?

This is not a corner case as this always triggers in cumulus after this fix: https://github.com/paritytech/cumulus/pull/2696

⏩ Warping, Downloading state, 0.00 Mib (8 peers), best: #0 (0x4823…771a), finalized #0 (0x4823…771a), ⬇ 45.8kiB/s ⬆ 18.6kiB/s    
...
⏩ Warping, Importing state, 34.98 Mib (8 peers), best: #0 (0x4823…771a), finalized #0 (0x4823…771a), ⬇ 856.0kiB/s ⬆ 0.9kiB/s    
...
Warp sync is complete (34 MiB), restarting block sync.    
⏩ Block history, #0 (9 peers), best: #4643340 (0x00bc…a283), finalized #4643340 (0x00bc…a283), ⬇ 1.3kiB/s ⬆ 1.1kiB/s    
...
✨ Imported #4643341 (0x807f…b81a)    
Setting newly imported block as finalized. block_hash=0x807f959d9e7e937dc98b76b62371c08cffd0ed9081623e00409f3982834fb81a

✨ Imported #4643342 (0x42fa…b439)    
Setting newly imported block as finalized. block_hash=0x42fa1c0082564c75144d105e9537a56939490c082aa19ab695dce1f1ffacb439

✨ Imported #4643343 (0xbc51…5690)    
Setting newly imported block as finalized. block_hash=0xbc51b8bbc6f84b32c87b1d59811af7cc414899434a872987d1b4077131bf5690

✨ Imported #4643344 (0xce53…b640)    
Setting newly imported block as finalized. block_hash=0xce53f002349c7576a1f6e1a50b7833c2d8408d64396e415aab1fc935dca1b640

✨ Imported #4643345 (0x3406…5e48)    
Setting newly imported block as finalized. block_hash=0x3406f14393b567880f44d2822f4884d51c0f5fddc6425f4fd580276303025e48

Setting newly imported block as finalized. block_hash=0x1f07d89ecbe7e91ec90b397fc9512bcb8cddc91bfc5372493666220aab8fdba3
✨ Imported #4643346 (0x1f07…dba3)    

Setting newly imported block as finalized. block_hash=0x3f3ae3a3296ff286c232c3ab326a5cedda2ea90c2e71b6631b9439910059d2c6
✨ Imported #4643347 (0x3f3a…d2c6)    

✨ Imported #4643348 (0x9f3c…96e6)    
Setting newly imported block as finalized. block_hash=0x9f3cea31de4dcd5a2accaa9210be795c6f7a4c0ee96c102355e1ff4f95ac96e6

✨ Imported #4643349 (0x898c…892c)    
Setting newly imported block as finalized. block_hash=0x898c15dd84fcf92df14b3cab3d41953fd6188512ac27a3a5c266688de658892c

✨ Imported #4643350 (0xe731…99e0)    
Setting newly imported block as finalized. block_hash=0xe731d08b904e5e6aa7a65984ca8023ef99eabbb2ed2c9ec39d1efc8d7f0599e0
...
⚙️  Syncing  0.0 bps, target=#4643353 (3 peers), best: #4643340 (0x00bc…a283), finalized #4643350 (0xe731…99e0), ⬇ 756.0kiB/s ⬆ 2.0kiB/s    
...

Every imported block already marked as final by the relay chain is cached and set as final here without updating the best.

This thing can't trigger in babe and aura as they never call finalize_block in a way that breaks the invariant.

But this may happen in custom consensus algorithms that are not so strict about the invariant. Is better to be defensive because consensus is pluggable and out of backend control.

I'll try to add a test in cumulus that excercise this fix since it already contains consensus logic that triggers this.