tmpolaczyk closed 5 days ago
```diff
@@ Coverage Diff @@
##           master   tomasz-refactor-db-removal-restart    +/-  ##
===================================================================
+ Coverage   67.10%                               67.25%   +0.15%
+ Files         253                                  255       +2
+ Lines       44082                                44273     +191
===================================================================
+ Hits        29577                                29772     +195
- Misses      14505                                14501       -4
```

| Files Changed | Coverage | |
|---|---|---|
| /node/src/container_chain_spawner.rs | 44.89% (-0.45%) | 🔽 |
| /runtime/dancebox/tests/common/xcm/core_buyer.rs | 100.00% (+9.16%) | 🔼 |

Coverage generated Mon Jul 15 11:40:07 UTC 2024
Several improvements to the `spawn` function based on what we have seen on testnet, plus some refactors to make the code cleaner. Enabling "Hide whitespace" in the Files changed view makes this easier to review.
- Change `select_sync_mode` to return warp sync for an existing database. Even if we return warp sync it will still use full sync. (undid this change, see [1])
- Move `try_spawn` logic to a function instead of using a closure.
- Add `ContainerChainSpawnParams` to make cloning and passing to `try_state` easier.
- Change `spawn` to be a regular `async fn` instead of an fn that returns a boxed future.

Fix #486
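As a rough illustration of the `async fn` refactor, here is a minimal sketch of both signatures side by side. The `Spawner` type, the method bodies, and the tiny `block_on` executor are all hypothetical, for illustration only; the real `spawn` lives in `container_chain_spawner.rs` and does far more.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

struct Spawner;

impl Spawner {
    // Before: an fn that returns a boxed future, allocated at every call site.
    fn spawn_boxed(&self, n: u32) -> Pin<Box<dyn Future<Output = u32> + Send + '_>> {
        Box::pin(async move { n + 1 })
    }

    // After: a regular async fn; the compiler generates the future type,
    // so no boxing is needed and the body reads like ordinary code.
    async fn spawn(&self, n: u32) -> u32 {
        n + 1
    }
}

// Tiny single-future executor, sufficient for futures that never pend.
struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(v) = fut.as_mut().poll(&mut cx) {
            return v;
        }
    }
}

fn main() {
    // Both shapes behave identically from the caller's point of view.
    assert_eq!(block_on(Spawner.spawn_boxed(1)), 2);
    assert_eq!(block_on(Spawner.spawn(1)), 2);
    println!("ok");
}
```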
Edit:
[1]: Because warp sync has some bugs and can get stuck, we decided to default to full sync if a database exists. Combined with the "Do not remove db if the block number is 0" change, this means that by default, if warp sync fails to sync within one session, or if the node is stopped manually while warp sync is in progress, the chain will use full sync when it restarts. By "warp sync" I mean the state sync part of warp sync; if the node is stopped while the block history download is in progress, nothing breaks.
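The fallback decision described in [1] can be sketched as follows. This is a reduced illustration under stated assumptions: `select_sync_mode` is the real function name, but its actual signature takes more context than a single boolean, and `SyncMode`, `keep_db_on_restart`, and the log text here are hypothetical.

```rust
#[derive(Debug, PartialEq)]
enum SyncMode {
    Warp,
    Full,
}

/// If a database already exists (e.g. a previous warp sync was interrupted
/// or failed to finish within one session), default to full sync instead of
/// retrying warp sync, which can get stuck.
fn select_sync_mode(db_exists: bool) -> SyncMode {
    if db_exists {
        // Full sync can be very slow for chains with a big state, so warn
        // collators loudly that they may not be able to sync in time.
        eprintln!(
            "error: existing database found, falling back to full sync; \
             remove the database manually to retry warp sync"
        );
        SyncMode::Full
    } else {
        SyncMode::Warp
    }
}

/// The companion "Do not remove db if the block number is 0" change: a
/// database whose best block is still 0 (warp state sync never completed)
/// is kept on restart, so the node resumes with full sync instead of
/// deleting the database and warping again.
fn keep_db_on_restart(best_block_number: u32) -> bool {
    best_block_number == 0
}

fn main() {
    assert_eq!(select_sync_mode(false), SyncMode::Warp);
    assert_eq!(select_sync_mode(true), SyncMode::Full);
    assert!(keep_db_on_restart(0));
    assert!(!keep_db_on_restart(5));
    println!("ok");
}
```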
Since using full sync can be very slow for chains with a big state, I added an error-level log so that collators are aware that they may not be able to sync in time. They can retry warp sync by stopping the node and manually removing the database, but if warp sync got stuck the first time, it will probably get stuck the second time as well. It is not clear why warp sync gets stuck, but it seems related to bootnodes banning collators with the error "Same state request multiple times". More investigation is needed.