Chunks with large witnesses were generated using #11703 on forknet with 20 nodes.
The first experiment was to find the minimum compressed witness size at which the network can no longer get the chunk included in a block. `consensus.min_block_production_delay` (which effectively determines block production time) was set to 1.3 seconds. The resulting witness size limit is 25MB.
Then we increased `min_block_production_delay` to 3 seconds to test network recovery. The network successfully recovered after a restart with the updated config. With the increased block production time the network was still able to make progress with witness sizes up to 50MB.
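For reference, a sketch of the relevant `config.json` knob, assuming the usual neard serialization of durations as `{"secs", "nanos"}` objects (the exact shape may differ between versions); the first experiment used `{"secs": 1, "nanos": 300000000}` and the recovery test `{"secs": 3, "nanos": 0}`:

```json
{
  "consensus": {
    "min_block_production_delay": { "secs": 1, "nanos": 300000000 }
  }
}
```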
We were also interested in the ratio of missing chunks for the shard when large witnesses are generated for all heights; all graphs below show the number of missing chunks per minute. #11771 contains the Python script used to generate such a stream of large witnesses; a minimal sketch of the idea is shown below.
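This is only a sketch of the approach, not the actual script from #11771; `submit_large_call` is a hypothetical stand-in for the real transaction-submission logic:

```python
#!/usr/bin/env python3
"""Sketch of a large-witness generator: keep submitting function calls with
large arguments so that every height produces a chunk with an oversized
state witness."""
import time

PAYLOAD_BYTES = 3 * 1024 * 1024  # ~3MB per call, as in the margin test below

def submit_large_call(payload: bytes) -> None:
    # Hypothetical stand-in: the real script signs and sends a transaction
    # calling a test contract with `payload` as its arguments.
    print(f"submitting call with {len(payload)} byte args")

while True:
    submit_large_call(b"\x00" * PAYLOAD_BYTES)
    time.sleep(1.3)  # roughly one call per block at a 1.3s block time
```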
Only occasional missing chunks are observed:
Much worse than the 7MB single-shard case, most probably because nodes use 6x more network bandwidth when distributing witnesses for all shards:
A substantial number of chunks are missing:
Pretty bad, but 50%+ of chunks are still included:
The shard makes some progress, with ~25-30% of chunks included:
It is also important for us to know how much witness size margin we have on top of the current mainnet traffic. With 3MB witnesses (which more than doubles the witness size compared to the baseline mainnet traffic) we see only occasional chunk misses:
`neard undo-block` can be used to remove the head block from the chain. This is useful for recovering a node that somehow ended up with an incorrectly applied chain head block: without it, the node cannot make any progress because of a chunk extra mismatch for any block built on top.
forknet-20 was used to test the undo-block command:
```
mirror --host-filter mocknet-mainnet-118727510-smalltest-40eb stop-nodes
./binaries/neard1 --unsafe-fast-startup undo-block
mirror --host-filter mocknet-mainnet-118727510-smalltest-40eb start-nodes
```
`neard undo-block` logs:
```
ubuntu@mocknet-mainnet-118727510-smalltest-40eb:~/.near/neard-runner$ ./binaries/neard1 --unsafe-fast-startup undo-block
2024-07-29T10:47:10.525618Z INFO neard: version="2.0.0-rc.5" build="fbf9e49" latest_protocol=69
2024-07-29T10:47:10.546205Z INFO config: Validating Config, extracted from config.json...
2024-07-29T10:47:10.552812Z WARN genesis: Skipped genesis validation
2024-07-29T10:47:10.552841Z WARN genesis: Skipped genesis validation
2024-07-29T10:47:10.552854Z INFO config: All validations have passed!
2024-07-29T10:47:10.561251Z INFO db_opener: Opening NodeStorage path="/home/ubuntu/.near/data" cold_path="none"
2024-07-29T10:47:10.561376Z INFO db: Opened a new RocksDB instance. num_instances=1
2024-07-29T10:47:11.314456Z INFO db: Closed a RocksDB instance. num_instances=0
2024-07-29T10:47:11.314486Z INFO db_opener: The database exists. path=/home/ubuntu/.near/data
2024-07-29T10:47:11.314543Z INFO db: Opened a new RocksDB instance. num_instances=1
2024-07-29T10:47:13.775962Z INFO db: Closed a RocksDB instance. num_instances=0
2024-07-29T10:47:13.776009Z INFO db: Opened a new RocksDB instance. num_instances=1
2024-07-29T10:47:13.782417Z INFO db: Closed a RocksDB instance. num_instances=0
2024-07-29T10:47:13.782448Z INFO db: Opened a new RocksDB instance. num_instances=1
2024-07-29T10:47:13.865679Z INFO neard: Trying to update head prev_block_hash=5eDtyLhnmhD9ywKYFccpjq8AGH7DDxWRvuTQeTBMHNue current_head_hash=Ef3DqL2i6A4ztQ5Bz34kmu75UYb1XRrV6FwzytZLT7Mj prev_block_height=118921032 current_head_height=118921033
2024-07-29T10:47:13.898358Z INFO neard: The current chain store shows new_head_height=118921032 new_header_height=118921032
2024-07-29T10:47:13.906414Z INFO db: Closed a RocksDB instance. num_instances=0
```
The setup is similar to the one used for forknet above.
The slow chunk scenario was reproduced in forknet using the `sleep` method in the test contracts added in #11317. We wanted to make sure that #11344 works as expected in forknet with stateless validation. The test was successful; the resulting chain:
This is a top-level issue to track potential failure scenarios for the Stateless Validation launch. It also covers Congestion Control, since that ships in the same release.
- Congestion Control (cc @wacban)
- Stateless Validation
- Memtrie
- State Sync
- Chunk Endorsements