near / near-one-project-tracking

A repository for tracking work items that NEAR One is working on.

Stateful -> stateless migration preparation #65

Open walnut-the-cat opened 2 months ago

walnut-the-cat commented 2 months ago

Identify the right way to enable memtrie during the stateful-to-stateless validation protocol upgrade. Before the upgrade all nodes track all shards; after the upgrade they will start tracking only one or a few shards. Memtrie is expected to be enabled only when tracking a single shard or a few shards, as it requires more memory. However, we may also need to think about how to make the transition from (stateful + disk tries) to (stateless + memtries); during this transition we may need to run nodes with memtries while tracking all shards. Assuming this is the path to follow, we need the following follow-ups:

1) Profile the memory usage needed to do that (a similar task was opened for RPC nodes tracking all shards: https://github.com/near/nearcore/issues/11230)

2) Make sure that after the protocol upgrade the memtries for shards that are not tracked in the new epoch are unloaded, leaving each node only with memtries for its tracked shards (see the sketch below).
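
A minimal sketch of follow-up 2, assuming hypothetical names (`MemTries`, `ShardTries::unload_untracked_mem_tries`, `tracked_shards`) rather than nearcore's actual API: at the epoch boundary after the upgrade, memtries for shards that are no longer tracked are simply dropped.

```rust
use std::collections::{HashMap, HashSet};

type ShardId = u64;

/// Placeholder for the in-memory trie of a single shard.
struct MemTries;

struct ShardTries {
    mem_tries: HashMap<ShardId, MemTries>,
}

impl ShardTries {
    /// At the epoch boundary, drop memtries for shards this node no longer
    /// tracks, keeping only the memtries for its tracked shards.
    fn unload_untracked_mem_tries(&mut self, tracked_shards: &HashSet<ShardId>) {
        self.mem_tries
            .retain(|shard_id, _| tracked_shards.contains(shard_id));
    }
}

fn main() {
    // Before the upgrade: memtries are loaded for all 6 shards.
    let mut tries = ShardTries {
        mem_tries: (0..6).map(|s| (s, MemTries)).collect(),
    };
    // After the upgrade this node tracks only shard 2.
    let tracked: HashSet<ShardId> = [2].into_iter().collect();
    tries.unload_untracked_mem_tries(&tracked);
    assert_eq!(tries.mem_tries.len(), 1);
}
```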

Related thread here.

walnut-the-cat commented 1 month ago

Need to think about how and when the in-memory trie will be enabled.

robin-near commented 1 month ago

Here are the options for dealing with the memtrie launch during stateful -> stateless migration:

  1. (All, then One) Enable memtries first, so that memtries are loaded for all shards; then, after the protocol upgrade, one memtrie remains in memory (for the assigned shard) while the others are unloaded (see the sketch after this list). The downside of this approach is that all stateless chunk producers temporarily need high-memory instances before the upgrade.
  2. (Assigned Shard Only) Enable memtries first, but modify the loading so that even in the stateful case it loads only the shard the node is assigned to (or, if the node is not a validator, all shards). This makes it consistent with the stateless case. However, this may be difficult to implement. Suppose in the stateful case we have epochs E and E + 1, where in E we are assigned shard 1 and in E + 1 we are assigned shard 2. While in epoch E, we need to find some time to load shard 2 into memory in preparation for the next epoch, but since we are still tracking shard 2, its trie keeps changing. So we have two options:
    • (Concurrent Load) Modify the memtrie loading code so that it supports loading a "hot shard", i.e. a shard undergoing concurrent changes; this may or may not be easy, but at a minimum it involves freezing the flat storage, as we do when taking snapshots.
    • (Forced State Sync) Even though we track shard 2, we still force shard 2 to go through state sync, i.e. stop tracking the shard at the beginning of epoch E, pretend to do a state sync (a no-op, since we already have the data), and then enter catchup. During catchup we automatically load the memtrie first, then catch up until the shard is up to date, at which point we are tracking the shard again and everything continues normally. The problem here is that while we wait for the memtrie to load, this validator is not validating the shard, and that may become a security issue (though with 6 shards, is that really a problem?).
  3. (None, then One) Enable memtries first, but modify the logic to load memtries only under the stateless protocol. The challenge here is similar to Option 2's, but it only exists for the very last epoch before the protocol upgrade, because in preparation for the first stateless epoch we need to load the memtrie for one shard.
  4. (None, then None, then One) Enable memtries first, but modify the logic to load memtries only from the second epoch after the protocol upgrade. This makes the implementation easy, but the first epoch after the protocol upgrade will not have memtries and so will have degraded performance.
  5. (None, then Manual) Do not enable memtries before the protocol upgrade; ask node operators to enable memtries after the protocol upgrade has passed. This is like Option 4, except that chunk producers will be degraded until they take action themselves.
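
A minimal sketch of Option 1's loading policy, with illustrative names (`EpochInfo`, `shards_to_load_in_memory`) that are not nearcore's actual API: load memtries for every shard before the upgrade, and only for the assigned shard(s) once stateless validation is enabled.

```rust
type ShardId = u64;

/// Illustrative epoch view; not nearcore's actual EpochInfo.
struct EpochInfo {
    all_shards: Vec<ShardId>,
    /// Shards this validator is assigned to in the epoch (empty for non-validators).
    assigned_shards: Vec<ShardId>,
    stateless_validation_enabled: bool,
}

/// Which shards should have memtries loaded in the given epoch under Option 1.
fn shards_to_load_in_memory(epoch: &EpochInfo) -> Vec<ShardId> {
    if !epoch.stateless_validation_enabled || epoch.assigned_shards.is_empty() {
        // Before the upgrade (or for nodes tracking all shards): keep memtries
        // for every shard; this is the temporary high-memory phase.
        epoch.all_shards.clone()
    } else {
        // After the upgrade: keep only the memtries for the assigned shard(s);
        // the rest are unloaded at the epoch boundary.
        epoch.assigned_shards.clone()
    }
}

fn main() {
    let pre_upgrade = EpochInfo {
        all_shards: (0..6).collect(),
        assigned_shards: vec![2],
        stateless_validation_enabled: false,
    };
    assert_eq!(shards_to_load_in_memory(&pre_upgrade).len(), 6);

    let post_upgrade = EpochInfo { stateless_validation_enabled: true, ..pre_upgrade };
    assert_eq!(shards_to_load_in_memory(&post_upgrade), vec![2]);
}
```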

@bowenwang1996 I heard you're in favor of Option 1, could you confirm that it is still the best option after considering the other options available?

Regardless of the option picked, we would need to repeatedly test this protocol upgrade.

bowenwang1996 commented 1 month ago

@robin-near yes, I still think Option 1 is the best. Options 2 and 3 are too complex: they add not only engineering complexity but also a testing burden. Options 4 and 5 would likely degrade performance, and we cannot control whether there will be high load on mainnet when the upgrade happens, so it is best to avoid performance degradation altogether.

tayfunelmas commented 4 weeks ago

Based on today's discussion, unloading memtries will not be necessary, since validators will need to restart their nodes anyway to downgrade the RAM size.

staffik commented 1 week ago

Unloading memtrie is done: https://github.com/near/nearcore/pull/11657.

staffik commented 1 week ago

Testing the migration: https://near.zulipchat.com/#narrow/stream/308695-nearone.2Fprivate/topic/Forknet.2020-node.20statefull.20to.20stateless.20test.

staffik commented 6 days ago

It seems we already have "shadow tracking" implemented: https://github.com/near/nearcore/blob/master/core/chain-configs/src/client_config.rs#L412. I will test how it works with memtries and validator key hot swap. cc @tayfunelmas @wacban

wacban commented 6 days ago

I think this may track the shard where the account id is located, not the shard that the validator with this account id would track. Either way this is a good find and definitely related, perhaps we can reuse some of it for our purpose?

Also the name is way less catchy than shadow tracking ;)
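
A toy sketch of the distinction above, with made-up helpers (`account_to_shard`, `validator_assigned_shards`) rather than nearcore's actual API: tracking the shard that stores an account's state is not the same as tracking the shard(s) that the validator with that account id is assigned to.

```rust
type ShardId = u64;

/// Toy mapping from an account to the shard that stores its state
/// (what a config keyed on account ids would naturally give you).
fn account_to_shard(account_id: &str, num_shards: u64) -> ShardId {
    account_id.len() as u64 % num_shards
}

/// Shards the validator with this account id is assigned to in the current
/// epoch (what "shadow tracking" a validator actually needs); in reality this
/// would come from the epoch's validator-to-shard assignment.
fn validator_assigned_shards(_account_id: &str) -> Vec<ShardId> {
    vec![2]
}

fn main() {
    let v = "validator.near";
    println!("shard holding the account's state: {}", account_to_shard(v, 6));
    println!("shards the validator is assigned: {:?}", validator_assigned_shards(v));
}
```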

staffik commented 2 days ago

2024-07-01 (Monday) Update