Open walnut-the-cat opened 2 months ago
Need to think about how and when in-memory trie will be enabled
Here are the options for dealing with the memtrie launch during stateful -> stateless migration:
@bowenwang1996 I heard you're in favor of Option 1, could you confirm that is still the best option, after considering the other options available?
Regardless of the option picked, we would need to repeatedly test this protocol upgrade.
@robin-near yes I still think option 1 is the best. 2 and 3 are too complex and add not just engineering complexity but also testing burden. 4 and 5 would likely degrade performance, and we cannot control whether there will be high load on mainnet when it happens, so it is best to avoid performance degradation altogether.
Based on today's discussion, unloading memtries will not be necessary since the validators will need to restart the nodes to downgrade the RAM size.
Unloading memtrie is done: https://github.com/near/nearcore/pull/11657.
It seems we have "shadow tracking" already implemented: https://github.com/near/nearcore/blob/master/core/chain-configs/src/client_config.rs#L412 I will test how it works with memtries and validator key hot swap. cc @tayfunelmas @wacban
I think this may track the shard where the account id is located, not the shard that the validator with this account id would track. Either way this is a good find and definitely related, perhaps we can reuse some of it for our purpose?
Also the name is way less catchy than shadow tracking ;)
2024-07-01 (Monday) Update
Identify the right way to enable memtrie during the stateful-to-stateless validation protocol upgrade. Before the upgrade, all nodes track all shards; after the upgrade, each node tracks only one or a few shards. Because memtrie requires more memory, it is expected to be enabled only when a node tracks one or a few shards. However, we also need to think about how to make the transition from (stateful + disk tries) to (stateless + memtries). During the transition, we may need to run nodes with memtries while they still track all shards. Assuming this is the path to follow, we need two follow-ups.
1) Profile the memory usage needed to do that (a similar task is open for RPC nodes tracking all shards: https://github.com/near/nearcore/issues/11230)
2) Make sure after the protocol upgrade the memtries for the shards that are not tracked in the new epoch are unloaded, leaving the nodes only with memtries for their tracked shards.
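As a rough illustration of follow-up 2, here is a minimal sketch of unloading memtries for shards that are no longer tracked once the new epoch starts. Everything in it (`ShardId` as a plain `u64`, `LoadedMemtrie`, `MemtrieRegistry`, `retain_tracked`) is hypothetical and made up for this example; it is not nearcore's actual API, and the real unloading logic lives in the PR linked above.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical shard identifier; not nearcore's actual type.
type ShardId = u64;

/// Hypothetical stand-in for a loaded in-memory trie.
/// It only records an approximate size so memory usage can be summed.
#[derive(Debug)]
struct LoadedMemtrie {
    approx_bytes: u64,
}

/// Hypothetical registry of the memtries a node currently holds in RAM.
#[derive(Debug, Default)]
struct MemtrieRegistry {
    loaded: BTreeMap<ShardId, LoadedMemtrie>,
}

impl MemtrieRegistry {
    /// Records a memtrie as loaded for `shard`.
    fn load(&mut self, shard: ShardId, approx_bytes: u64) {
        self.loaded.insert(shard, LoadedMemtrie { approx_bytes });
    }

    /// Drops the memtrie of every shard that is not in `tracked`
    /// and returns the ids of the shards that were unloaded, in ascending order.
    fn retain_tracked(&mut self, tracked: &BTreeSet<ShardId>) -> Vec<ShardId> {
        let to_unload: Vec<ShardId> = self
            .loaded
            .keys()
            .copied()
            .filter(|shard| !tracked.contains(shard))
            .collect();
        for shard in &to_unload {
            self.loaded.remove(shard);
        }
        to_unload
    }

    /// Sums the approximate memory held by the loaded memtries.
    fn total_bytes(&self) -> u64 {
        self.loaded.values().map(|m| m.approx_bytes).sum()
    }
}
```

In this sketch a node would call `retain_tracked` once at the epoch boundary, passing the set of shards it is assigned in the new epoch; comparing `total_bytes` before and after the call gives a crude view of the memory reclaimed, which ties in with the profiling in follow-up 1.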
Related thread here.