
Idea: Storage sharding #8713

Open nagisa opened 1 year ago

nagisa commented 1 year ago

Today we have a single big database for storage, which is probably fine, given that with sharding phase 1 all nodes need to track all shards all the time. This is, however, non-ideal for multiple reasons.

  1. When nodes no longer track all of the shards, they would still download the state for the shards they don't care about (at least when restoring from an S3 backup). This means more bandwidth use for no good purpose.
  2. We have less isolation in contract execution than we'd strictly like. If a vulnerability of any sort allows a contract to access arbitrary storage keys, the fact that a single database stores every contract's data at once would make it possible to obtain or, worse, modify another contract's state.

We should look into splitting up the storage into smaller pieces that can be downloaded independently.

If we split on a shard boundary, we can reduce the amount of unnecessary data being transferred, which would make it much easier to set up nodes and somewhat cheaper to host the state snapshots.
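To make this concrete, here is a minimal sketch of what a shard-boundary split could look like. It uses the `rocksdb` crate directly rather than nearcore's actual `near-store` wrapper, and the `ShardedStore` type, the on-disk `shard-{id}` layout, and the `tracked` parameter are all hypothetical, not existing APIs:

```rust
use std::collections::HashMap;
use std::path::Path;

use rocksdb::DB;

type ShardId = u64;

/// One database per shard instead of a single shared one. A node that tracks
/// only some shards opens (and downloads) only those directories.
struct ShardedStore {
    shards: HashMap<ShardId, DB>,
}

impl ShardedStore {
    /// Open databases only for the shards this node tracks.
    fn open(home: &Path, tracked: &[ShardId]) -> Result<Self, rocksdb::Error> {
        let mut shards = HashMap::new();
        for &shard_id in tracked {
            let db = DB::open_default(home.join(format!("shard-{shard_id}")))?;
            shards.insert(shard_id, db);
        }
        Ok(Self { shards })
    }

    /// A lookup can only touch the shard it names; untracked shards are not
    /// even present on disk.
    fn get(&self, shard_id: ShardId, key: &[u8]) -> Result<Option<Vec<u8>>, rocksdb::Error> {
        match self.shards.get(&shard_id) {
            Some(db) => db.get(key),
            None => Ok(None), // shard not tracked here; the caller decides what that means
        }
    }
}
```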

If we split on an account boundary, we could decouple accounts from shards more easily. Moreover, the runtime could make available just the database that stores the executing account's data, and none of the others, providing significant isolation benefits.
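A sketch of that isolation benefit, assuming an account-boundary split; `AccountStore` and `host_storage_read` are hypothetical stand-ins for the runtime's storage interface, not actual nearcore APIs:

```rust
use std::collections::HashMap;

/// Hypothetical per-account store handed to the runtime. Only this one
/// account's data is reachable through it.
struct AccountStore {
    data: HashMap<Vec<u8>, Vec<u8>>,
}

impl AccountStore {
    fn get(&self, key: &[u8]) -> Option<&[u8]> {
        self.data.get(key).map(|v| v.as_slice())
    }
}

/// The runtime resolves the executing account's store up front and passes
/// nothing else in. Even a contract that can request arbitrary keys can only
/// read within its own database; other accounts' stores are never opened here.
fn host_storage_read(store: &AccountStore, storage_key: &[u8]) -> Option<Vec<u8>> {
    store.get(storage_key).map(|v| v.to_vec())
}
```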

However we split the state, we are also potentially looking at some I/O performance improvements: a smaller state means less time spent searching for data. This may also help a little with state sync (#8545).

cc #8712

akhi3030 commented 1 year ago

CC: @mm-near, @Longarithm, @walnut-the-cat