paritytech / substrate

Substrate: The platform for blockchain innovators
Apache License 2.0
8.39k stars 2.65k forks source link

Document the difference between sp_io::storage and sp_io::default_child_storage #7574

Open vporton opened 3 years ago

vporton commented 3 years ago

The answers to the following questions should be immediately obvious (but now they are not) after reading https://docs.rs/sp-io/2.0.0/sp_io/storage/index.html and/or https://docs.rs/sp-io/2.0.0/sp_io/default_child_storage/index.html:

What is the difference between sp_io::storage and sp_io::default_child_storage? How each of them is used? Which of the two is used to store contract storage variables? What is stored in the other one? What is the "default child trie" and why at all there is a necessity to have a default child trie? How are these two storages related to each other?

Please also answer the questions commenting to this issue.

vporton commented 3 years ago

Also: Why does default_child_storage API has storage_key: &[u8] argument and storage doesn't? (It is counterintuitive, it would be rather the reverse: default_child_storage to be attached to a determined storage_key, so it would be not to be passed; and storage having multiple sub-storages, so may need storage_key argument.)

vporton commented 3 years ago

It even should be documented what is the path of default_child_storage tries inside storage tries.

bkchr commented 3 years ago

CC @cheme

cheme commented 3 years ago

Quick first answer:

sp_io::storage only stores state in a single trie. Content can be organized by prefix: the api do not enforce key to be hashes, the chain designer can unbalance the trie a bit (eg use specific prefixes for each modules). The root of this trie state is directly in the block header.

sp_io::default_child_storage: 'Child storage' api can addresses multiple trie states and stores the root of each trie in a parent root.

'Default' kind (only this kind of child state exists) uses the same kind of trie state as sp_io::storage (see https://github.com/w3f/polkadot-spec/releases , there may also be a part on child trie there).

It always store its root in a single parent trie. This parent trie is always the one used in sp_io::storage api. Also to avoid overwriting a root by mistake, there is a dedicated prefix in the parent trie for child root storage (ensure the state machine do not write into this prefix): https://github.com/paritytech/substrate/blob/368903f7aa9ef652bf8157d476dc13cb36e3affc/primitives/storage/src/lib.rs#L173

sp_io::storage is used by most frame pallet by using 'decl_storage' macro, and soon the new procedural storage macro.

sp_io::default_child_storage is call almost directly, there is just a very small api in front https://github.com/paritytech/substrate/blob/master/frame/support/src/storage/child.rs , but almost the same as directly call the sp_io api (especially since we got a single kind of child state/trie).

contract pallet is one of the only frame module using it: the contract variable are in a child storage (one trie per contract account), and the accounts are in the parent trie.

Almost all the chain state is using sp_io::storage with prefixes. I only know contract and crowdfund pallets to use child storage (but did not check new pallets recently).

At the time 'default' is not necessary, because we have a single kind of child trie. I wanted to put different format of trie or different states, but my PR going in this direction where a bit impacting and I did not reach a consensus, so at the time I am focusing on different problematic, but will probably go back to it at some point.

They're actually the same, only the location of their roots changes (in header or in parent trie). I remember rewriting the api to only have child trie (using optional parent trie for the top one) in order to make it more natural to have multiple level of trie, but using two different api was less impacting (I think child trie did not exist in the first versions of substrate).

storage_key is the location of the state root in the parent key, while storage stores root in the header.

But the api should use only 'storage_key' when it targets the 'default' child trie (rpc does not, iirc it uses the concatenation as parameter).

vporton commented 3 years ago
  • Why does default_child_storage API has storage_key: &[u8] argument and storage doesn't?

storage_key is the location of the state root in the parent key, while storage stores root in the header.

I still don't understand why default_child_storage does use it and storage doesn't. Why not for example vice versa?

cheme commented 3 years ago

for the first a query is done: header of current block hash -> get state root from header -> fetch value at 'child_prefix' ++ 'storage_key' in state trie with this header root -> decode this value to hash -> fetch value at 'key' in state trie with this hash as root

So you need as input 'storage_key' and 'key' to get your value (header of block is related to context or a block hash for rpc).

for the second one: header of current block hash -> get state root from header -> fetch value at 'key' in state trie with this header root

so you need only key.

Why not for example vice versa?

it is a two level only structure so if you query first level you need one key, and two key for second level.

For a more hierarchical child trie (n level of trie), the api should be unique, either passing a array of key or using a separtor in key like in uri. I would like it better (as stated in previous message), but the current api is two level only. Probably to avoid changing the existing api (storage) there is another one (default_child_storage). Note that it should be possible to run a n level hierachical trie with the current host api, just use child tries only and write the root of every child trie wherever you need them at the end of the block processing (using child_trie_storage root calls when running 'on_finalize' system frame hook). That way you only need to work at frame level.

vporton commented 3 years ago

for the first a query is done: header of current block hash -> get state root from header -> fetch value at 'child_prefix' ++ 'storage_key' in state trie with this header root -> decode this value to hash -> fetch value at 'key' in state trie with this hash as root

"For the first" - you mean for default_child_storage?

So get value at data(child_prefix ++ storage_key) ++ key (data(child_prefix ++ storage_key) is the hash mentioned by you). Why not just child_prefix ++ storage_key ++ key? It would be simpler. Is it to improve performance not to querying child_prefix ++ storage_key prefix every time?

cheme commented 3 years ago

"For the first" - you mean for default_child_storage?

:+1:

Why not just child_prefix ++ storage_key ++ key? It would be simpler.

Yeah an unbalanced trie would work as well. With current trie implementation we would need some additional api to extract root for a given prefix (there is a small subtletie related to the fact that the trie node might contain a prefix (the fact that substrate trie do not use 'extension node' makes it a bit more awkward, and I would probably want to attach the size of the prefix with the contract root).

I did not find the discussion where the choice was made (somewhere on github), one of the reason for chosing child trie was the lesser cost of deleting the whole child trie (can be true also with prefix but the api is not really good for it, I remember drafting some code to attach/detach branch of a trie: it is doable).

Also note that it is rather similar to how ethereum account storages work.

When the choice was made I wanted https://github.com/paritytech/substrate/pull/5280 so the rent would work well, I still think it is the right way to go, but it did not get merge. In theory it is super simple: isolate the child trie node in db collection storage so delete the whole child trie is straight forward. So I realize today that there is https://github.com/paritytech/substrate/issues/6594 and https://github.com/paritytech/substrate/issues/7526 open as a consequence, but I am not sure what the propose solution is.

Is it to improve performance not to querying child_prefix ++ storage_key prefix every time?

Yes in theory during processing you only need to access a child trie root once, but in practice, caching makes the prefix approach also fine.

TLDR; the fact that you need to deal with root of the contract storage makes it rather natural: each contract got its own state trie. Nothing impossible with a prefix in an unbalance trie, but not as direct. In general a child trie of the same kind as its parent trie is just one more node (in fact if I wanted to implement the same, I would probably also create this node in the trie too to avoid having a prefix in the root of the contract storage). Also I really like the idea to allow different state implementation as child trie, but today it is only 'default'.