storage: Clarify non-queriable storage keys

lexnv commented 11 months ago

This PR clarifies the behavior of non-querieable storage keys.

A non-queriable storage key starts with :child_storage: or :child_storage:default. The key could be either one of the keys passed to the items array of queries, or the childTrie parameter.

The current behavior of substrate is no ignore those queries. However, this PR introduces a JSON-RPC error to users that the API has been wrongfully called.

Raised from review: https://github.com/paritytech/polkadot-sdk/pull/1846#discussion_r1360512158

// cc @skunert @jsdw @tomaka @josepot

tomaka commented 11 months ago

In the past I've tried to go for the approach where the child trie "references" found in the main trie are not queriable, for simplicity. However, I now think that it makes more sense to actually make them queriable, for consistency with everything else in the rest of the design.

I don't think that there should be such a thing as a non-queriable key.

or the childTrie parameter

While for keys (of specifically the main trie) it's debatable, there's nothing wrong with having a child trie that starts with :child_storage:. Why would it be non-queriable?

tomaka commented 11 months ago

To give a brief explanation of how child tries work:

The runtime has access not only to a "main trie" where it can read and write stuff, but also to an unlimited number of "child tries". When the runtime wants to read from or write to the storage, it chooses whether it is in the main trie or in a specific child trie. If it is a specific child trie, the child trie has an identifier, which is opaque bytes (usually it's a hash).

The tricky thing is that, because we want the content of the child tries to be part of the root hash of everything, whenever the runtime writes something to a child trie, the trie root hash of this child trie is magically written into the main trie, at the location concat(":child_trie:default:", child_trie_id). (note: it's a bit more complicated than that because the write to the main trie done at the end of the runtime call or under certain conditions, but this really doesn't matter here)

In order to avoid accidents, it has been made forbidden for the runtime to write directly to any storage key of the main trie that starts with :child_trie:. Why :child_trie: and not :child_trie:default:? Because originally there were supposed to several different types of child tries, but this idea has now been abandoned and in the end only so-called "default" child tries exist.

So there are two ways to reason about all of this:

Either consider that the storage is made of a main trie and of child tries, and that the fact that the child trie root hash is stored in the main trie is just a hack. In that case, it makes sense in general to deny reading from any key of the main trie that starts with :child_trie:.
Or consider that when the runtime writes something to a child trie, it also automatically writes to the main trie, and that the fact that the runtime is forbidden from writing to any key that starts with :child_trie: is the hack.

I think that the less surprising solution is actually the second.

We have people executing a runtime call and using the JSON-RPC API in order to query the storage whenever the runtime tries to read from the storage, and going for the first solution would be very surprising to them. On the other hand, I can't find any situation where the second solution is problematic.

lexnv commented 11 months ago

Thanks Pierre! That makes a lot of sense! 🙏

I was a bit scared of the substrate API adding a prefix of :child_storage:default to any childTrie key.

We receive the key as hex from users in the chainHead_storage. Then we convert this key to a ChildInfo object. Eventually, we'll be interested in fetching this from storage in the trie_backend, using the child_info.prefixed_storage_key() method. This method adds a prefix the b":child_storage:default:" to the received key.

That means that when users provide a childTrie = hex(":child_storage:default:Key0"); then substrate would try to look for :child_storage:default:child_storage:default:Key0 as root hash of the child trie (cc @skunert, since I might have understood something wrong here).

If I got this right, then the path forward here would be:

allow any key from query items to be read (including those starting with ":child_storage:")
for childTrie param, we could document that child root hash is stored at concat(":child_storage:default", childTrie)

And that would mean this PR could be closed and the concept of non-queriable storage keys removed from substrate's chainHead and archive implementation. Therefore, allowing childTrie keys of ":child_storage:", which would have a child root hash stored at ":child_storage:default::child_storage".

Let me know if I've expressed this right 🙏

tomaka commented 11 months ago

We receive the key as hex from users in the chainHead_storage. Then we convert this key to a ChildInfo object. Eventually, we'll be interested in fetching this from storage in the trie_backend, using the child_info.prefixed_storage_key() method. This method adds a prefix the b":child_storage:default:" to the received key.

That does indeed look a bit sketchy, but I have no idea how this part of Substrate works.

for childTrie param, we could document that child root hash is stored at concat(":child_storage:default", childTrie)

I don't actually think that we need to document that, as it's part of the "core Polkadot protocol", for the same reason that we don't document at which keys the runtime stores stuff in general. After all, if for example new kinds of child tries were added in the future (like was planned), then we would need to update the JSON-RPC specification, which doesn't really seem appropriate.

lexnv commented 11 months ago

Closing this, thanks again for all the info! 🙏

paritytech / json-rpc-interface-spec

storage: Clarify non-queriable storage keys #118