paritytech / json-rpc-interface-spec


RPC to subscribe storage prefix #102

Open xlc opened 1 year ago

xlc commented 1 year ago

Previous issue https://github.com/paritytech/substrate/issues/5790

I would like to subscribe by key prefix. This would allow me to subscribe to new entries in a map / double map.

lexnv commented 1 year ago

The chainHead_storage method with query type closestDescendantMerkleValue returns an opaque hash.

Comparing two results of the same method call is used to determine if any storage changes happened below the provided prefix. With this approach, users are responsible for making the storage call, as opposed to the legacy RPC, which pushes notifications back to the user.
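To illustrate, a minimal sketch of that comparison in TypeScript (the query item shape follows the spec's `chainHead_storage` `items` parameter; the type alias and helper names are illustrative, not part of the spec):

```typescript
// A storage query item asking for the closest descendant merkle value
// below a key prefix, as described in the chainHead_storage spec.
type StorageQueryItem = {
  key: string; // hex-encoded storage key prefix
  type: "closestDescendantMerkleValue";
};

// Build the `items` parameter for a chainHead_storage call.
function merkleValueQuery(prefix: string): StorageQueryItem[] {
  return [{ key: prefix, type: "closestDescendantMerkleValue" }];
}

// The merkle value is opaque: the only meaningful operation is comparing
// two results for equality. A difference means something changed below
// the prefix (but not *what* changed).
function storageMayHaveChanged(previous: string, current: string): boolean {
  return previous !== current;
}
```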

Would you also be interested in the RPC providing the key that was changed / added?

xlc commented 1 year ago

Yeah I still need a way to figure out the added/removed/modified keys.

lexnv commented 1 year ago

Implementing this with the current API might be complicated, although not impossible.

I would start by constructing a closestDescendantMerkleValue query. Then, construct a descendantsHashes query to obtain all the keys under the provided prefix.

Because the current API does not offer direct support for your use case, you'd need to make another closestDescendantMerkleValue query to ensure that keys were not added in the meantime. If the second closestDescendantMerkleValue is different from the first one, you'd need to repeat this process.

Whenever a different hash is reported by closestDescendantMerkleValue, another descendantsHashes query must follow to compare and detect any changed keys.

Even though the API supports batch requests via the items parameter, the order in which the RPC server handles the items is not imposed by the spec. This leads to at least 3 RPC calls for the initialization routine, then at least one periodic call to verify the merkle value.

Since descendantsHashes queries support pagination, you'd also need to drive the responses with the chainHead_continue method.
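Putting the routine above together, it could look roughly like this. Note that `ChainHeadClient` and its two methods are a hypothetical wrapper over the raw RPC (the real flow involves follow subscriptions, operation events, and chainHead_continue); only the overall shape of the loop is the point:

```typescript
// Hypothetical minimal client; method names and shapes are assumptions,
// not spec methods. They stand in for chainHead_storage queries.
interface ChainHeadClient {
  // closestDescendantMerkleValue for a prefix (opaque hash).
  merkleValue(prefix: string): Promise<string>;
  // One page of a descendantsHashes query; `continueToken` models the
  // pagination that chainHead_continue would drive.
  descendantsHashesPage(prefix: string, cont?: string): Promise<{
    entries: Array<{ key: string; hash: string }>;
    continueToken?: string;
  }>;
}

// Snapshot all descendant hashes, driving pagination to completion.
async function snapshotHashes(client: ChainHeadClient, prefix: string) {
  const hashes = new Map<string, string>();
  let token: string | undefined;
  do {
    const page = await client.descendantsHashesPage(prefix, token);
    for (const { key, hash } of page.entries) hashes.set(key, hash);
    token = page.continueToken;
  } while (token !== undefined);
  return hashes;
}

// The initialization routine: read the merkle value, snapshot the hashes,
// then re-check the merkle value. If it moved while paginating, the
// snapshot is stale and the whole process must be repeated.
async function consistentSnapshot(client: ChainHeadClient, prefix: string) {
  let before = await client.merkleValue(prefix);
  for (;;) {
    const hashes = await snapshotHashes(client, prefix);
    const after = await client.merkleValue(prefix);
    if (after === before) return { merkleValue: after, hashes };
    before = after; // storage changed mid-snapshot; try again
  }
}
```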

This does indeed sound complicated, and might not be feasible for smaller key prefixes, or prefixes that have many storage entries below them.

As for the light client, I don't have enough context, but I would expect this to be even more difficult to implement on the server side.

I would be interested to hear more about your use case, since it sounds like a complicated thing to implement 👍 How are you handling this use case at the moment? Using the legacy APIs?

// for reference, @tomaka might have more insights into this

xlc commented 1 year ago

Here are some of my use cases.

1) Subscribe to all storage changes for new blocks. This is useful for indexers. The legacy trace RPC is currently used for this.

2) Monitor a storage map and perform various checks/tasks, e.g. assert that the total issuance is equal to the sum of all the balances. We currently use events as a trigger and re-iterate the whole map to check. This is super inefficient.

3) Subscribe to a storage map and update the UI when a new item is added. Right now the dApp just can't handle it and requires users to refresh to see the new item.

josepot commented 10 months ago

Here are some of my use cases.

  1. Subscribe all the storage changes for new blocks. This is useful for indexer. The legacy trace RPC is currently used for this.
  2. Monitor a storage map and perform various checks/tasks. e.g. assert the total issuance is equal to the sum of all the balances. We are currently use events to trigger and re-iterate the whole map for checking. This is super inefficient.
  3. Subscribe for a storage map and update UI when a new item is added. Right now the dApp just can't handle it and require users to do a refresh to see the new item.

FWIW I'm working on integrating a new API into the @polkadot-api/client. This API is designed to monitor changes in a storage map. The goal is to make it easy to automatically update the UI whenever there are additions, deletions, and/or modifications.

However, this task is turning out to be more complex than I first thought.

One important thing to note is that the updates provided by this new API won't necessarily cover every single change as it happens on every finalized block. Ideally, it would inform the user of changes after each finalized block. But, there's no guarantee of this happening every time. This means sometimes the updates might include changes from some previous blocks all at once, rather than just the most recent one. So, if a user gets an update at block 'x', it might include all the changes that happened since block 'x-y'. This limitation makes the feature less suitable for those who need to track every single change in real-time for each finalized block.

Additionally, there's another aspect to consider: the more storage map "watchers" a user sets up, the more they might experience delays in getting updates.

I've thought about introducing an overload in the API that would force the library to evaluate the deltas in every single finalized block. However, this approach has a significant downside. It greatly increases the chance of triggering a 'stop' event.

In conclusion, I find myself agreeing with @xlc's viewpoint: it would be really beneficial to have an API specifically designed to identify deltas for a given "partial trie path". This would be particularly useful even if it could only provide the changes for a given pinned block.

Otherwise, I would greatly appreciate some advice on the best approach to querying data changes. In the background, my current process involves the following steps:

  1. Detecting Changes: First, I use closestDescendantMerkleValue to check if there have been any changes on the storage map.

  2. Identifying Specific Changes: Next, I use descendantsHashes to figure out which specific entries have been added, deleted, or updated. After identifying these changes, I need to actually retrieve the updated values.
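Step 2 above boils down to diffing two key-to-hash snapshots. A possible sketch (function and field names are illustrative):

```typescript
// Compare two descendantsHashes snapshots (storage key -> opaque hash)
// and classify each key as added, deleted, or changed.
function diffHashes(
  before: Map<string, string>,
  after: Map<string, string>,
): { added: string[]; deleted: string[]; changed: string[] } {
  const added: string[] = [];
  const deleted: string[] = [];
  const changed: string[] = [];
  for (const [key, hash] of after) {
    const prev = before.get(key);
    if (prev === undefined) added.push(key);
    else if (prev !== hash) changed.push(key);
  }
  for (const key of before.keys()) {
    if (!after.has(key)) deleted.push(key);
  }
  return { added, deleted, changed };
}
```

Only the `added` and `changed` keys then need their values fetched, which is where the slot dilemma below comes in.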

Here's where I face a dilemma: what's the most efficient way to retrieve these values? Should I use descendantsValues to get them all at once, or should I make a single storage request containing many individual value queries?

The challenge is that each value query uses one of the limited 'operation slots' available. On one hand, if the storage map contains many large items, using descendantsValues to fetch everything at once seems excessive and could waste bandwidth. On the other hand, each individual value query also consumes one of these operation slots.

Currently, I'm using descendantsValues only to pull the initial list of values, and then querying the changes with a list of value items in just one request (which takes several operation slots). But I think there's room for optimization here. Perhaps in the future, I could develop a heuristic to decide when it's more efficient to use descendantsValues.

Like @xlc mentioned, there really should be a simpler and more efficient method to handle this process.

cc: @tomaka 🙏