paritytech / json-rpc-interface-spec

29 stars 3 forks source link

archive: Introduce archive_storageDiff for indexers #159

Closed lexnv closed 1 week ago

lexnv commented 2 months ago

This PR introduces a new unstable method to the archive class.

The method archive_unstable_storageDiff aims to help indexers in providing storage differences between two blocks.

By default, the method returns the difference between currentHash and its parent.

Furthermore, users have the ability to specify the keys of interest:

For each individual prefix, the users can specify if they need:

For symmetry with the chainHead functions:

Closes: https://github.com/paritytech/json-rpc-interface-spec/issues/108

cc @paritytech/subxt-team @tomaka @jsdw @josepot

jsdw commented 2 months ago

I wonder about the performance implications of eg not providing prefixes (or providing many or large ones), but think that the streaming makes sense as a way of at least mitigating the performance hit of this call on the node! As an API though, off the top of my head it looks solid and feels like it would cover the use cases I can think of!

jsdw commented 3 weeks ago

@lexnv just bumping this incase it fell off the radar! It'd be good to address the comments and get it through review so we can implement it, but I know you're prob busy so not urgent :)

lexnv commented 2 weeks ago

@tomaka Would love to get your thoughts on this, thanks 🙏

jsdw commented 2 weeks ago

So this looks good, but I'm still hesitant about not using a subscription here.

If a client asks for a storage diff that is particularly large (ie between two quite distant blocks) I imagine it could be expensive to compute (in terms of the size of response needed as well as time taken to build it). Would this open the node to DoS attacks (clients asking for storage diffs known to be huge)? Or, to avoid this, would some storage diffs simply lead to an error because they are too expensive to compute and so the node refuses?

With a subscription based approach we could do something like progressively streaming the diffs we see as we work from first block to last block. The server wouldn't have to accumulate so much in one go that way, and could also take advantage of backpressure to kill connections to clients that were too slow to receive data back or whatever.

@tomaka I think you preferred a non subscription based approach so perhaps you have an idea already on how to avoid these issues (or maybe don't think that there is any issue in the first place) without needing subscriptions?

lexnv commented 1 week ago

Merging this to focus a bit on the implementation, we'll re-evaluate the subscription-based approach after we extract some performance metrics from the new RPC methods, thanks everyone for the feedback and review 🙏

Please feel free to open any issues / PRs to further discuss this, thanks again!