paritytech / polkadot-sdk

The Parity Polkadot Blockchain SDK
https://polkadot.com/

Fix or remove `storage_keys` and `storage_pairs` RPC APIs #22

Open koute opened 1 year ago

koute commented 1 year ago

The following two RPC endpoints are inherently broken because they return `Vec`s:

        /// Returns the keys with prefix, leave empty to get all the keys.
        #[method(name = "state_getKeys", blocking)]
        #[deprecated(since = "2.0.0", note = "Please use `getKeysPaged` with proper paging support")]
        fn storage_keys(&self, prefix: StorageKey, hash: Option<Hash>) -> RpcResult<Vec<StorageKey>>;

        /// Returns the keys with prefix, leave empty to get all the keys
        #[method(name = "state_getPairs", blocking)]
        fn storage_pairs(
                &self,
                prefix: StorageKey,
                hash: Option<Hash>,
        ) -> RpcResult<Vec<(StorageKey, StorageData)>>;

We should either delete them, or fix them.

Option 1: just delete them

This would require adding replacements which are paged and return the data in chunks; this was already done for `storage_keys`:

        /// Returns the keys with prefix with pagination support.
        /// Up to `count` keys will be returned.
        /// If `start_key` is passed, return next keys in storage in lexicographic order.
        #[method(name = "state_getKeysPaged", aliases = ["state_getKeysPagedAt"], blocking)]
        fn storage_keys_paged(
                &self,
                prefix: Option<StorageKey>,
                count: u32,
                start_key: Option<StorageKey>,
                hash: Option<Hash>,
        ) -> RpcResult<Vec<StorageKey>>;

The downside of this approach is that the paged APIs are harder to use, since clients need to track the pages themselves, and it's not backwards compatible.
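
To make that bookkeeping concrete, here's a minimal sketch of the paging loop a client has to implement on top of `state_getKeysPaged`; `call` is a stand-in for whatever JSON-RPC client call the application uses (not an actual jsonrpsee API), and keys are hex-encoded `String`s for simplicity:

        /// Fetch every key under some prefix by repeatedly calling
        /// `state_getKeysPaged`, resuming from the last key of the previous
        /// page; a short page signals that we've reached the end.
        async fn fetch_all_keys<F, Fut>(call: F, count: u32) -> Vec<String>
        where
            // Takes the `start_key` to resume from; returns one page of keys.
            F: Fn(Option<String>) -> Fut,
            Fut: std::future::Future<Output = Vec<String>>,
        {
            let mut keys = Vec::new();
            let mut start_key = None;
            loop {
                let page = call(start_key.clone()).await;
                let last_page = (page.len() as u32) < count;
                start_key = page.last().cloned();
                keys.extend(page);
                if last_page {
                    break;
                }
            }
            keys
        }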

Option 2: fix them

This would require modifying jsonrpsee to allow the payloads to be streamed.

Right now, if an RPC endpoint returns a `Vec`, it has to 1) create the whole `Vec`, 2) serialize the whole thing into JSON, and 3) only then start streaming the JSON. When the number of elements is big, this becomes very problematic.

What could be done here is to make it possible for the RPC handlers to return an iterator of values which would be dynamically iterated over and serialized into JSON as it is being read by the client.
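
As an illustration of the idea (sketched here with plain serde_json, since jsonrpsee doesn't expose anything like this today), serialization can be driven directly from an iterator, writing each element into the output as it is pulled:

        use serde::ser::{SerializeSeq, Serializer};

        /// Sketch only: stream a JSON array from an iterator instead of
        /// collecting into a `Vec` and serializing it in one go.
        fn stream_json_array<W, I, T>(writer: W, values: I) -> serde_json::Result<()>
        where
            W: std::io::Write,
            I: IntoIterator<Item = T>,
            T: serde::Serialize,
        {
            let mut ser = serde_json::Serializer::new(writer);
            // `None` = unknown length; serde_json emits `[` immediately.
            let mut seq = ser.serialize_seq(None)?;
            for value in values {
                // Each element is serialized and written as it is produced,
                // so the full collection never has to exist in memory.
                seq.serialize_element(&value)?;
            }
            seq.end() // writes the closing `]`
        }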

The downside of this approach is extra complexity, but the upside is that it's fully backwards compatible (the current RPC endpoints, and the clients which use them, keep working), and it's arguably simpler to use (when the client also supports streaming, or when it doesn't mind preloading everything into memory on its side before using it).

Option 3: fix them and have _paged variants

We could fix them to keep backwards compatibility and also add the paged variants, and let clients choose whichever they prefer.


cc @niklasad1

bkchr commented 1 year ago

> What could be done here is to make it possible for the RPC handlers to return an iterator of values which would be dynamically iterated over and serialized into JSON as it is being read by the client.

If we can support this, it sounds like a good solution.

koute commented 1 year ago

> If we can support this, it sounds like a good solution.

From a technical PoV I think we can; I have a semi-working private jsonrpsee branch with a proof-of-concept implementation of this.

There are just some tricky issues around how jsonrpsee defines those handlers, because right now the same RPC interface definition is used to generate code for both the client and the server (since jsonrpsee is a library that can be used as both an RPC client and an RPC server). If you return a `Vec`, that can be a `Vec` on both sides; but for streaming you need a type which on the server has to consume values and on the client has to produce them, so you either need a single type which can do both, or separate types for the client and the server. But that's solvable.
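
Purely to illustrate that last point (these names are hypothetical; none of this is jsonrpsee API), the one return type would have to wear both hats, something like:

        /// Hypothetical sketch, not jsonrpsee API: the same logical return
        /// type plays a different role on each side of the connection.
        enum Streamed<T> {
            /// Server side: the handler pushes values in and the server
            /// consumes them, serializing each one as it arrives.
            Sink(futures::channel::mpsc::Sender<T>),
            /// Client side: values are produced as the payload is received
            /// and deserialized.
            Stream(futures::stream::BoxStream<'static, T>),
        }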

bkchr commented 1 year ago

I was more "afraid" that the JSON-RPC spec (not sure something like this exists) doesn't support streaming.

koute commented 1 year ago

> I was more "afraid" that the JSON-RPC spec (not sure something like this exists) doesn't support streaming.

Well, let's say the endpoint returns a JSON array:

[1,2,3,4,5]

So you can either send it like this:

socket.send("[1,2,3,4,5]");

or you could send it like this:

socket.send("[");
socket.send("1");
socket.send(",2");
socket.send(",3");
socket.send(",4");
socket.send(",5");
socket.send("]");

The first one just sends it all in one go, and the second one streams it. Protocol-wise it's all the same, since the streaming/chunking can be done at the socket layer (which clients already have to handle anyway, since you can't have TCP packets of unlimited size). That's also why this streaming approach is backwards compatible: neither the server nor the client needs to care whether the other side writes/reads the payload in one go or in a streaming manner. It's just an implementation detail that can be independently added to the server, to the client, or to both.

bkchr commented 1 year ago

Ahh yeah, on TCP level this makes sense :D

tomaka commented 1 year ago

Many, if not most, of the JSON-RPC functions are broken in various ways. This is not news. I would strongly suggest focusing on https://github.com/paritytech/json-rpc-interface-spec/ instead of fixing legacy broken things.

tomaka commented 1 year ago

To give more detail:

We had a long discussion in an Element channel a while ago. Many of the issues that the JSON-RPC API faces require breaking changes. Unfortunately, while modifying the Substrate source code is relatively easy, any breaking change to the existing Substrate JSON-RPC API is extremely risky and problematic, as we would need to inform all the people who have ever built a script or application on top of this API, which is nearly impossible.

Basically, if you fix these functions in the source code, you're breaking the entire world.

The better approach, which we ended up going for, is to design a new API with new function names, then ask people to switch, and then deprecate the old ones.