paritytech / jsonrpsee

Rust JSON-RPC library on top of async/await

Spamming requests hits timeout after upgrading v0.16 to v0.17/v0.20 #1259

Closed haerdib closed 11 months ago

haerdib commented 11 months ago

I'm currently trying to upgrade our jsonrpsee client from v0.16 to v0.20. However, I'm hitting a problem I haven't been able to solve yet. Maybe you have an idea what the problem is here.

To test our wrapper client around jsonrpsee, we spam multiple requests at a Substrate node in a synchronous manner, i.e. the second request is only sent after the first one has been fired. There is no subscription involved.

With v0.16 this works without a problem: 60 or more requests are sent rapidly, within less than a second. v0.16 logs:

[2023-12-07T07:30:30Z TRACE soketto::connection] e0bd8843: send: (Text (fin 1) (rsv 000) (mask (1 b7699f16)) (len 363))
[2023-12-07T07:30:30Z TRACE soketto::connection] e0bd8843: Sender flushing connection
[2023-12-07T07:30:30Z TRACE soketto] read 2 bytes
[2023-12-07T07:30:30Z TRACE soketto::connection] e0bd8843: recv: (Text (fin 1) (rsv 000) (mask (0 0)) (len 103))
Sending extrinsic with nonce 59
[2023-12-07T07:30:30Z DEBUG substrate_api_client::api::rpc_api::author] sending extrinsic: RpcParams(ParamsBuilder { bytes: [91, 34, 48, 120, 51, 57, 48, 50, 56, 52, 48, 48, 100, 52, 51, 53, 57, 51, 99, 55, 49, 53, 102, 100, 100, 51, 49, 99, 54, 49, 49, 52, 49, 97, 98, 100, 48, 52, 97, 57, 57, 102, 100, 54, 56, 50, 50, 99, 56, 53, 53, 56, 56, 53, 52, 99, 99, 100, 101, 51, 57, 97, 53, 54, 56, 52, 101, 55, 97, 53, 54, 100, 97, 50, 55, 100, 48, 49, 97, 54, 57, 51, 102, 102, 55, 97, 100, 52, 102, 54, 52, 48, 97, 53, 53, 51, 97, 55, 98, 99, 49, 52, 51, 57, 57, 97, 50, 48, 50, 98, 55, 50, 102, 49, 54, 98, 53, 52, 99, 48, 49, 101, 52, 50, 53, 48, 51, 50, 54, 57, 49, 49, 101, 102, 52, 102, 101, 102, 55, 57, 48, 54, 57, 48, 56, 97, 101, 49, 51, 50, 54, 102, 97, 53, 98, 55, 99, 55, 98, 100, 52, 48, 49, 99, 49, 52, 51, 48, 100, 50, 101, 55, 102, 101, 101, 55, 98, 52, 57, 50, 50, 50, 98, 57, 98, 57, 49, 50, 50, 101, 97, 55, 97, 97, 100, 102, 97, 102, 100, 55, 100, 52, 100, 53, 56, 97, 48, 48, 101, 99, 48, 48, 48, 48, 48, 54, 48, 48, 48, 48, 56, 101, 97, 102, 48, 52, 49, 53, 49, 54, 56, 55, 55, 51, 54, 51, 50, 54, 99, 57, 102, 101, 97, 49, 55, 101, 50, 53, 102, 99, 53, 50, 56, 55, 54, 49, 51, 54, 57, 51, 99, 57, 49, 50, 57, 48, 57, 99, 98, 50, 50, 54, 97, 97, 52, 55, 57, 52, 102, 50, 54, 97, 52, 56, 48, 50, 48, 57, 51, 100, 48, 48, 34, 44], start: '[', end: ']' })
[2023-12-07T07:30:30Z TRACE soketto::connection] e0bd8843: send: (Text (fin 1) (rsv 000) (mask (1 9aa38d34)) (len 363))
[2023-12-07T07:30:30Z TRACE soketto::connection] e0bd8843: Sender flushing connection
[2023-12-07T07:30:30Z TRACE soketto] read 2 bytes
[2023-12-07T07:30:30Z TRACE soketto::connection] e0bd8843: recv: (Text (fin 1) (rsv 000) (mask (0 0)) (len 103))
[2023-12-07T07:30:30Z TRACE soketto::connection] e0bd8843: closing connection
[2023-12-07T07:30:30Z TRACE soketto::connection] e0bd8843: send: (Close (fin 1) (rsv 000) (mask (1 7664355e)) (len 2))
[2023-12-07T07:30:30Z TRACE soketto::connection] e0bd8843: Sender flushing connection

However, after upgrading (both v0.17 and v0.20 show the same problem), the 58th message runs into a timeout after one minute:

[2023-12-07T07:35:47Z TRACE soketto::connection] 7f3a5572: Sender flushing connection
[2023-12-07T07:35:47Z TRACE soketto] read 2 bytes
[2023-12-07T07:35:47Z TRACE soketto::connection] 7f3a5572: recv: (Text (fin 1) (rsv 000) (mask (0 0)) (len 103))
Sending extrinsic with nonce 58
[2023-12-07T07:35:47Z DEBUG substrate_api_client::api::rpc_api::author] sending extrinsic: RpcParams(ParamsBuilder { bytes: [91, 34, 48, 120, 51, 57, 48, 50, 56, 52, 48, 48, 100, 52, 51, 53, 57, 51, 99, 55, 49, 53, 102, 100, 100, 51, 49, 99, 54, 49, 49, 52, 49, 97, 98, 100, 48, 52, 97, 57, 57, 102, 100, 54, 56, 50, 50, 99, 56, 53, 53, 56, 56, 53, 52, 99, 99, 100, 101, 51, 57, 97, 53, 54, 56, 52, 101, 55, 97, 53, 54, 100, 97, 50, 55, 100, 48, 49, 54, 50, 55, 101, 50, 102, 53, 99, 98, 56, 52, 51, 57, 101, 101, 99, 98, 53, 101, 97, 49, 97, 98, 48, 48, 57, 98, 99, 56, 56, 97, 101, 97, 51, 97, 50, 57, 49, 97, 51, 102, 53, 51, 52, 102, 100, 51, 54, 97, 97, 50, 49, 102, 57, 56, 55, 102, 52, 57, 54, 57, 54, 54, 99, 102, 56, 50, 50, 102, 101, 49, 98, 99, 53, 98, 97, 53, 102, 54, 51, 101, 97, 102, 50, 53, 52, 50, 99, 53, 97, 57, 97, 56, 48, 52, 100, 50, 53, 54, 50, 101, 100, 51, 54, 99, 100, 50, 56, 57, 102, 52, 48, 49, 102, 48, 56, 102, 55, 54, 54, 53, 51, 52, 48, 50, 49, 56, 48, 48, 48, 101, 56, 48, 48, 48, 48, 48, 54, 48, 48, 48, 48, 56, 101, 97, 102, 48, 52, 49, 53, 49, 54, 56, 55, 55, 51, 54, 51, 50, 54, 99, 57, 102, 101, 97, 49, 55, 101, 50, 53, 102, 99, 53, 50, 56, 55, 54, 49, 51, 54, 57, 51, 99, 57, 49, 50, 57, 48, 57, 99, 98, 50, 50, 54, 97, 97, 52, 55, 57, 52, 102, 50, 54, 97, 52, 56, 48, 50, 48, 57, 51, 100, 48, 48, 34, 44], start: '[', end: ']' })
[2023-12-07T07:35:47Z TRACE soketto::connection] 7f3a5572: send: (Text (fin 1) (rsv 000) (mask (1 491a35ef)) (len 363))
[2023-12-07T07:35:47Z TRACE soketto::connection] 7f3a5572: Sender flushing connection
[2023-12-07T07:35:47Z TRACE soketto] read 2 bytes
[2023-12-07T07:35:47Z TRACE soketto::connection] 7f3a5572: recv: (Text (fin 1) (rsv 000) (mask (0 0)) (len 103))
[2023-12-07T07:36:16Z TRACE soketto] read 2 bytes
[2023-12-07T07:36:16Z TRACE soketto::connection] 7f3a5572: recv: (Ping (fin 1) (rsv 000) (mask (0 0)) (len 0))
[2023-12-07T07:36:16Z TRACE soketto::connection] 7f3a5572: send: (Pong (fin 1) (rsv 000) (mask (1 2d751c1d)) (len 0))
[2023-12-07T07:36:16Z TRACE soketto::connection] 7f3a5572: Receiver flushing connection
[2023-12-07T07:36:46Z TRACE soketto] read 2 bytes
[2023-12-07T07:36:46Z TRACE soketto::connection] 7f3a5572: recv: (Ping (fin 1) (rsv 000) (mask (0 0)) (len 0))
[2023-12-07T07:36:46Z TRACE soketto::connection] 7f3a5572: send: (Pong (fin 1) (rsv 000) (mask (1 f909b983)) (len 0))
[2023-12-07T07:36:46Z TRACE soketto::connection] 7f3a5572: Receiver flushing connection
thread 'main' panicked at examples/examples/benchmark_bulk_xt.rs:67:49:
called `Result::unwrap()` on an `Err` value: RpcClient(Client(RequestTimeout))
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Nothing was changed on the node (server) side, and nothing in the client code either, apart from the necessary renaming. The logs do not show anything obvious, and I did not find anything related to this problem in the changelogs. Am I missing something?

niklasad1 commented 11 months ago

Hmm, thanks for raising the issue.

Are you sure that v0.17 suffers from this issue? I suspect that https://github.com/paritytech/jsonrpsee/pull/1145 could be a regression, but it was first introduced in v0.20.

Can you please add instructions on how to reproduce this?

haerdib commented 11 months ago

Yes, it is happening with v0.17.1.

Steps to reproduce:

  1. Run a Substrate node (I'm doing it with Docker):
     docker run -p 9944:9944 -p 9933:9933 -p 30333:30333 parity/substrate:latest --dev --rpc-external

  2. Run the api-client benchmark example. I have created two branches, one with v0.16 (works), one with v0.17 (does not work). Build and run the example with the cargo command: cargo run -p ac-examples --example benchmark_bulk_xt.

The code of the example: https://github.com/scs/substrate-api-client/blob/bh/test-jsonrpsee/examples/examples/benchmark_bulk_xt.rs. I've tried to remove the clutter around the jsonrpsee client as much as possible. The relevant part (I believe) is now here:

while nonce < first_nonce + 60 {
    // Create the extrinsic.
    let xt = api.compose_extrinsic_offline(call.clone(), nonce);
    let xt_bytes: Bytes = xt.encode().into();
    let hex_encoded_xt = rpc_params![xt_bytes];
    println!("Sending extrinsic with nonce {}", nonce);

    // Send the extrinsic with jsonrpsee.
    let _xt_hash: H256 =
        api.client().request("author_submitExtrinsic", hex_encoded_xt).unwrap();

    nonce += 1;
}

The api.client().request() call is a block_on wrapper that makes this a synchronous call. Relevant code part: https://github.com/scs/substrate-api-client/blob/bh/test-jsonrpsee/src/rpc/jsonrpsee_client/mod.rs#L68-L73

impl Request for JsonrpseeClient {
    fn request<R: DeserializeOwned>(&self, method: &str, params: RpcParams) -> Result<R> {
        block_on(self.inner.request(method, RpcParamsWrapper(params)))
            .map_err(|e| Error::Client(Box::new(e)))
    }
}
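
(For completeness: the block_on used in this wrapper is the one from the futures crate, i.e. essentially

use futures::executor::block_on;

rather than anything tokio provides.)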
niklasad1 commented 11 months ago

It's bad practice to mix tokio with another async executor such as futures::executor::block_on. Most likely that alone is what causes this to get stuck, see https://github.com/tokio-rs/tokio/issues/2603. The reason it surfaces now is that jsonrpsee migrated from futures channels to tokio channels, which Client::request and Client::subscribe await on.

EDIT: I tried your example and was able to verify via printlns that the message was received by the client and sent out on the oneshot channel, but awaiting the corresponding future never returns.

Thus, I'm quite certain that this is because jsonrpsee moved away from futures channels to tokio channels.
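
To illustrate the failure mode in isolation, here is a contrived sketch (not your code, and it uses a current_thread runtime so the hang is deterministic; on the default multi-threaded runtime it depends on scheduling):

use tokio::sync::oneshot;

#[tokio::main(flavor = "current_thread")]
async fn main() {
    let (tx, rx) = oneshot::channel::<u32>();

    // Stand-in for jsonrpsee's background task: it can only run once the
    // runtime gets control back from the current task.
    tokio::spawn(async move {
        let _ = tx.send(42);
    });

    // futures::executor::block_on parks this thread without yielding to
    // tokio, so the spawned task is never polled, `tx` is never sent and
    // we hang here (in jsonrpsee's case: until the request timeout fires).
    let answer = futures::executor::block_on(rx).unwrap();
    println!("{answer}");
}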

If you could refactor your sync API to something like the following, then I think it should work as it did with v0.16:

#[maybe_async::sync_impl]
impl Request for JsonrpseeClient {
    fn request<R: DeserializeOwned>(&self, method: &str, params: RpcParams) -> Result<R> {
        let (tx, rx) = tokio::sync::oneshot::channel();

        // Clone/own whatever the spawned task captures so the future is
        // 'static (assuming `inner` is cheaply cloneable, e.g. behind an Arc).
        let client = self.inner.clone();
        let method = method.to_owned();

        tokio::spawn(async move {
            let res = client
                .request(&method, RpcParamsWrapper(params))
                .await
                .map_err(|e| Error::Client(Box::new(e)));
            let _ = tx.send(res);
        });

        rx.blocking_recv().unwrap()
    }
}
niklasad1 commented 11 months ago

Yeah, ok, I tested by switching out tokio::sync::oneshot for futures_channel::oneshot and then this issue goes away... So indeed, futures::executor::block_on messes up tokio somehow.

//cc @jsdw @lexnv I don't regard this as a jsonrpsee issue, as the library itself is tightly coupled to tokio, but do you think we should provide a sync API for making client requests?

haerdib commented 11 months ago

Thank you very much for the investigation! I indeed did not know this could be a problem. Time to switch out futures completely. As this is clearly an issue on our side, I'm closing it. And of course, we would welcome a sync API ;)

jsdw commented 11 months ago

Re sync APIs: I wouldn't mind adding them in theory if:

a) it wasn't too complex to do, and
b) users of the sync APIs didn't explicitly need to know about tokio at all.

The above code is a bit of an exception though, in that it runs inside a tokio runtime (#[tokio::main]) but then blocks on this call with a non-tokio executor. I can only imagine this leads to some task being scheduled on a tokio worker thread that already has the blocking call running on it and never yields to let that task progress, so it ends up deadlocking itself, or something like that.

That all said, we don't have any need for sync APIs internally, and so it's not the highest priority for us really.