streamingfast / substreams

Powerful Blockchain streaming data engine, based on StreamingFast Firehose technology.
Apache License 2.0
159 stars 45 forks source link

Rust client error #277

Closed abourget closed 10 months ago

abourget commented 1 year ago

Some tokio-based Rust clients hit this issue:

Unknown, message: "h2 protocol error: error reading a body from connection: unexpected end of file", details: [], metadata: MetadataMap { headers: {} }
maoueh commented 1 year ago

It seems that Rust client via tonic re-connects much more often than it should normally. From the backend perspective, it seems the server sees the client disconnecting (I don't have the logs at hand).

More digging is required inside tonic/hyper and debugging HTTP2 low level communication issue because that’s where I think the problem is, could still be a server misconfiguration creating weird behavior with tonic, not clear.

Reproduction was possible with our https://github.com/streamingfast/substreams-sink-rust code using something like SUBSTREAMS_API_TOKEN="<StreamingFast API Token>" cargo run -- https://mainnet.eth.streamingfast.io:443 https://github.com/streamingfast/substreams-eth-block-meta/releases/download/v0.5.1/substreams-eth-block-meta-v0.5.1.spkg db_out.

But I tried just now and I wasn't able to reproduce it, it stayed connected for a few minutes without problem (while when I reproduced it last time, it was re-connecting at least once each minute with the error).

So this might prove much harder to reproduce. Could be a "normal" error just badly flagged from tonic. Re-connection is normal when using any streaming technology, just not at the rate I was seeing.

Seeing it a few times per day could be normal behavior (assuming tonic badly report drop connections) but if it repeats many times per 15 minutes, there is something weird going on.

andschneider commented 1 year ago

@maoueh I can reproduce consistently with the substream-sink-rust project if I change the start block here to something recent. I'm curious if you can reproduce it this way? I tested with block 17843000 (finalized ~3 days ago at time of this comment)

beeb commented 1 year ago

Same issue here

alexLizhenyu commented 1 year ago

Same issue

D-J-Harris commented 1 year ago

Getting the same here - it appears to be dependent on the start/end blocks I am passing in (for a given substreams package the client is consuming)

I also see the module progresses happening, just after a short while the error will appear. If I limit my block range close to a multiple of 1000, it can complete before it has a chance to error

maoueh commented 1 year ago

Removing http2_adaptive_window when configuring the Tonic client seems to help some people. The problem, debugged by someone on Discord seems to be related to too many pings being sent by the client which is refused by the server. The error message you then receive is not really clear on that matters.

See https://github.com/streamingfast/substreams-sink-rust/commit/4581d02e8d77edaede200201f9b74b587aa6806d. Please report if this helps anybody here.

maoueh commented 1 year ago

Since the problem is with too many pings and flow control, if there is data consistently flowing, there will be no pings sent and the problem doesn't exhibits.

It happens much more frequently when the stream has nothing in it because we are waiting on Substreams endpoint to finally produce a batch of N blocks.

beeb commented 1 year ago

@maoueh this seems to help indeed

andschneider commented 1 year ago

@maoueh confirming this seems to fix it for me too

maoueh commented 1 year ago

I'll try to come up with the step to enable http2 layer debugging (hyper here). Good it helps, further tweaks could be needed, some digging will be required to correctly understand the consequences.