sigp / lighthouse

Ethereum consensus client in Rust
https://lighthouse.sigmaprime.io/
Apache License 2.0
2.83k stars 701 forks source link

RPC TTFB On RPC Response #5676

Open AgeManning opened 2 months ago

AgeManning commented 2 months ago

Description

The specs say that when waiting for a response to an RPC request, we should initially wait 5 seconds for the first byte to arrive, (Time to First Byte, TTFB). This will speed up RPC failures for absent peers.

Although this is implemented in establishing streams, we currently only wait the total RPC request timeout when awaiting for a response.

This issue is to terminate requests (with a timeout error) if the first byte does not arrive in FFTB seconds.

Implementation

There are a few things I'd imagine that need to change in order to handle this.

Firstly, the section of code that handles the streams and more specifically a request waiting for a response is here: https://github.com/sigp/lighthouse/blob/stable/beacon_node/lighthouse_network/src/rpc/handler.rs#L665

When this future is ready, it indicates we have received a new chunk from the RPC request. When we do, if there are more RPC chunks, we increase the RPC timeout window and await a new chunk: https://github.com/sigp/lighthouse/blob/stable/beacon_node/lighthouse_network/src/rpc/handler.rs#L686

We would have to change this logic to incorporate the first byte for each RPC chunk. A response chunk consists of the following:

response_chunk  ::= <result> | <encoding-dependent-header> | <encoded-payload>

So, we should update our encoder to respond with the first byte, which is the result byte. We should only check this on the first chunk, all remaining chunks can have the usual RPC response timeout.

One way to implement this would be to add a new variant to the RpcCodedResponse: https://github.com/sigp/lighthouse/blob/stable/beacon_node/lighthouse_network/src/rpc/methods.rs#L422

something like, ResultByte(u8). Then update the encoder to return this variant when the first byte has been read, see here: https://github.com/sigp/lighthouse/blob/stable/beacon_node/lighthouse_network/src/rpc/codec/base.rs#L147

We could respond with the first byte so that the handler knows the time it has taken. If it takes more than TTFB to get this first byte, then close the stream.

1010adigupta commented 2 months ago

hey @ackintosh can I work with you on this issue, want to delve deeper in the networking stack