sigp / lighthouse

Ethereum consensus client in Rust
https://lighthouse.sigmaprime.io/
Apache License 2.0
2.88k stars 724 forks source link

Limit RPC Request Concurrency and Slow Responses #2742

Open AgeManning opened 2 years ago

AgeManning commented 2 years ago

Description

We have an internal rate limit scheme for RPC requests which is quite generous. Peers that exceed these limits are sent an RPCErrorResponse informing them they have been rate limited and providing a timeout until they can re-request.

https://github.com/ethereum/consensus-specs/pull/2690 introduces a max RPC request concurrency value which we can further restrict and penalise peers on if they go beyond this limit.

As our responses are not spec'd it makes sense that we withhold responses rather than sending RPCErrorResponses with rate limits if we can reasonably process the request within the RPC response timeout interval and still satisfy our RPC rate limits. i.e if a peer hits our rate limit, and we would otherwise request them to re-request sometime within 10 seconds, we wait that time, process the request and then respond (imitating the action of peer waiting and re-requesting internally). If the peer is required to wait more than the 10 seconds (RPC timeout) we return our usual RPCErrorResponse as we should do if the peer attempts to send more than the new currency limit requests at a time.

pawanjay176 commented 2 years ago

Interested in picking this up.

I suppose a simple way to do this would be to add the request to a DelayQueue for later processing if the number of requests has exceeded MAXIMUM_CONCURRENT_REQUESTS and the timeout is < RPC_TIMEOUT. We could directly incorporate this in the RPCRateLimiter.

AgeManning commented 2 years ago

I think if MAXIMUM_CONCURRENT_REQUESTS is exceeded, they have violated the spec and we should either kick them or penalize them.

Implementation wise, I think you're right. We could add something minimal to the RPC behaviour, which includes the rate limiter (which already tells us the time it needs to wait).

I'd have a chat with @divagant-martian about best implementation paths for this.