sigp / lighthouse

Ethereum consensus client in Rust
https://lighthouse.sigmaprime.io/
Apache License 2.0
2.97k stars 769 forks source link

Getting `404 not found` errors for missed blocks instead of a specific missed error code #4904

Open wangwzhou opened 1 year ago

wangwzhou commented 1 year ago

Description

When I fetch the slot data of a missed slot with eth/v2/beacon/blocks and eth/v1/beacon/headers, the response is 404 Not Found, instead of a specific error indicating this is a missed block, which makes it difficult to differentiate the following 2 cases:

  1. Cannot fetch slot data because it's a missed slot.
  2. Cannot fetch slot data because nodes are out of sync, or the slot hasn't been proposed yet.

Wondering should client return a specific error for missed slot?

Version

Lighthouse/v4.5.0-441fc16/x86_64-linux

Present Behaviour

When fetching the slot data of a missed slot with eth/v2/beacon/blocks and eth/v1/beacon/headers, the response is 404 Not Found.

For example, when fetch block header for https://holesky.beaconcha.in/slot/1, the response is

{ "code": 404, "message": "NOT_FOUND: beacon block at slot 1", "stacktraces": [] }

Expected Behaviour

Should client return a specific error/error code for missed slots?

Steps to resolve

Please describe the steps required to resolve this issue, if known.

michaelsproul commented 1 year ago

I agree this can be a bit confusing, but the spec mandates a 404 in the case where a block is skipped

michaelsproul commented 1 year ago

Some options:

wangwzhou commented 1 year ago

IMO, as long as we can differentiate skipped blocks with other not_found cases should be fine, we can either return with a specific error code or even keep 404 but with a specific error message.

michaelsproul commented 10 months ago

I discussed this with the other beacon-APIs maintainers and we didn't yet reach a conclusion on how to proceed, although we agreed that something should be done.

Jacek (@arnetheduck) from Nimbus pointed out that it would be good to be able to differentiate these cases by status code:

At the moment they are all 404s.

The problem is we couldn't agree on which status codes to use, as none from the HTTP spec are quite right. E.g.

There was also a desire by @rolfyone to avoid 5xx status codes, which is fair, seeing as they usually indicate bugs. That said, I think 503 Service Unavailable could be useful for the "I don't know about this slot, try again later" because its descriptions matches quite closely:

The HyperText Transfer Protocol (HTTP) 503 Service Unavailable server error response code indicates that the server is not ready to handle the request.

Common causes are a server that is down for maintenance or that is overloaded. This response should be used for temporary conditions and the Retry-After HTTP header should, if possible, contain the estimated time for the recovery of the service.

(emphasis added)

As a middle-ground I'm now proposing:

This still combines two cases with a 404 status, but is still an improvement over the current state of things. The argument would be that callers should be able to rule out the non-existence of the route by checking their software version or by reading the error message returned (assuming they can get at it under layers of reverse proxies and similar).

arnetheduck commented 10 months ago

503 wfm, great if you can get that upstreamed ;)

for "slot known to be empty" I actually think 200 is the right code - with an explicit response that indicates an empty slot.

michaelsproul commented 10 months ago

Ooh, a 200 with an empty body and a header that indicates it was intentional? 🤔 Thinking of something that works for JSON & SSZ, but maybe for JSON a response like this would be better:

{
   "data": null,
}
rolfyone commented 10 months ago

out of band we discussed the potential application of 204 with the metadata in headers, which may be appropriate... just transferring this part of that conversation, because there were a lot of ideas but this one seemed to have a level of consensus...

So really

It's still reasonable to think that something beyond the head slot may actually result in an error (such as 400) because it's a future slot and just basically a bad request... Anyway, I figured my initial response was probably worth giving a little more of a breakdown to.