near / nearcore

Reference client for NEAR Protocol
https://near.org
GNU General Public License v3.0

Consider returning http errors for certain rpc errors #11792

Closed: bowenwang1996 closed this issue 1 month ago

bowenwang1996 commented 2 months ago

Currently, when an RPC request is processed, we never return an HTTP error. This is confusing for RPC operators, who may see a very low "error rate" in their monitoring even though a nontrivial number of requests actually return a timeout error with a 200 response code. We may want to consider returning an HttpError in that function for certain RPC errors such as timeouts. Of course, there are legitimate errors, such as "account not found", that make sense to include in the response body with a 200.
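To make the proposal concrete, here is a minimal sketch of the kind of mapping being discussed. The `RpcErrorKind` enum, the function names, and the choice of 408 for timeouts are all assumptions made for illustration; they are not nearcore's actual types or the status codes the project settled on.

```rust
/// Hypothetical error categories for illustration only; not nearcore's real types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RpcErrorKind {
    /// The node could not answer in time (falling behind, congestion, ...).
    Timeout,
    /// A well-formed query about an account that does not exist.
    AccountNotFound,
}

/// Picks the HTTP status code for a given RPC error.
/// Assumption: timeouts map to 408 so that they show up in operators'
/// HTTP-level error-rate monitoring, while "legitimate" errors stay at 200
/// and are reported only in the JSON-RPC response body.
pub fn http_status_for(kind: RpcErrorKind) -> u16 {
    match kind {
        RpcErrorKind::Timeout => 408,
        RpcErrorKind::AccountNotFound => 200,
    }
}

/// True when a typical HTTP monitoring setup would count the response as an error.
pub fn counts_as_http_error(kind: RpcErrorKind) -> bool {
    http_status_for(kind) >= 400
}
```

With this mapping, `counts_as_http_error(RpcErrorKind::Timeout)` is true while the "account not found" case stays invisible to HTTP-level error metrics, which is the distinction the issue asks for.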

frol commented 1 month ago

Technically speaking, JSON-RPC doesn't use HTTP status codes to indicate the result (at the very least, because JSON-RPC supports request batching): https://www.jsonrpc.org/specification
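The batching point can be illustrated with a small sketch: a single HTTP response may carry several JSON-RPC responses, some successful and some failed, so no single status code describes all of them. The types and helper names below are hypothetical, the serialization is hand-rolled without string escaping, and the error code -32000 is simply taken from the range the JSON-RPC 2.0 specification reserves for implementation-defined server errors.

```rust
/// Outcome of one call inside a batch (hypothetical type, for illustration).
pub enum Outcome {
    /// An already-serialized JSON value for the `result` member.
    Result(&'static str),
    /// A JSON-RPC error object; `message` is assumed to need no escaping.
    Error { code: i32, message: &'static str },
}

pub struct Response {
    pub id: u32,
    pub outcome: Outcome,
}

/// Serializes one JSON-RPC 2.0 response object.
pub fn to_json(r: &Response) -> String {
    match &r.outcome {
        Outcome::Result(value) => format!(
            r#"{{"jsonrpc":"2.0","result":{},"id":{}}}"#,
            value, r.id
        ),
        Outcome::Error { code, message } => format!(
            r#"{{"jsonrpc":"2.0","error":{{"code":{},"message":"{}"}},"id":{}}}"#,
            code, message, r.id
        ),
    }
}

/// Serializes a batch as a JSON array; the whole array travels in ONE HTTP response.
pub fn batch_to_json(batch: &[Response]) -> String {
    let items: Vec<String> = batch.iter().map(to_json).collect();
    format!("[{}]", items.join(","))
}

/// True if the batch mixes successes and errors, i.e. the case where a
/// single HTTP status code cannot describe every call.
pub fn is_mixed(batch: &[Response]) -> bool {
    let errors = batch
        .iter()
        .filter(|r| matches!(r.outcome, Outcome::Error { .. }))
        .count();
    errors > 0 && errors < batch.len()
}
```

For a batch with one successful call and one timed-out call, `is_mixed` returns true: whichever status the server picks, it misreports one of the two calls.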

@bowenwang1996 by timeout response do you refer to tx-status, broadcast-tx timeouts? They are also part of regular operations flow, aren't they?

I reopened this issue due to https://github.com/near/nearcore/pull/11806#discussion_r1685473176

frol commented 1 month ago

@bowenwang1996 FYI, changing the HTTP status codes will be a semi-breaking change. For example, a non-OK response will now be handled here in near-api-js instead of further down the line. This means that all those legacy projects will do more retries on Timeout: one exponentialBackoff from fetchJson and another one from sendJsonRpc (which itself calls fetchJson).
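The retry amplification can be sketched with a simplified model. This is not near-api-js code: the function names, the retry conditions, and the attempt counts are assumptions chosen only to show how nesting two retry loops multiplies the number of HTTP requests once a timeout arrives as a non-OK status instead of a 200 with an error body.

```rust
/// Model of an inner HTTP helper (loosely in the spirit of `fetchJson`):
/// it retries only while the HTTP status is not 200.
/// `requests` counts every HTTP request actually sent.
pub fn fetch_with_retries(status: u16, max_attempts: u32, requests: &mut u32) -> Result<(), u16> {
    for _ in 0..max_attempts {
        *requests += 1;
        if status == 200 {
            return Ok(());
        }
        // A real client would sleep here with exponential backoff.
    }
    Err(status)
}

/// Model of an outer RPC helper (loosely in the spirit of `sendJsonRpc`):
/// it retries on a timeout whether it arrives in the body of a 200 response
/// or as an HTTP-level failure. The server in this model always times out,
/// so every outer attempt is used. Returns the total number of HTTP requests.
pub fn send_rpc(timeout_status: u16, inner_attempts: u32, outer_attempts: u32) -> u32 {
    let mut requests = 0;
    for _ in 0..outer_attempts {
        // Ok means "HTTP 200 carrying a timeout error in the body";
        // Err means the inner loop exhausted its own retries first.
        let _ = fetch_with_retries(timeout_status, inner_attempts, &mut requests);
    }
    requests
}
```

With hypothetical limits of 10 inner and 12 outer attempts, `send_rpc(200, 10, 12)` sends 12 requests, while `send_rpc(408, 10, 12)` sends 120, each inner loop also adding its own backoff delays.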

bowenwang1996 commented 1 month ago

> @bowenwang1996 by timeout response do you refer to tx-status, broadcast-tx timeouts? They are also part of regular operations flow, aren't they?

I think it mostly indicates an error. This would happen either because the RPC node is falling behind or because the network is congested. In either case, it seems that an error makes sense. Do you have suggestions for how to distinguish between errors?