libp2p / specs

Technical specifications for the libp2p networking stack
https://libp2p.io

Proposal: provide error codes when closing connections and resetting streams #479

Open marten-seemann opened 1 year ago

marten-seemann commented 1 year ago
### Related PRs
- [ ] #623 
- [ ] #622 

It would be really helpful to know why a peer closed a connection or reset a stream. Unfortunately, we currently don’t have access to that information.

Here’s a proposal for how to convey that piece of information.

Connection Termination

Current situation:

It seems straightforward to use a 32-bit error code space for libp2p. If we decide that transmitting an error message is important, we might be able to find a backwards-compatible yamux hack, similar to the one described in the next section.

We could have different error codes for: connections that are closed because they were dial-raced with other connections, disallowed by a connection gater, closed due to resource limitations, closed to make room for more valuable connections, closed for different kinds of protocol violations, etc.

Caveat: With TCP linger set to 0, the TCP connection is reset instead of properly closed. This also means that the error code might not be transmitted reliably.

Stream Termination

Current situation:

It seems like we’re therefore limited to 256 error codes. We’d need to reserve a subset of these for libp2p itself (for example, we need to convey that multistream negotiation failed, or that we didn’t even start multistream negotiation because of resource limits, etc.). The rest of the error codes would be defined by the application.
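
To make the reserved/application split concrete, here is a minimal Go sketch. The boundary at `0x1f` and the names `StreamErrorCode`, `IsReserved` and `NewApplicationCode` are hypothetical; nothing in this proposal fixes how many of the 256 values libp2p would keep for itself.

```go
package main

import "fmt"

// StreamErrorCode is an 8-bit stream reset error code, matching the
// 256-value limit discussed above.
type StreamErrorCode uint8

// reservedMax is a HYPOTHETICAL boundary: codes 0x00..0x1f are assumed to be
// reserved for libp2p itself, and 0x20..0xff are left to the application.
const reservedMax StreamErrorCode = 0x1f

// appCodeCount is the number of codes left for applications under that
// assumption (256 - 32 = 224).
const appCodeCount = 256 - (int(reservedMax) + 1)

// IsReserved reports whether c falls into the assumed libp2p-reserved range.
func (c StreamErrorCode) IsReserved() bool { return c <= reservedMax }

// NewApplicationCode maps an application-defined index n onto the
// application range and rejects indices that do not fit into 8 bits.
func NewApplicationCode(n int) (StreamErrorCode, error) {
	if n < 0 || n >= appCodeCount {
		return 0, fmt.Errorf("application code %d out of range [0, %d)", n, appCodeCount)
	}
	return reservedMax + 1 + StreamErrorCode(n), nil
}
```

With this split, `NewApplicationCode(0)` yields `0x20` and index 224 is rejected, which illustrates how quickly an 8-bit space runs out once part of it is reserved.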

yamux hack

Depending on how current implementations handle this, we could either:

marten-seemann commented 1 year ago

We've been using a few error codes without having them defined beforehand. This probably means that we need to consider the error codes used there as "burnt". @elenaf9, can you compile a list of application error codes that QUIC in rust-libp2p has been using so far? Fortunately, this only applies to connection-level error codes, and there are lots of numbers in the uint62 space, so this is nothing more than a mild annoyance.

Here's a proposal for error codes to add. The numerical values are still TBD; the purpose of this exercise is to first agree on a list of possible errors and their names.

Connection Error Codes

| Error Code | Description |
| --- | --- |
| REJECTED | Connection rejected because the node is temporarily overloaded. Most likely because some accept queue ran full. |
| GATED | Connection rejected because the connection was gated. Most likely the IP / node is blacklisted. |
| RESOURCE_ALLOCATION_FAILED | Connection rejected because we ran into a resource limit. |
| PROTOCOL_NEGOTIATION_FAILED | Connection rejected because we couldn't negotiate a protocol. Note that this error code cannot be sent reliably, as we don't have the option to send a custom error code during every part of the handshake. |
| DIAL_CANCELED | Multiple connections were raced in parallel. This connection is closed because another connection won the race. Note that this can also happen to newly established connections shortly after the handshake. |
| GARBAGE_COLLECTED | The connection was garbage collected. |
| SHUTDOWN | The node is going down. |
| CLOSE | The user closed the connection explicitly. |
| PROTOCOL_VIOLATION | The peer violated the protocol. |
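
As a rough illustration, the connection error codes listed above could be expressed as Go constants like this. The numeric values are placeholders chosen only so the sketch compiles (the real values are TBD and must avoid any already "burnt" codes), and the type and constant names are hypothetical.

```go
package main

import "fmt"

// ConnErrorCode is a connection-level error code in the 32-bit space
// suggested earlier in this issue.
type ConnErrorCode uint32

// PLACEHOLDER values: the proposal leaves the actual numbers TBD.
const (
	ConnRejected ConnErrorCode = iota + 1
	ConnGated
	ConnResourceAllocationFailed
	ConnProtocolNegotiationFailed
	ConnDialCanceled
	ConnGarbageCollected
	ConnShutdown
	ConnClose
	ConnProtocolViolation
)

// connErrorNames maps each code to the name used in the proposal.
var connErrorNames = map[ConnErrorCode]string{
	ConnRejected:                  "REJECTED",
	ConnGated:                     "GATED",
	ConnResourceAllocationFailed:  "RESOURCE_ALLOCATION_FAILED",
	ConnProtocolNegotiationFailed: "PROTOCOL_NEGOTIATION_FAILED",
	ConnDialCanceled:              "DIAL_CANCELED",
	ConnGarbageCollected:          "GARBAGE_COLLECTED",
	ConnShutdown:                  "SHUTDOWN",
	ConnClose:                     "CLOSE",
	ConnProtocolViolation:         "PROTOCOL_VIOLATION",
}

// String returns the proposal's name for known codes and a numeric
// fallback for anything else, e.g. codes sent by a newer peer.
func (c ConnErrorCode) String() string {
	if name, ok := connErrorNames[c]; ok {
		return name
	}
	return fmt.Sprintf("UNKNOWN(%d)", uint32(c))
}
```

The fallback in `String` matters because a receiver has to tolerate codes it does not know about, whether they come from a newer libp2p version or from an implementation using one of the undocumented values.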

Stream Error Codes

| Error Code | Description |
| --- | --- |
| REJECTED | Stream rejected because the node is temporarily overloaded. Most likely because some accept queue ran full. |
| RESOURCE_ALLOCATION_FAILED | Stream rejected because we ran into a resource limit. |
| PROTOCOL_NEGOTIATION_FAILED | Stream rejected because we couldn't negotiate a protocol. |

In addition to these libp2p error codes, every protocol will probably want to register its own error codes. For that, it would be really, really nice if we had more than 256 error code values at our disposal.
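
From the application's point of view, the goal is to be able to inspect why a stream was reset. A minimal sketch of how that could look in Go follows. `StreamResetError` and `ResetCode` are hypothetical names, not an existing go-libp2p API, and the `uint32` field width is an assumption that would accommodate either an 8-bit or a larger code space.

```go
package main

import (
	"errors"
	"fmt"
)

// StreamResetError is a HYPOTHETICAL error type that a stream's Read or
// Write could return after a reset, carrying the error code.
type StreamResetError struct {
	Code   uint32 // assumed width; the proposal has not settled this
	Remote bool   // true if the peer reset the stream, false if we did
}

func (e *StreamResetError) Error() string {
	side := "local"
	if e.Remote {
		side = "remote"
	}
	return fmt.Sprintf("stream reset (%s): error code %d", side, e.Code)
}

// ResetCode extracts the error code from err, looking through wrapped
// errors. The boolean is false if err is not a stream reset.
func ResetCode(err error) (uint32, bool) {
	var re *StreamResetError
	if errors.As(err, &re) {
		return re.Code, true
	}
	return 0, false
}
```

A protocol handler could then branch on the code, for example backing off and retrying on REJECTED but giving up on PROTOCOL_NEGOTIATION_FAILED.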

sukunrt commented 2 months ago

> WebTransport: limits stream reset error codes to 8 bits

This is fixed in the latest drafts. From: https://www.ietf.org/archive/id/draft-ietf-webtrans-http3-09.html#name-resetting-data-streams

> Since WebTransport shares the error code space with HTTP/3, WebTransport application errors for streams are limited to an unsigned 32-bit integer, assuming values between 0x00000000 and 0xffffffff.
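
For reference, the way a 32-bit WebTransport code is remapped into the HTTP/3 error code space can be sketched in Go as follows. This is a port of the draft's pseudocode as best recalled here: the constants and the skipping of reserved HTTP/3 codepoints (those of the form `0x1f * N + 0x21`) should be checked against the draft text before relying on them, and the Go function names are made up for this sketch.

```go
package main

import "fmt"

// Bounds of the HTTP/3 error code range used for WebTransport
// application errors (ASSUMED from the draft; verify before use).
const (
	firstHTTPCode uint64 = 0x52e4a40fa8db
	lastHTTPCode  uint64 = 0x52e5ac983162
)

// WebTransportCodeToHTTPCode maps a 32-bit WebTransport error code to an
// HTTP/3 error code, skipping one reserved codepoint per 0x1e values.
func WebTransportCodeToHTTPCode(n uint32) uint64 {
	return firstHTTPCode + uint64(n) + uint64(n)/0x1e
}

// HTTPCodeToWebTransportCode is the inverse mapping. It rejects codes
// outside the range and the reserved codepoints inside it.
func HTTPCodeToWebTransportCode(h uint64) (uint32, error) {
	if h < firstHTTPCode || h > lastHTTPCode {
		return 0, fmt.Errorf("HTTP/3 code %#x is outside the WebTransport range", h)
	}
	if (h-0x21)%0x1f == 0 {
		return 0, fmt.Errorf("HTTP/3 code %#x is a reserved codepoint", h)
	}
	shifted := h - firstHTTPCode
	return uint32(shifted - shifted/0x1f), nil
}
```

The relevant point for this issue is that the full 32-bit range survives the round trip, so a libp2p stream error code would not need to be squeezed into 8 bits on WebTransport.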

We can ignore WebTransport for now and wait for browsers to upgrade to draft-09 or later.