marten-seemann opened 1 year ago
We've been using a few error codes without having defined them beforehand. This probably means that we need to consider the error codes used in rust-libp2p's QUIC implementation as "burnt". @elenaf9, can you compile a list of the application error codes that QUIC in rust-libp2p has been using so far? Fortunately, this only applies to connection-level error codes, and there are lots of numbers in the uint62 space, so this is nothing more than a mild annoyance.
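
For context on the "uint62 space": QUIC carries the application error code of a CONNECTION_CLOSE frame as a variable-length integer, which tops out at 2^62 - 1. The following is a minimal sketch of that integer encoding, following RFC 9000, Section 16. It only illustrates how large the space is; it is not taken from rust-libp2p, and the function names are illustrative.

```python
MAX_QUIC_VARINT = (1 << 62) - 1  # largest value a QUIC varint can carry


def encode_quic_varint(value: int) -> bytes:
    """Encode value as a QUIC variable-length integer (RFC 9000, Section 16)."""
    if not 0 <= value <= MAX_QUIC_VARINT:
        raise ValueError(f"{value} does not fit in 62 bits")
    # The two most significant bits of the first byte encode the length.
    if value < (1 << 6):
        return value.to_bytes(1, "big")                  # prefix 0b00, 1 byte
    if value < (1 << 14):
        return (value | (0b01 << 14)).to_bytes(2, "big")  # prefix 0b01, 2 bytes
    if value < (1 << 30):
        return (value | (0b10 << 30)).to_bytes(4, "big")  # prefix 0b10, 4 bytes
    return (value | (0b11 << 62)).to_bytes(8, "big")      # prefix 0b11, 8 bytes


def decode_quic_varint(data: bytes) -> int:
    """Decode a QUIC variable-length integer from the start of data."""
    length = 1 << (data[0] >> 6)                  # 1, 2, 4 or 8 bytes
    value = int.from_bytes(data[:length], "big")
    return value & ((1 << (8 * length - 2)) - 1)  # drop the 2-bit length prefix
```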
Below is a proposal for error codes to add. Numerical values are still TBD; the purpose of this exercise is to first agree on a list of possible errors (and their names).

Connection error codes:

Error Code | Description |
---|---|
REJECTED | Connection rejected because the node is temporarily overloaded, most likely because an accept queue filled up. |
GATED | Connection rejected because the connection was gated. Most likely the IP / node is blacklisted. |
RESOURCE_ALLOCATION_FAILED | Connection rejected because we ran into a resource limit. |
PROTOCOL_NEGOTIATION_FAILED | Connection rejected because we couldn't negotiate a protocol. Note that this error code cannot be sent reliably, as we don't have the option to send a custom error code during every part of the handshake. |
DIAL_CANCELED | Multiple connections were raced in parallel. This connection is closed because another connection won the race. Note that this can also happen to newly established connections shortly after the handshake. |
GARBAGE_COLLECTED | The connection was garbage collected. |
SHUTDOWN | The node is going down. |
CLOSE | The user closed the connection explicitly. |
PROTOCOL_VIOLATION | The peer violated the protocol. |

Stream error codes:

Error Code | Description |
---|---|
REJECTED | Stream rejected because the node is temporarily overloaded, most likely because an accept queue filled up. |
RESOURCE_ALLOCATION_FAILED | Stream rejected because we ran into a resource limit. |
PROTOCOL_NEGOTIATION_FAILED | Stream rejected because we couldn't negotiate a protocol. |
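
Since the numerical values are still TBD, here is a minimal sketch of how placeholder values could be handed out to the names above while skipping codes that are already burnt. `assign_codes` is a hypothetical helper, the starting value of 1 is an arbitrary assumption, and the burnt set used in the example is made up; the real one would come from the list of codes rust-libp2p has already been using.

```python
from __future__ import annotations

from collections.abc import Iterable

CONNECTION_ERRORS = [
    "REJECTED", "GATED", "RESOURCE_ALLOCATION_FAILED", "PROTOCOL_NEGOTIATION_FAILED",
    "DIAL_CANCELED", "GARBAGE_COLLECTED", "SHUTDOWN", "CLOSE", "PROTOCOL_VIOLATION",
]
STREAM_ERRORS = ["REJECTED", "RESOURCE_ALLOCATION_FAILED", "PROTOCOL_NEGOTIATION_FAILED"]


def assign_codes(names: Iterable[str], burnt: set[int], start: int = 1) -> dict[str, int]:
    """Give each name the next free value >= start, never reusing a burnt value."""
    codes: dict[str, int] = {}
    value = start
    for name in names:
        while value in burnt:  # skip values already used in the wild
            value += 1
        codes[name] = value
        value += 1
    return codes


# Hypothetical example: pretend codes 2 and 3 were already used without being specified.
connection_codes = assign_codes(CONNECTION_ERRORS, burnt={2, 3})
```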

In addition to these libp2p error codes, every protocol will probably want to register its own error codes. For that, it would be really, really nice if we had more than 256 error code values at our disposal.
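
To make the capacity argument concrete, here is a sketch of a registry that hands out unique codes to protocols within a fixed bit width. `ErrorCodeRegistry` and the protocol IDs in the example are hypothetical, not an existing libp2p API. With 8 bits, libp2p and every protocol combined share 256 values; with 32 bits the limit stops mattering in practice.

```python
from __future__ import annotations


class ErrorCodeRegistry:
    """Hypothetical registry handing out unique error codes within a fixed bit width."""

    def __init__(self, bits: int) -> None:
        self.capacity = 1 << bits  # number of distinct codes, e.g. 256 for 8 bits
        self._owners: dict[int, tuple[str, str]] = {}

    def register(self, protocol: str, name: str, code: int) -> None:
        """Claim code for (protocol, name), rejecting out-of-range or taken codes."""
        if not 0 <= code < self.capacity:
            raise ValueError(f"code {code} does not fit in {self.capacity} values")
        if code in self._owners:
            owner_protocol, owner_name = self._owners[code]
            raise ValueError(f"code {code} already taken by {owner_protocol} ({owner_name})")
        self._owners[code] = (protocol, name)

    def remaining(self) -> int:
        """How many codes are still unassigned."""
        return self.capacity - len(self._owners)


registry = ErrorCodeRegistry(bits=8)
registry.register("/example/1.0.0", "TIMEOUT", 0x40)
```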
WebTransport: limits stream reset error codes to 8 bits
This is fixed in the latest drafts. From: https://www.ietf.org/archive/id/draft-ietf-webtrans-http3-09.html#name-resetting-data-streams
> Since WebTransport shares the error code space with HTTP/3, WebTransport application errors for streams are limited to an unsigned 32-bit integer, assuming values between 0x00000000 and 0xffffffff.

We can ignore WebTransport for now and wait for browsers to upgrade to draft-09 or later.
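
As I read the linked draft, the 32-bit limit comes from how WebTransport codes are carried: each one is mapped onto a range of HTTP/3 error codes starting at the draft's `first` constant (0x52e4a40fa8db), skipping the reserved HTTP/3 code points of the form 0x1f * N + 0x21. This is a sketch of that mapping; the function names are mine, and the constants should be checked against the draft before relying on them.

```python
FIRST = 0x52E4A40FA8DB    # first HTTP/3 code used for WebTransport application errors
MAX_WT_CODE = 0xFFFFFFFF  # WebTransport application error codes are 32-bit


def wt_to_http3(n: int) -> int:
    """Map a 32-bit WebTransport error code onto the HTTP/3 error code space."""
    if not 0 <= n <= MAX_WT_CODE:
        raise ValueError("WebTransport error codes are limited to 32 bits")
    # Adding n // 0x1e skips one value after every 30, so the result never
    # lands on a reserved HTTP/3 code point (0x1f * N + 0x21).
    return FIRST + n + n // 0x1E


def http3_to_wt(h: int) -> int:
    """Inverse of wt_to_http3 for codes inside the WebTransport range."""
    if (h - 0x21) % 0x1F == 0:
        raise ValueError("reserved HTTP/3 error code")
    shifted = h - FIRST
    return shifted - shifted // 0x1F
```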
It would be really helpful to know why a peer closed a connection or reset a stream. Unfortunately, we currently don’t have access to that information.
Here’s a proposal for how to convey that piece of information.
Connection Termination
Current situation:
It seems straightforward to use a 32-bit error code space for libp2p. If we decide that transmitting an error message is important, we might be able to find a backwards-compatible yamux hack, similar to the one described in the next section.
We could have different error codes for: connections that are closed because they were dial-raced with other connections, disallowed by a connection gater, closed due to resource limitations, closed to make room for more valuable connections, closed for different kinds of protocol violations, etc.
Caveat: With TCP linger set to 0, the TCP connection is reset instead of properly closed. This also means that the error code might not be transmitted reliably.
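
The linger caveat is easy to reproduce locally. This sketch uses plain Python sockets, not libp2p code: it sets SO_LINGER with a zero timeout on one end of a loopback TCP connection and closes it, and the peer then sees a reset (ConnectionResetError) instead of an orderly end-of-stream. Anything still in flight at that moment, such as a final frame carrying an error code, can be dropped. The behavior described is what Linux does; other platforms may differ in details.

```python
import socket
import struct


def linger_zero_close_is_reset() -> bool:
    """Close one end of a loopback TCP connection with SO_LINGER=0 and
    report whether the other end observes a reset rather than a clean EOF."""
    with socket.create_server(("127.0.0.1", 0)) as listener:
        port = listener.getsockname()[1]
        client = socket.create_connection(("127.0.0.1", port))
        server_side, _ = listener.accept()
        with server_side:
            # struct linger {l_onoff = 1, l_linger = 0}: close() aborts with RST.
            client.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
            client.close()
            try:
                server_side.recv(1)  # returning b"" would mean an orderly FIN close
            except ConnectionResetError:
                return True
            return False
```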
Stream Termination
Current situation:
It seems like we’re therefore limited to 256 error codes. We’d need to reserve a subset of these for libp2p itself (for example, we need to convey that multistream negotiation failed, or that we didn’t even start multistream negotiation because of resource limits). The rest of the error codes would be defined by the application.
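
A sketch of how the 8-bit stream error space could be split between libp2p and applications. The boundary at 0x40 (64 codes for libp2p, 192 for applications) is purely hypothetical, as are the helper names; the proposal doesn't fix any numbers.

```python
LIBP2P_RESERVED = range(0x00, 0x40)  # hypothetical: codes owned by libp2p itself
APPLICATION = range(0x40, 0x100)     # hypothetical: codes left to applications


def classify_stream_error(code: int) -> str:
    """Say which party defines the meaning of an 8-bit stream error code."""
    if code in LIBP2P_RESERVED:
        return "libp2p"
    if code in APPLICATION:
        return "application"
    raise ValueError(f"{code} does not fit in 8 bits")


def app_code(offset: int) -> int:
    """Map an application-relative error number (0, 1, ...) to its wire value."""
    if not 0 <= offset < len(APPLICATION):
        raise ValueError(f"only {len(APPLICATION)} application codes are available")
    return APPLICATION.start + offset
```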
yamux hack
Depending on how current implementations handle this, we could either: