paritytech / litep2p

Peer-to-peer networking library
MIT License
74 stars 10 forks source link

transport-manager: Unknown connection opened #172

Closed lexnv closed 3 months ago

lexnv commented 3 months ago

Investigate the following errors:

    "2024-07-06 04:53:54.449  WARN tokio-runtime-worker litep2p::transport-manager: unknown connection opened as secondary connection, discarding peer=PeerId(\"12D3KooWB9DVNyi6TQUhHeyq4LyBLVykybXUmcNmEUD9tcKEaYq1\") connection_id=ConnectionId(246958) address=\"/ip6/*/tcp/51326/ws/p2p/12D3KooWB9DVNyi6TQUhHeyq4LyBLVykybXUmcNmEUD9tcKEaYq1\" dial_record=AddressRecord { score: 100, address: \"/ip4/1*/tcp/31334/ws/p2p/12D3KooWB9DVNyi6TQUhHeyq4LyBLVykybXUmcNmEUD9tcKEaYq1\", connection_id: Some(ConnectionId(246946)) }",
    "2024-07-06 04:53:54.450  WARN tokio-runtime-worker litep2p::transport-manager: non-primary connection was closed but secondary connection doesn't exist peer=PeerId(\"12D3KooWB9DVNyi6TQUhHeyq4LyBLVykybXUmcNmEUD9tcKEaYq1\") connection_id=ConnectionId(246958)",
    "2024-07-06 04:53:54.450 ERROR tokio-runtime-worker litep2p::transport-manager: failed to handle closed connection error=InvalidState",
    "2024-07-06 04:54:04.058  WARN tokio-runtime-worker litep2p::transport-manager: non-primary connection was closed but secondary connection doesn't exist peer=PeerId(\"12D3KooWB9DVNyi6TQUhHeyq4LyBLVykybXUmcNmEUD9tcKEaYq1\") connection_id=ConnectionId(246953)",
    "2024-07-06 04:54:04.058 ERROR tokio-runtime-worker litep2p::transport-manager: failed to handle closed connection error=InvalidState",

Data collected from a long running litep2p substrate node in kusama

### Tasks
- [ ] https://github.com/paritytech/litep2p/pull/176
- [ ] https://github.com/paritytech/litep2p/pull/179
altonen commented 3 months ago

The issue could be that after receiving an unknown unknown connection, TransportManager wants to discard it: https://github.com/paritytech/litep2p/blob/a27d0072913be67cce904e2e8c465cc37ec5223a/src/transport/manager/mod.rs#L980-L987

but that branch actually doesn't return early with ConnectionEstablishedResult::Reject but instead drops down to the default case at the end of the function which returns ConnectionEstablishedResult::Accept for all cases. This would explain why later on when the connection is closed, TransportManager doesn't recognize it.

lexnv commented 3 months ago

Its been a tricky edge-case, will keep an eye for triaging since we might have other minor gaps causing this state reset

Detailed repro case

// This is where the state was wrongfully overwritten

// The issue was detected here