libp2p / js-libp2p

The JavaScript Implementation of libp2p networking stack.
https://libp2p.github.io/js-libp2p/
Other
2.33k stars 445 forks source link

Restarting a node quickly and connecting will not emit `onConnect`events in registrar #2378

Open marcus-pousette opened 9 months ago

marcus-pousette commented 9 months ago

Severity:

Low

Description:

I am trying to track down issues I have with connections and realised that I get unexpected behaviours when restarting nodes

Steps to reproduce the error:

Create a protocol handler with

this.components.registrar.register(multicodec, {
    onConnect: this.onPeerConnected.bind(this),
    onDisconnect: this.onPeerDisconnected.bind(this),
    notifyOnTransient: false
})

Expected behaviour onPeerConnected is emitted 2 times for both nodes.

Actual behaviour. onPeerConnected is never emitted on the last connect for any node. And the onPeerDisconnected is emitted late, after dialing has been resolved (looks like that the connection take a while to close (?))

Putting a delay between stopping and starting the node, or between starting and connecting, will solve this issue, but in a real world usecase where I refresh a tab in browser I suspect that this issue is causing the issues I am having.

marcus-pousette commented 9 months ago

This is what I get when I am starting up the node after closing

2024-01-25T23:54:38.968Z libp2p:transports creating listener for @libp2p/tcp on /ip4/127.0.0.1/tcp/0 2024-01-25T23:54:38.968Z libp2p:tcp:listener Listening on { address: '127.0.0.1', family: 4, port: 60420 } 2024-01-25T23:54:38.969Z libp2p:connection-manager dial 12D3KooWPQUiiod5RRfnRf2reXAP1mw5vM86m32atbXZuN3T49Yd 2024-01-25T23:54:38.969Z libp2p:connection-manager had an existing non-transient connection to 12D3KooWPQUiiod5RRfnRf2reXAP1mw5vM86m32atbXZuN3T49Yd 2024-01-25T23:54:38.969Z libp2p:tcp:socket 127.0.0.1:60410 socket end 2024-01-25T23:54:38.969Z libp2p:mplex initiator stream with id 1 and protocol /test/0.0.0 ended 2024-01-25T23:54:38.969Z libp2p:mplex receiver stream with id 1 and protocol /test/0.0.0 ended 2024-01-25T23:54:38.969Z libp2p:tcp:socket 127.0.0.1:60410 socket close 2024-01-25T23:54:38.969Z libp2p:connection:outbound:aynwkw1706226876843 closing connection to /ip4/127.0.0.1/tcp/60410/p2p/12D3KooWPQUiiod5RRfnRf2reXAP1mw5vM86m32atbXZuN3T49Yd 2024-01-25T23:54:38.970Z libp2p:tcp:socket 127.0.0.1:60410 socket was already destroyed when trying to close

It seems libp2p:connection-manager had an existing non-transient connection to 12D3KooWPQUiiod5RRfnRf2reXAP1mw5vM86m32atbXZuN3T49Yd is skipping doing some things because of this. I assume this creates consequences down the line (?)

I dont get it by putting a delay before start and connect. Then it will look like this

2024-01-26T00:01:02.345Z libp2p:transports creating listener for @libp2p/tcp on /ip4/127.0.0.1/tcp/0 2024-01-26T00:01:02.345Z libp2p:tcp:listener Listening on { address: '127.0.0.1', family: 4, port: 61966 } 2024-01-26T00:01:05.346Z libp2p:connection-manager dial 12D3KooWLNL49UuYwaSRh1fbE7Qj2CJwYj9KftHX2iBBR2m6eZJF 2024-01-26T00:01:05.347Z libp2p:connection-manager:dial-queue creating dial target for 12D3KooWLNL49UuYwaSRh1fbE7Qj2CJwYj9KftHX2iBBR2m6eZJF [ '/ip4/127.0.0.1/tcp/61966/p2p/12D3KooWLNL49UuYwaSRh1fbE7Qj2CJwYj9KftHX2iBBR2m6eZJF' ] 2024-01-26T00:01:05.348Z libp2p:tcp dialing /ip4/127.0.0.1/tcp/61966/p2p/12D3KooWLNL49UuYwaSRh1fbE7Qj2CJwYj9KftHX2iBBR2m6eZJF 2024-01-26T00:01:05.348Z libp2p:tcp connection opened /ip4/127.0.0.1/tcp/61966/p2p/12D3KooWLNL49UuYwaSRh1fbE7Qj2CJwYj9KftHX2iBBR2m6eZJF 2024-01-26T00:01:05.348Z libp2p:tcp new outbound connection /ip4/127.0.0.1/tcp/61966/p2p/12D3KooWLNL49UuYwaSRh1fbE7Qj2CJwYj9KftHX2iBBR2m6eZJF 2024-01-26T00:01:05.348Z libp2p:tcp:socket starting the outbound connection upgrade 2024-01-26T00:01:05.348Z libp2p:tcp:socket selecting outbound crypto protocol [ '/noise' ] 2024-01-26T00:01:05.348Z libp2p:tcp:listener new inbound connection /ip4/127.0.0.1/tcp/61980 2024-01-26T00:01:05.348Z libp2p:tcp:socket starting the inbound connection upgrade

marcus-pousette commented 9 months ago

It seems to matter which node is dialling whom.. If the node that is not closing is dialing the the other node. Then it will fail, but the other way around it works. I assume that the node that is not stopping have a connection lingering that ought to close, which makes dialling again not possible. Given this, this might not be a problematic issue after all because it is usually the node that is starting that is starting to dial.

But the only issue perhaps is that I received a ConnectionImpl object after dialing (from the "wrong" side), even if it was perhaps closing, which made this problem less easy to understand.

You might close this if this is expected behaviour