libp2p / js-libp2p

The JavaScript Implementation of libp2p networking stack.
https://libp2p.io
Other
2.3k stars 442 forks source link

`.stop()` sometimes hangs indefinitely #2625

Open marcus-pousette opened 2 months ago

marcus-pousette commented 2 months ago

Severity:

High

Description:

Steps to reproduce the error:

I have a test scenario with a few nodes that I connect to each other and send some data

Sometimes when terminating a node it endlessly waits for the Websocket transport to shutdown. This is because there are connections that does close (?) and in the it-ws library there are no calls to server.closeAllConnections() before close.

I have managed to make the shutdown work better if modify https://github.com/libp2p/js-libp2p/blob/73f2b6b6d767492f8f1d740fac382a2d22c3fca1/packages/libp2p/src/components.ts#L72 to terminate the components in the reverse order in sequentially... so I suspect there is some kind of race condition going on or leak of some sort

Also no issues if I use Tcp transport instead.

This is not a good description on how to reproduce the issue, because it is a bit random when it occurs. But wanted to create this issue if others have the same problem and will update the description if I have a isolated a good test

marcus-pousette commented 2 months ago

When the stop timeouts, I have one connection left with 0 streams, remoteAddr: '/ip6/::ffff:7f00:1/tcp/56327/p2p/12D3KooWApN6mEB5666wTpqpzkgcKXC7kgMxmm7CAWRiD3WAg4Yv' direction: 'inbound' encryption: '/noise' multiplexer: '/yamux/1.0.0' transient: false

marcus-pousette commented 2 months ago

Closing the connectionManager component before all other components seems also to fix the issue


async _invokeStartableMethod(methodName) {
        const timeout = new Promise((resolve, reject) => {
            setTimeout(() => {
                reject('timeout')
            }, 10000);
        });
        const connectionManager = this.components["connectionManager"]
        if (isStartable(connectionManager)) {
            await connectionManager[methodName]?.();
        }
        let promise = Promise.all(Object.values(this.components)
            .filter(obj => isStartable(obj))
            .map(async (startable) => {
                await startable[methodName]?.();
            }));
        const race = Promise.race([promise, timeout]);
        try {
            await race;
        }
        catch (e) {
            throw e;
        }
        finally {
            clearTimeout(timeout);
        }
    }
achingbrain commented 2 months ago

in the it-ws library there are no calls to server.closeAllConnections() before close.

I may be missing something but where is the closeAllConnections method defined? The it-ws module stores a reference to a http/https net.Server which has no such method.

no issues if I use Tcp transport

Looking at the code, the TCP listener aborts each connection on close whereas the WebSocket transport tries to close them gracefully.

Do you still see the issue if the transport aborts the connections instead of closing them on close?

marcus-pousette commented 2 months ago

http.Server and https.Server has that method since NodeJs 18.2.0

/// it-ws/server.ts

 async close (): Promise<void> {
    await new Promise<void>((resolve, reject) => {
      this.server.closeAllConnections()
      this.server.close((err) => {
        if (err != null) {
          reject(err); return
        }

        resolve()
      })
    })
  }
marcus-pousette commented 2 months ago

Actually the closeAllConnections trick does work since the connection is a tcp one (?)

The close connection manager first trick seems to always work.

Also adding like a 5s delay before stopping also seems to work.

It feels like there is some kind of connection creation in process that is not caught correctly on closing

achingbrain commented 2 months ago

http.Server and https.Server has that method since NodeJs 18.2.0

Aha, that's interesting, nice find.

The docs do say though:

This does not destroy sockets upgraded to a different protocol, such as WebSocket or HTTP/2

...so I wonder if there's something else going on here?

marcus-pousette commented 2 months ago

Ye the whole thing about aborting Websocket connections before closing does not seem to have an impact. Only the two things I mentioned in my last comment did mitigate the problem for me

marcus-pousette commented 2 months ago

Looking at the code, the TCP listener aborts each connection on close whereas the WebSocket transport tries to close them gracefully.

That would explain something!