libp2p / js-libp2p

The JavaScript Implementation of libp2p networking stack.
https://libp2p.io

libp2p crashes with ABORT_ERR #2462

Closed by christroutner 3 months ago

christroutner commented 3 months ago

Severity:

Description:

During normal operation (finding and connecting to nodes), the libp2p node crashes. This appears to be due to a race condition. Here is the error message from libp2p v1.3.1 (the latest version):

status: getCRGist() Connecting to Circuit Relay /ip4/143.198.70.59/tcp/5101/p2p/12D3KooWMbU9R49aiYUeFBpxFYK6PggacoeMydaZaR2dzDpWgcA6
file:///home/trout/work/psf/code/helia-coord/node_modules/race-signal/dist/src/index.js:22
        return Promise.reject(new AbortError(opts?.errorMessage, opts?.errorCode));

AbortError: The operation was aborted
    at raceSignal (file:///home/trout/work/psf/code/helia-coord/node_modules/race-signal/dist/src/index.js:22:31)
    at YamuxStream.closeWrite (file:///home/trout/work/psf/code/helia-coord/node_modules/@libp2p/utils/dist/src/abstract-stream.js:230:19)
    at YamuxStream.close (file:///home/trout/work/psf/code/helia-coord/node_modules/@libp2p/utils/dist/src/abstract-stream.js:189:18)
    at stream.close (file:///home/trout/work/psf/code/helia-coord/node_modules/@libp2p/utils/dist/src/stream-to-ma-conn.js:13:15)
    at ConnectionImpl.close [as _close] (file:///home/trout/work/psf/code/helia-coord/node_modules/libp2p/dist/src/upgrader.js:443:30)
    at processTicksAndRejections (node:internal/process/task_queues:95:5)
    at runNextTicks (node:internal/process/task_queues:64:3)
    at listOnTimeout (node:internal/timers:540:9)
    at process.processTimers (node:internal/timers:514:7)
    at async ConnectionImpl.close (file:///home/trout/work/psf/code/helia-coord/node_modules/libp2p/dist/src/connection/index.js:121:13) {
  type: 'aborted',
  code: 'ABORT_ERR'
}

Node.js v20.11.0

This is a similar error from an older version of libp2p (v1.2.1):

file:///home/safeuser/ipfs-service-metrics/node_modules/race-signal/dist/src/index.js:22
        return Promise.reject(new AbortError(opts?.errorMessage, opts?.errorCode));
                              ^

AbortError: The operation was aborted
    at raceSignal (file:///home/safeuser/ipfs-service-metrics/node_modules/race-signal/dist/src/index.js:22:31)
    at YamuxStream.closeWrite (file:///home/safeuser/ipfs-service-metrics/node_modules/@libp2p/utils/dist/src/abstract-stream.js:230:19)
    at YamuxStream.close (file:///home/safeuser/ipfs-service-metrics/node_modules/@libp2p/utils/dist/src/abstract-stream.js:189:18)
    at file:///home/safeuser/ipfs-service-metrics/node_modules/libp2p/dist/src/connection/index.js:118:63
    at Array.map (<anonymous>)
    at ConnectionImpl.close (file:///home/safeuser/ipfs-service-metrics/node_modules/libp2p/dist/src/connection/index.js:118:44)
    at initiateConnection (file:///home/safeuser/ipfs-service-metrics/node_modules/@libp2p/webrtc/dist/src/private-to-private/initiate-connection.js:125:34)
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at async WebRTCTransport.dial (file:///home/safeuser/ipfs-service-metrics/node_modules/@libp2p/webrtc/dist/src/private-to-private/transport.js:83:35)
    at async DefaultTransportManager.dial (file:///home/safeuser/ipfs-service-metrics/node_modules/libp2p/dist/src/transport-manager.js:81:20)
    at async Job.queue.add.peerId.peerId [as fn] (file:///home/safeuser/ipfs-service-metrics/node_modules/libp2p/dist/src/connection-manager/dial-queue.js:153:38)
    at async raceSignal (file:///home/safeuser/ipfs-service-metrics/node_modules/race-signal/dist/src/index.js:28:16)
    at async Job.run (file:///home/safeuser/ipfs-service-metrics/node_modules/@libp2p/utils/dist/src/queue/job.js:56:28) {
  type: 'aborted',
  code: 'ABORT_ERR'
}
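
Both traces pass through the same frames: `YamuxStream.closeWrite` calls `raceSignal`, which rejects with `ABORT_ERR` when its signal fires before the close completes. The following is a minimal sketch of that pattern and of one way an application might tolerate it. It is not the `race-signal` source; `raceAgainstSignal`, `closeQuietly`, and the 100 ms default are hypothetical names and values chosen for illustration.

```javascript
// Stand-in for the error shape seen in the traces (code: 'ABORT_ERR').
class AbortError extends Error {
  constructor (message = 'The operation was aborted') {
    super(message)
    this.name = 'AbortError'
    this.code = 'ABORT_ERR'
  }
}

// Simplified illustration of racing a promise against an AbortSignal.
// Assumption: this mirrors the general idea only, not the library's code.
function raceAgainstSignal (promise, signal) {
  if (signal.aborted) {
    return Promise.reject(new AbortError())
  }
  return new Promise((resolve, reject) => {
    const onAbort = () => reject(new AbortError())
    signal.addEventListener('abort', onAbort, { once: true })
    // Settle with the original promise and stop listening for aborts.
    promise.then(resolve, reject).finally(() => {
      signal.removeEventListener('abort', onAbort)
    })
  })
}

// Hypothetical helper: close with a deadline and swallow only ABORT_ERR.
async function closeQuietly (closable, timeoutMs = 100) {
  const controller = new AbortController()
  const timer = setTimeout(() => controller.abort(), timeoutMs)
  try {
    await raceAgainstSignal(closable.close(), controller.signal)
    return 'closed'
  } catch (err) {
    if (err.code === 'ABORT_ERR') {
      return 'aborted' // the close timed out; treat as non-fatal
    }
    throw err // any other failure is still reported
  } finally {
    clearTimeout(timer)
  }
}
```

If nothing awaits or catches the rejected promise, Node.js v15 and later treat the unhandled rejection as an uncaught exception and exit, which would match the output above.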

Steps to reproduce the error:

To reproduce the error, clone the deps-04-24 branch of the helia-coord library, install dependencies, then run this JavaScript file with Node.js. After a period of time, the error occurs and the process crashes.

christroutner commented 3 months ago

Sometimes the libp2p app runs for over an hour without an issue, and other times this Issue occurs within a few seconds of startup. If you are trying to reproduce this error, restart the app after 5 minutes if the error has not appeared. It's a race condition, so it's not easy to reproduce. It appears to involve the connection between two nodes.
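
Because the crash is intermittent, it may help to record each occurrence instead of letting the process exit. The sketch below registers a process-level `unhandledRejection` handler that logs errors shaped like the one in this Issue and rethrows everything else. `isAbortError` and `installAbortLogger` are hypothetical names, and the check on `code`/`name` assumes the error shape shown in the traces.

```javascript
// True when the value looks like the AbortError reported in this Issue.
function isAbortError (err) {
  return err != null && (err.code === 'ABORT_ERR' || err.name === 'AbortError')
}

// Hypothetical diagnostic: record unhandled aborts instead of crashing.
function installAbortLogger (log = console.error) {
  const seen = []
  const handler = (reason) => {
    if (isAbortError(reason)) {
      seen.push({ at: new Date().toISOString(), stack: reason.stack })
      log('unhandled abort (ignored):', reason.stack)
      return
    }
    // Anything else is rethrown so it still surfaces as an uncaught exception.
    throw reason
  }
  process.on('unhandledRejection', handler)
  return {
    seen,
    handler,
    uninstall: () => process.off('unhandledRejection', handler)
  }
}
```

This hides the symptom rather than fixing the race, so it is best treated as a way to collect traces while the app keeps running.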

christroutner commented 3 months ago

In an attempt to debug the root cause, I reverted to js-libp2p v1.2.1 from the latest v1.3.1. I started to see the same error as above. I realized that what had changed was that I was now using node.js v20, when I had previously been using node.js v16.

I've been doing some testing with js-libp2p v1.3.1 and node.js v16. So far I have not seen the error in this Issue.

christroutner commented 3 months ago

I'm closing this Issue as I think it's tied to some combination of switching node versions, node_modules, and package-lock.json.

I've successfully gotten the error to go away on node.js v16 on Ubuntu 22, and I've gotten it to run on node.js v20 on Ubuntu 20.

luizzvinicius commented 2 months ago

Guys, I'm having the same problem. I'm currently using node 20.11, but on Windows. I tried going back to v16, but the problem persists. To be more specific, I'm working with IPFS (helia) and OrbitDB. This happens when the terminal reloads, in other words, when the application creates more than one connection (I believe this is the cause of the problem).

christroutner commented 2 months ago

If you haven't tried it yet, delete your node_modules folder and the package-lock.json file, then reinstall dependencies with npm install. That seemed to make a difference for me. It's not a silver bullet, but it was definitely one of the factors.