holepunchto / hyperswarm

A distributed networking stack for connecting peers.
https://docs.holepunch.to
MIT License
1.06k stars 85 forks source link

handling "too few nodes responded" #103

Open heapwolf opened 2 years ago

heapwolf commented 2 years ago

When going offline i get the error Too few nodes responded from hyperswram (https://github.com/mafintosh/dht-rpc/blob/7527e8f83e7457750a0ece5469ed3b579c5799de/lib/query.js#L196). It seems like this could be caught by adding an error event up the stack, but im not sure what the intended idea was for handling this correctly. cc @mafintosh

Error: Too few nodes responded
    at Query._endAfterCommit (/Users/.../hyper-bundle:2552:24)
    at Query._flush (/Users/.../hyper-bundle.js:2548:14)
    at Query._readMore (/Users/.../hyper-bundle:2534:16)
    at Query._onerror (/Users/.../hyper-bundle:2623:14)
    at Object.destroy (/Users/.../hyper-bundle:1366:14)
    at Timeout.oncycle [as _onTimeout] (/Users/.../hyper-bundle:1462:13)
    at listOnTimeout (node:internal/timers:559:11)
    at processTimers (node:internal/timers:500:7)

There are two other issues this reference this without resolution.

mafintosh commented 2 years ago

Do you have a testcase to illustrate this?

Raynos commented 2 years ago

We use hyperswarm in our desktop application at startup.

    this.swarm = new Hyperswarm({
      dht: new DHT({ ephemeral: true }),
      jitter: JITTER,
      keyPair: this.keyPair,
      backoffs: [
        BACKOFF_A + Math.round(JITTER * Math.random()),
        BACKOFF_B + Math.round(JITTER * Math.random()),
        BACKOFF_C + Math.round(JITTER * Math.random())
      ]
    })
    const discovery = this.discovery = await this.swarm.join(this.peerId, {
      server: true,
      client: false
    })

    await discovery.flushed()
    this.swarm.flush()

If the user is currently offline or in airplane mode or not connected to wifi or has spotty wifi then the hyperswarm module raises this too few nodes responded error.

We would ideally want hyperswarm to be graceful and retry to see if an internet connection is available ( with backoff ).

mafintosh commented 2 years ago

swarm.join is sync. swarm.flush is not. Are you sure it's not that one that or .flushed that is throwing? It already recovers in the background.

heapwolf commented 2 years ago

I think that's just code we were fiddling around with trying to figure out this error, not sure if await on join is in the actual codebase, but flush, if I recall, can just hang for a long time if you await it (I could be misremembering) cc @jwerle

mafintosh commented 2 years ago

flush can throw. You are meant to handle that. If you’re running it in the background, you need to noop the catch or node will crash. Down to make it return a bool instead of throwing if the pending ops fail if that’s easier for users.

jwerle commented 2 years ago

flush can throw. You are meant to handle that. If you’re running it in the background, you need to noop the catch or node will crash. Down to make it return a bool instead of throwing if the pending ops fail if that’s easier for users.

good to know!

jwerle commented 2 years ago

I think that's just code we were fiddling around with trying to figure out this error, not sure if await on join is in the actual codebase, but flush, if I recall, can just hang for a long time if you await it (I could be misremembering) cc @jwerle

I am destroying the swarm somewhere that causes this https://github.com/hyperswarm/hyperswarm/blob/master/index.js#L352

heapwolf commented 2 years ago

flush can throw. You are meant to handle that. If you’re running it in the background, you need to noop the catch or node will crash. Down to make it return a bool instead of throwing if the pending ops fail if that’s easier for users.

+1 for making it bool. This could prevent a lot of people from being surprised or forgetting to handle it.