nbd-wtf / nostr-tools

Tools for developing Nostr clients.
The Unlicense
695 stars 194 forks source link

Websocket connection failure via pool results in uncaught exception #130

Closed chmac closed 1 year ago

chmac commented 1 year ago

Steps to reproduce

I think the problem is that there's no await here:

https://github.com/nbd-wtf/nostr-tools/blob/f43d23d344fe572b1c9a0a0a7fdfa0c4389640c2/pool.ts#L56-L59

Well, to be clearer, the async function that's invoked by the .forEach() is not awaited anywhere. It's just started. Maybe the solution would be to change .forEach() to a .map() and wrap the output in a Promise.all()? But, that would mean that failure to connect to any single relay would crash the whole pool. I guess that's probably not what we want.

I've always found it challenging to handle errors on one of the relays in a group. I suppose ideally the errors would be made available to the client of the API in some way. But the current API wouldn't support anything like that.

Another approach would be to check if all relays have failed, and if they have, then fail the whole process. Perhaps instead of a Promise.all() then a Promise.race() would work for that...

chmac commented 1 year ago

Hmm, upon further investigation, the code block above is inside the pool's .sub() implementation, which is not async.

I think there's a fundamental API challenge here. The subscription API is synchronous, because it returns an event emitter. But, the pool implementation automatically connects to the relay. That means when calling the synchronous .sub(), there's no async context to await connection errors.

I think one consequence of all this is that if 1 relay fails to connect, the pool will always wait until the eose timeout of 2400ms before returning a value from .list(). Even if all the other relays have already delivered all their events.

chmac commented 1 year ago

I've tested wrapping the line 57 .ensureRelay() call in a try / catch, and then the error is caught. If I replace line 57 with the following, then I don't get any uncaught errors:

      let r
      try {
        r = await this.ensureRelay(relay);
      } catch(error) {
        eosesMissing--;
        if (eosesMissing === 0) {
          clearTimeout(eoseTimeout);
          for (let cb of eoseListeners.values())
            cb();
        }
        return
      }

I think that decrementing the eosesMissing count is crucial as this allows the .sub() function to return without waiting for the timeout. Edit: Plus calling the clearTimeout() and invoking the eoseListeners callbacks.

This still doesn't surface the issue to the API consumer though. There's no way for me to know from my app if any of the relays fail to connect.

chmac commented 1 year ago

@fiatjaf Now that's a fast fix! 🚀 🚀 🚀