status-im / nim-chronos

Chronos - An efficient library for asynchronous programming
https://status-im.github.io/nim-chronos/docs/chronos
Apache License 2.0
353 stars 51 forks source link

Closing many stream servers can crash on windows #320

Closed Menduist closed 1 year ago

Menduist commented 1 year ago

Running this branch (example included): https://github.com/status-im/nim-chronos/commit/bb121356bf97d5358aef6e0be813bcf5b88db5a5 on windows, results in:

[...]
Accept continuation
closed
Accept continuation
Accept continuation
[...]

It effectively means that the accept continuation can be called after closeWait finishes.

Before https://github.com/status-im/nim-chronos/pull/309, it started to happen with only 10 servers, now it needs ~190. Since closeWait has finished, the user may have deleted his references to the StreamServer, like we do in libp2p:

    for server in self.servers:
      server.stop()
      toWait.add(server.closeWait())

    await allFutures(toWait)

    self.servers = @[]

So when the accept continuation is called, the GC may have already cleaned the StreamServer, which results in segfault.

I also suspect that if the dispatcher is more busy, it can happen with far less servers

We only hit this on windows, though I didn't spend time trying to replicate on linux

cheatfate commented 1 year ago

I can't reproduce this issue even with 1000 iterations, so please confirm that fix is working.