status-im / nim-chronos

Chronos - An efficient library for asynchronous programming
https://status-im.github.io/nim-chronos/docs/chronos
Apache License 2.0
353 stars 51 forks source link

`poll` reentrancy may trigger network callbacks multiple times #308

Closed etan-status closed 2 years ago

etan-status commented 2 years ago

During poll, expired timers and pending network events are collected into loop.callbacks, callback functions waiting for the events are scheduled, and then also executed (processCallbacks() in asyncloop.nim). When such a callback then proceeds to call waitFor, it re-enters poll while processCallbacks() is already ongoing.

If such a reentrancy condition occurs, loop.callbacks may still have items left. If they correspond to pending network events that have not yet been processed, the recursive poll call will re-discover that those events are still pending (loop.selector.selectInto). This then leads to network event callbacks being scheduled multiple times.

Certain network event callbacks do not operate properly when the underlying network event reflects a condition that is no longer accurate. For example, in transports/stream.nim, the function readStreamLoop expects to be called only when the underlying transp.buffer has space available, and crashes with an IndexError when it is called while that buffer is already full (because addr transp.buffer[transp.offset] on transp.offset == transp.buffer.len is invalid).

vendor/nim-chronos/chronos/asyncloop.nim(1124) waitFor
vendor/nim-chronos/chronos/asyncloop.nim(286) poll
vendor/nim-chronos/chronos/transports/stream.nim(1376) readStreamLoop
vendor/nimbus-build-system/vendor/Nim/lib/system/chcks.nim(23) raiseIndexError2
vendor/nimbus-build-system/vendor/Nim/lib/system/fatal.nim(50) sysFatal
Unhandled exception: index 4096 not in 0 .. 4095 [IndexError]

This condition can be experienced, for example using asyncTest in network heavy tests that operate on large amounts of data. It occurs from time to time on CI for nimbus-eth2.

macOS is affected for sure (probably Unix in general). I have not reviewed the Windows implementation.