status-im / nim-chronos

Chronos - An efficient library for asynchronous programming
https://status-im.github.io/nim-chronos/docs/chronos
Apache License 2.0
352 stars 51 forks source link

avoid double `GC_unref` when final `boundstream.transfer` is cancelled #509

Closed etan-status closed 4 months ago

etan-status commented 4 months ago

When a BoundedStreamReader finishes, it performs one final transfer to signal EOF. If the other end subsequently closes the stream, state changes from Finished to Closing and enqueues a callback that will finish the close operation. If the transfer is then cancelled, the state changes from Closing to Error. The stream will eventually be closed, so changes again from Error to Closing, and the cleanup callback is scheduled a second time. Each of the cleanup callbacks then proceeds to GC_unref to udata before changing the state to Closed. The double GC_unref only happens if udata is used.

In practice, the buggy flow is triggered when nimbus-eth2 interacts with Geth, and Geth provides a response with Content-Length. This is the case for example when a block without transactions is fetched; otherwise, it uses Transfer-Encoding: chunked. Luckily, that setup does not configure udata, so there is no double free in practice. But, other applications that use udata and use the cancellation flow that triggers the bug may be affected.

cheatfate commented 4 months ago

If stream is Finished it should not change state to Error, it should stay Finished. This is list of state machine transitions for AsyncStreams.

  1. Running -> Error -> Closing -> Closed.
  2. Running -> Stopped -> Closing -> Closed.
  3. Running -> Finished -> Closing -> Closed.
  4. Running -> Closing -> Closed.
cheatfate commented 4 months ago

It still an issue but it should be done in different way, e.g. await noCancel rstream.buffer.transfer().

etan-status commented 4 months ago

If you put a doAssert in the CancelledError handler of the final transfer, that state == Finished, you'll see that it gets hit when running Nimbus BN.

as in, it's case (3), but in-between Closing -> Closed, someone cancels, resulting in Closing -> Error -> Closing -> Closed -> Closed. The Closed state is hit twice, as both times Closing is entered it enqueues the transition to Closed.

etan-status commented 4 months ago

OK, yeah, it could be that there is a better way to fix it. I'll close this one, as I don't know the intricacies there. But the state transition should be fixed, to disallow Closing --> Error.