vacp2p / nim-libp2p

libp2p implementation in Nim
https://vacp2p.github.io/nim-libp2p/docs/
MIT License
242 stars 52 forks source link

fix: sendMsg must not raise errors #1074

Closed diegomrsantos closed 3 months ago

diegomrsantos commented 3 months ago

This error making nimbus crash has been reported:

/home/seamonkey/Downloads/holesky/nimbus-eth2/vendor/nim-testutils/testutils/moduletests.nim(21) moduletests
/home/seamonkey/Downloads/holesky/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(2376) main
/home/seamonkey/Downloads/holesky/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(2299) handleStartUpCmd
/home/seamonkey/Downloads/holesky/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(2200) doRunBeaconNode
/home/seamonkey/Downloads/holesky/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(1957) start
/home/seamonkey/Downloads/holesky/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(1904) run
/home/seamonkey/Downloads/holesky/nimbus-eth2/vendor/nim-chronos/chronos/internal/asyncengine.nim(150) poll
Error: unhandled exception: Asynchronous task [sendMsgSlow() at pubsubpeer.nim:288] was cancelled! [FutureDefect]
arnetheduck commented 3 months ago

see https://github.com/vacp2p/nim-libp2p/pull/1075 - the problem is that stopping the non-priority send task cancels the sendMsg future even though the non-priority task is not the owner - catching exceptions will not help with this problem

diegomrsantos commented 3 months ago

@arnetheduck why cancelling the task would cancel the futures inside the queue if they aren't related?

arnetheduck commented 3 months ago

when you await a task in an async function (such as the low prio loop) and cancel the call to that async function, the cancellation propagates to the thing currently being awaited - in this case the queue item - race and allFutures "break" this chain of cancellations.

consider:

proc f() {.async.} = await sleepAsync(10.minutes)
proc g() {.async.} = await f()

let fut = g()
fut.cancelSoon()

here, you can clearly see that if you cancel fut, you want the cancellation to propagate all the way to sleepAsync so that the sleep is cancelled.

in the case of sendMsg, we store the future in a sequence and there are two things waiting for it - the asyncSpawn task and the non-priority loop - asyncSpawn is the "owner" of the future chain and therefore the only one that should cancel the future - the non-prio loop should not cancel it because it is not the owner.

diegomrsantos commented 3 months ago

Can we use https://github.com/status-im/nim-chronos/blob/e296ae30c84bdd1f0b12c50ab551ed080f8a815c/chronos/internal/asyncfutures.nim#L1029?

diegomrsantos commented 3 months ago

catching exceptions will not help with this problem

This PR won't prevent the Future from being canceled but will fix the crash.

arnetheduck commented 3 months ago

Can we use https://github.com/status-im/nim-chronos/blob/e296ae30c84bdd1f0b12c50ab551ed080f8a815c/chronos/internal/asyncfutures.nim#L1029?

no - noCancel blocks cancellation until the given future is finished "naturally" - in the sleep example, it would mean that the cancellation takes 10 minutes to complete.

This PR won't prevent the Future from being canceled but will fix the crash.

There's no mechanism right now that is able to prevent futures from finishing - the work @etan-status is doing allows for better discovery of where error and cancellation handling is needed however - as soon as the "initial" introduction is done, it will be easy to find where we need to add cancellation handling and where not

diegomrsantos commented 3 months ago

so should we close this PR?

arnetheduck commented 3 months ago

yes, I think it's not needed at the moment - raises, when introduced, will reveal the exact error handling needs in this function

diegomrsantos commented 3 months ago

closing this for now as it is superseded by https://github.com/vacp2p/nim-libp2p/pull/1075