Closed MrNaif2018 closed 1 month ago
This should be fixed in aiorpcx. There is more discussion in https://github.com/kyuupichan/aiorpcX/pull/48 I think https://github.com/kyuupichan/aiorpcX/commit/e1c7a227f8886972c6d18f920cd5c39462e055d3 is sufficient.
(note: I was playing around a bit in https://github.com/spesmilo/electrum/compare/master...SomberNight:electrum:202403_asyncio_waitfor, e.g. see https://github.com/sombernight/electrum/commit/b388ec3d8caddcc66d69bcd32f6a59ebacf21b16)
By the way there are a few more timeout-related exceptions in aiorpcx which still inherit from CancelledError, they won't cause problems?
There is only TimeoutCancellationError
, isn't there?
I have tried and haven't managed to write a unit test that involved triggering something like this for that. Feel free to try.
It looks like wallet is still not synchronized due to synchronizer.is_up_to_date() being False. It's False once again due to self.requested_addrs being non-empty. So I think what happens is: we now get a timeout, but those addresses which were already requested but failed with a timeout are still left in self.requested_addrs, and for some reason it never gets cleared. I think the Synchronizer task is working, but it's state wasn't cleared during interface change when taskgroup died
So I found out this issue is not reproducible on python 3.12, so probably it's still some asyncio bug involved when taskgroup dies still triggering similar issues Will test on 3.11 now to see if it works in 3.11 too
It works in 3.11 too. So indeed something similar is involved even after taskgroup has died
To be clear, aiorpcx 0.23.1 is sufficient to fix the reproducer from the OP, right? But you are saying your problem is not sufficiently covered by that reproducer?
aiorpcx is sufficient to fix initial reproducer, yes I think Exception is logged correctly, but when taskgroup dies, some logic elsewhere makes it so that the wallet which was behind the died taskgroup does not recover. All other wallets who just receive stop signal recover just fine
Well, the example in the OP has been fixed, so I am inclined to close this. You are very welcome to open a new issue if you can somewhat detail the remaining bug or ideally create a new reproducer for it.
There is a problem in interface's synchronizer: if we connect to a new server, we first subscribe for notifications using
blockchain.scripthash.subscribe
, but if server is too slow to respond (longer than default timeout of 30 seconds), then aiorpcx'stimeout_after
raises it's exception which inherits fromCancelledError
, which apparently causes async task to be not only not handled, but also cancelled and disappear, which makes synchronizer'sself.requested_addrs
have some requested addresses in it, but never emptied, causingwallet.is_up_to_date()
return False, which will stick until interface is changed (by luck after a few hours or in case of exception which may never be raised)Reproducer:
Similar issue we had before: 12e628e2e6eb39709361b587a87e35f291200d16