radicle-dev / radicle-link

The second iteration of the Radicle code collaboration protocol.
Other
421 stars 39 forks source link

librad: fix deadlock when waiting for idle #736

Closed geigerzaehler closed 2 years ago

geigerzaehler commented 2 years ago

We fix a deadlock when we wait for the quic endpoint to become idle.

The deadlock happens when we wait for endpoint.wait_idle() while we are simultaneously still trying to connect to a remote that does not respond (i.e. it neither accepts nor refuses the connection) in connect called by librad::net::protocol::io::discovered().

To make endpoint.wait_idle() future finish we need to advance the connection task so we drop it after the future has finished instead of before.

geigerzaehler commented 2 years ago

Is there any way of confirming that this fixes a deadlock?

I’ve prepared a branch thomas/reproduce-deadlock-fix on Upstream. To reproduce the error checkout the second to last commit on that branch and run yarn run test:integration:debug. After cypress opens up click on the networking test to run it. After the test has passed, wait for the log line establishing connection remote_addr=64.190.62.111:123 in the terminal. Then hit Ctrl+R in the test window to re-run the test. The second run fails and in the console you can see that after a GET request to /v1/control/reset we don’t see the log line that tells us that the network protocol has shut down.

If you move to the head of the branch the proxy uses the updated librad version and everything works as expected.