hreinhardt / amqp

Haskell AMQP client library
Other
126 stars 38 forks source link

`closeConnection` can swallow IO exceptions and block / hang indefinitely #103

Open liminalisht opened 2 years ago

liminalisht commented 2 years ago

https://github.com/hreinhardt/amqp/blob/3ddd97222881a2d2af9e2d23bdcfd63f507c0b09/Network/AMQP/Internal.hs#L418-L431

closeConnection swallows any IO exceptions it might encounter upon sending the close connection message to the server, and then waits on the server indefinitely, expecting the server to send back a connection_closed_ok message.

One problem I see is that if an IO exception is encountered when sending the server the close connection message, there is no reason to expect that the server has properly received the message, and therefore no reason to expect that the server will respond with a connection_closed_ok message.

In such a case, the MVar contents will never become full, and the call to readMVar will block forever. From the readMVar docs:

Atomically read the contents of an MVar. If the MVar is currently empty, readMVar will wait until it is full.

https://www.stackage.org/haddock/lts-18.22/base-4.14.3.0/Control-Concurrent-MVar.html#v:readMVar

I would propose that closeConnection at a minimum be altered such that readMVar is not called if an IO exception is encountered.

hreinhardt commented 2 years ago

Is this a theoretical concern or can you actually trigger this as a bug?

I'm asking because I believe the code is correct, based on these two assumptions:

In addition, I think most users will have heartbeats enabled, so that would be another safeguard to prevent infinite blocking.

liminalisht commented 2 years ago

I have in fact experienced closeConnection hanging (and have temporarily gotten around it by invoking it using timeout.)

However, to be clear, I do not know why closeConnection hangs. The explanation I proposed above is indeed (to your point) theoretical. I proposed it because I could not think of another reason for the function hanging.

Given the code you pointed me to, I am now quite perplexed. It looks like the lock should definitely be filled whenever the connection is closed.

That leaves me wondering whether there's really no other way for writeFrame to fail (other than the connection being closed).

hreinhardt commented 2 years ago

After thinking about it some more, you might be right. If writeFrame fails without closing the connection, the code might deadlock. I have no idea if that could ever happen, but it's probably better to be resistant to that.

I've pushed a fix: https://github.com/hreinhardt/amqp/commit/ff169bbb8eec2d62efb4819f56f2c2d679ff4fc5

Does the change look good to you?

liminalisht commented 2 years ago

Yeah, looks good to me. Thanks for the quick response / work!

hreinhardt commented 2 years ago

Ok, I've pushed a new version to Hackage. Curious to see if the blocking issue persists for you.