haskellari / postgresql-simple

Mid-level client library for accessing PostgreSQL from Haskell
Other
88 stars 46 forks source link

Prevent FdData leaks in getNotification #99

Open velveteer opened 2 years ago

velveteer commented 2 years ago

The rationale for this change is a memory leak observed in a production application using postgresql-simple-0.6.4. The related code is proprietary but I will try to give the gist of the issue. If a minimal repro is requested I could put one together.

When getNotification is blocking in an async-spawned thread, upon cancelling and restarting the thread in a loop (reusing the same connection) a heap profile shows an increasing amount of FdData closures building up (along with a few other closures, e.g. the TVar Nothing created in threadWaitSTM).

Forgive my limited knowledge of the GHC EventManager, but it seems that the registered callback made by threadWaitReadSTM does not get removed from the EventManager state when waitRead (I presume) is interrupted by the AsyncCancelled.

This change somewhat mirrors how the non-STM threadWait handles exceptions, and so I think it should be benign. I've tested this change against our application and confirmed the FdData and related closures are no longer hanging around.

phadej commented 2 years ago

Forgive my limited knowledge of the GHC EventManager, but it seems that the registered callback made by threadWaitReadSTM does not get removed from the EventManager state when waitRead (I presume) is interrupted by the AsyncCancelled.

Have you asked on GHC issue tracker, that sounds like a bug in EventManager. (It's good that we can workaround it, but better to also fix the issue at the source).

phadej commented 2 years ago

Also reproducer would be nice.

If I understand right, just doing getNotication in async and canceling it for a million times should not make RTS use a lot memory, but apparently it will? A standalone program doing that would be enough (i.e. to test with +RTS -s manually).

phadej commented 2 years ago

I also don't understand what we get by using threadWaitReadSTM here, the comments don't make sense to me. (The fd can still change if the connection is renewed in between). I have to look through the history.,

Could you try your application with threadWaitRead branch, i.e. removing the #if block

velveteer commented 2 years ago

I also don't understand what we get by using threadWaitReadSTM here, the comments don't make sense to me. (The fd can still change if the connection is renewed in between). I have to look through the history.,

Could you try your application with threadWaitRead branch, i.e. removing the #if block

Indeed, avoiding threadWaitReadSTM altogether resolves the issue.

Here's a minimal reproduction: https://github.com/velveteer/postgresql-simple-leak-repro