Non-Postgres Exceptions return connection to pool in invalid state

ChrisPenner commented 2 months ago

Hello 👋🏼 , thanks for the library!

I noticed that some transactions were starting with the error: WARNING: there is already a transaction in progress,

Taking a look at Pool.use it appears that connections are reset if a QueryError or PipelineError occurs, but other types of exceptions cause the connection to be returned to the pool WITHOUT being reset. This means that in the case that if the exception was triggered in Haskell code or via an Async Exception (e.g. a timeout) a transaction may remain in-progress on that connection, which is still being returned to the connection pool. https://github.com/nikita-volkov/hasql-pool/blob/master/src/library/exposed/Hasql/Pool.hs#L211-L213

Although it seems to me like a strange default to return connections to the pool on arbitrary exceptions, I would be happy to work around this by providing my own rollback code in these cases, however the interface of the library doesn't seem to provide any way for me to manually roll-back transactions when an exception occurs before the connection is returned to the pool, I can't perform my own exception handling because Session isn't MonadUnliftIO, and if I wrap Pool.use itself I can't get hold of the Connection that was used in order to roll it back.

It looks like hasql-pool should probably either provide an extension point where I can rollback connections on exceptions (including async exceptions), or should not return connections to the pool if they experience an exception. Also, this handling should be carefully masked to ensure a new async-exception doesn't interrupt the error handling. Please let me know if there's anything I can do to help get this resolved 😄

ChrisPenner commented 2 months ago

I investigated a little more, both synchronous and asynchronous exceptions trigger this branch, and their connections are being returned to the pool in an invalid state.

The best I can see that I'd be able to do from the library interface is wrap my Pool.use in an onException that calls Pool.release, but that seems less than ideal both because it would release ALL connections, but also would result in potential race conditions between when an exception occurs and the pool is signalled to release.

ChrisPenner commented 2 months ago

Here's the changeset I ended up rolling with in the meantime, I understand that presumably the default is the way it is for a reason and so this may not be the approach you desire, I'd prefer not to maintain our own fork if there's a resolution to this issue you prefer please let me know so I can help contribute back 😄

https://github.com/nikita-volkov/hasql-pool/compare/master...ChrisPenner:hasql-pool:reset-connections-on-exception?expand=1

nikita-volkov commented 2 months ago

Thanks for the report!

The exceptions are handled this way because they are only expected to appear in the user's code, which runs inside Session, and that code should not know anything about the status of the connection. The library aims to keep the control of the connection-management entirely to itself. Letting the user control it via exceptions would violate that purpose. Also if we were to provide the user with control I would much rather prefer providing it explicitly than via exceptions.

I am open to changing this design if there are real use-cases which require that and cannot be resolved otherwise. However at the moment I am not yet convinced that we're dealing with such a case in this issue.

So it all boils down to the question of why you think that the connection should be reset in this scenario. Could you describe your use-case in more detail?

ChrisPenner commented 2 months ago

Thanks for the prompt response! I totally understand the desire to keep a self-contained interface.

I'm definitely open to resolving this issue in some other way if we can find one, however I haven't found a way to do so with the interface as it currently stands; but would love to be proven wrong 😄

Here's a breakdown of one such issue I'm not sure how to handle:

Our app uses Pool.use within web requests on a web app to access the DB
We have timeouts on our requests to prevent long-running requests from taking up too many resources
Sometimes a timeout triggers while a Pool.use is running a session, sometimes in the middle of a Transaction
This timeout triggers an Asynchronous exception which interrupts the process mid-transaction
This connection is now in a very strange mid-transaction state when returned back to the pool, and can result in some pretty unexpected bugs

Does that make sense, or is there some way I can handle this without changes to the library?

I'd love to be able to detect these exceptions so I can clean up the connection within application code, but Session is neither the IO monad, nor does it implement MonadUnliftIO, so I have no ability to detect or catch exceptions which are triggered during execution using something like onException or try or finally.

I can wrap the outer Pool.use with exception handling, however at that location we've already exited the Pool.use and the connection has been returned to the pool and I have no access to it any more, so I have no ability to clean things up myself.

Some changes I can imagine which would address this (with varying feasibility):

Add a user-configurable exception handler to Config, something like SomeException -> Session ()
Implement MonadUnliftIO on Session such that the user can add their own exception handlers within the monad itself
Provide some form of exception handling specialized to Session itself (e.g. a form of withException specialized to Session would be fine)

nikita-volkov commented 2 months ago

So as I understand the problem is that the connection gets returned to the pool in a non-idle TransactionStatus.

Looks like if in Hasql inside Session.run we were to add an interceptor of exceptions, which would check transactionStatus and attempt to execute ABORT if it's non-idle, that would solve the problem.

Does that look like a reasonable solution to you?

nikita-volkov commented 2 months ago

As an alternative to ABORT the reset libpq procedure can be called.

ChrisPenner commented 1 month ago

Thanks for looking into it, yes that would solve my issue wonderfully (as long as it happened on both async and sync exceptions) 😄 . Either reset or ABORT would be fine.

nikita-volkov commented 1 month ago

Released in hasql-1.8.1.

Please do post back on whether it has solved the issue.

ChrisPenner commented 1 month ago

Thanks @nikita-volkov! I tried it out, unfortunately when re-using a connection which has been reset I get hit with a flurry of:

ERROR:  prepared statement "0" does not exist

From a glance at the code-change it looks like you're calling the PQ.reset, but probably just forgot to reset the hasql Connection's prepared-statement registry, so I'm guessing the connection state associated with those prepared statements is being reset but hasql still tries to use the (invalid) prepared statements.

One other thing I noticed is that exceptions aren't being masked within the exception handler, so it's possible another async exception could come in and interrupt the existing exception handler and leave things in a bad state.

I'd recommend something like bracketOnError which does the proper masking, or any of the combinators from unliftio which have much saner defaults for this sort of thing 😄

I won't be able to use the current release due to the issue with prepared statements unfortunately.

nikita-volkov commented 1 month ago

Right! Forgot to reset the statement registry and to mask the exception handler. The fix should be released once the pipeline finishes.

As before please post back on how it works.

Unfortunately I don't have enough spare time to cover these changes with tests, so PRs are welcome!

ChrisPenner commented 1 month ago

Awesome, the new version seems to be working for all my tests at least 👍🏼, I'll switch to it in production once I'm back from holidays and let you know if anything else comes up.

Thanks for the quick responses!

nikita-volkov / hasql-pool

Non-Postgres Exceptions return connection to pool in invalid state #46