haskellari / postgresql-simple

Mid-level client library for accessing PostgreSQL from Haskell
Other
88 stars 46 forks source link

libpq: another command is already in progress #69

Open codygman opened 3 years ago

codygman commented 3 years ago

My team and others in the comments are still affected by this issue from lpsmith/postgresql-simple#177, so I'm copying it over here.

I hope to dig into the issue some more today and post back here but we'll see what happens.

codygman commented 3 years ago

Shouldn't queries that receive an asynchronous exception do something like call Database.PostgreSQL.LibPQ.cancel on the connection?

Also it looks like hasql has a regression test for this issue.

codygman commented 3 years ago

The minimal repro from the comments returns timeout as expected now and seems to be fixed with versions:

j$ stack exec -- ghc-pkg list | grep postgresql
    HDBC-postgresql-2.5.0.0
    persistent-postgresql-2.11.0.1
    postgresql-libpq-0.9.4.3
    postgresql-simple-0.6.4

The minimal repro was:

{-# LANGUAGE OverloadedStrings, ScopedTypeVariables #-}
import Database.PostgreSQL.Simple
import System.Timeout

main :: IO ()
main = do
  conn <- connectPostgreSQL "dbname='foo' user='bar' password='xxxx'"
  res <- timeout 1000000 $ withTransaction conn (query_ conn "select pg_sleep(5)")
  case res of
    Just [Only x] -> putStrLn "Done." >> return x
    Just _ -> error "Impossible."
    Nothing -> putStrLn "Timed out."

I'll post a minimal repro of my problem that still exists which is connections being returned to Data.Pool while a transaction has not been committed or rolled back as expected.

codygman commented 3 years ago

Actually I'm not able to reproduce my issue either with just postgresql-simple:

      let connectionInfo :: ConnectInfo =
            defaultConnectInfo
            { .. }
      pool <- createPool (connect connectionInfo) close 1 10 1
      threadId <- liftIO . Concurrent.forkIO $ withResource pool $ \conn -> do
        _ :: [Only ()] <- query_ conn "select pg_sleep(20)"
        pure ()

      void $ Concurrent.killThread threadId
      withResource pool $ \conn -> do
        _ :: [Only ()] <- query_ conn "select pg_sleep(2)"
        pure ()

For now I think this rules out postgresql-simple as the culprit for my bug, but perhaps that's not similar enough to my Persistent streaming code that exhibits this issue.

TODO post stripped down persistent code

Given that I'll create an issue upstream there but I'll leave this open in case others who commented experience this issue using postgresql-simple directly.

Here is the related issue on the Persistent side for SqlBackend having issues that also seems related: yesodweb/persistent#981

mzabani commented 3 years ago

Isn't it possible that in your sample above killThread kills the thread before the query gets sent to the server? We have this problem at the company I work for as well. I investigated this a bit and pushed a PR that might fix it :crossed_fingers:

brandon-leapyear commented 2 years ago

:sparkles: This is an old work account. Please reference @brandonchinn178 for all future communication :sparkles:


Are there any updates here? I'm still getting this issue. Adapting the initial repro gets me the same error:

{-# LANGUAGE OverloadedStrings, TypeApplications #-}
import Database.PostgreSQL.Simple
import System.Timeout

main :: IO ()
main = do
  conn <- connectPostgreSQL "..."
  res <- timeout 1000000 $ withTransaction conn (query_ conn "select pg_sleep(5)")
  case res of
    Just [Only ()] -> putStrLn "Done."
    Just _ -> error "Impossible."
    Nothing -> putStrLn "Timed out."
  withTransaction conn (query_ conn "select 1") >>= print @[Only Int]
Timed out.
Test.hs: libpq: failed (another command is already in progress
)

I just tried #71, and this no longer errors. Maybe we can merge it sometime soon? @phadej @mzabani

mzabani commented 2 years ago

It's been a while, but as it is I think we shouldn't merge #71. IIUC it is not possible to reliably fix connection state for new queries to run when an exception interrupts a query, in general.

My last comment in the PR suggests merging just the cancelAndClear function, which could then be used judiciously by users trying to reuse a connection whose query was interrupted. But I believe this alone wouldn't allow users to fix at least some cases of "libpq: another command is already in progress" errors we see around, since for example transactions in postgresql-simple run queries in exception handlers . For that, we'd have to do what @Yuras suggested in this comment and "fix existing code not to reuse connections on failure". That looks like just a one line removal fix, but it is not backwards compatible if users were previously relying on rolled back transactions on exceptions, so it may be more complicated than that.

I don't know the best way forward. Merging cancelAndClear independently seems like a possibility, and maybe then adding new withTransaction... functions that don't rollback on exceptions, document them and deprecate the old ones?

brandon-leapyear commented 2 years ago

:sparkles: This is an old work account. Please reference @brandonchinn178 for all future communication :sparkles:


ah wait never mind. We use persistent-postgresql, and it looks like persistent uses its own transaction logic, so postgresql-simple's issues with transactions shouldn't be causing our error. I'll post a separate issue with persistent-postgresql. Thanks!

codygman commented 2 years ago

For that, we'd have to do what @Yuras suggested https://github.com/haskellari/postgresql-simple/pull/71#issuecomment-813049670 and "fix existing code not to reuse connections on failure". That looks like just a one line removal fix, but it is not backwards compatible if users were previously relying on rolled back transactions on exceptions, so it may be more complicated than that.

So users that don't use pools but depend on transaction rollback?

Merging cancelAndClear independently seems like a possibility, and maybe then adding new withTransaction... functions that don't rollback on exceptions, document them and deprecate the old ones?

Please do this. I'd like my team to be able to avoid maintain a fork with that PR.

Then hopefully the deprecation warning will solicit feedback from affected users. Maybe linking to the create new issue page in the deprecation warning will increase chances of feedback.

mzabani commented 2 years ago

I thought a bit about this and I think there is another solution to this problem.

We could add a new IORef or MVar field to Connection called something like connectionMayHaveRunningStatement :: IORef Bool. We would then catch exceptions with an uninterruptible mask when running queries and set that to True, but do nothing else in the exception handling block.

Then, before running any new queries with the same connection, we would cancel any running statements iff that value is set to True. We can be careful and give out good error messages in exceptions if cancelling the running statement fails (e.g. showing what query we would run next) so that users understand the context a bit better.

This may even enable us to use concurrency and timeout more freely, unless I'm missing something (definitely not my area of expertise here).

Existing withTransaction.. functions would remain as they are, and we might still want to deprecate and replace them, but they'll be far less likely to run into "libpq: another command is already in progress" when running ROLLBACK or if they fail they'll do so with a much more informative error. In any case, we'll be able to think about them separately from this discussion.

I can implement this (in fact I hacked it quickly and it passes the tests I created), but I'd like to hear from others if this sounds reasonable first. @Yuras @phadej