lpsmith / postgresql-simple

Mid-level client library for accessing PostgreSQL from Haskell
Other
206 stars 71 forks source link

Atomic `withTransaction`? #202

Open charles-cooper opened 7 years ago

charles-cooper commented 7 years ago

Is there a recommended way to perform an atomic withTransaction block? The following deadlocks:

withMVar (connectionHandle conn) $ \_ -> do
  withTransaction conn -- deadlocks since `begin` needs to acquire the lock
  ...

The only other things I can think of are using an extra MVar to synchronize on the Connection, or somehow abusing unsafeIOToSTM .. both of which seem prone to error.

lpsmith commented 7 years ago

Well, not really in vanilla postgresql-simple. Anything we could provide would amount to using an extra MVar, which I have mixed feelings about. Then you have the additional complication that you'd want individual statement/queries to take the transaction lock if they aren't inside a transaction. Which is probably doable, but is more complicated than you might think. My current opinion is that these concerns are best left to a higher-level wrapper to deal with individual connection-level concurrency for a variety of reasons. (e.g. one of my projects manages to get away without needing to atomically issue multiple statements on multiple round trips...)

snaplet-postgresql-simple does what you are asking for, which integrates with the connection pool it also provides.

saurabhnanda commented 7 years ago

I just bumped into this issue while looking for something else and am a little confused. Isn't the atomicity of transaction blocks guaranteed by PG itself?

charles-cooper commented 7 years ago

@saurabhnanda the atomicity of transaction blocks is guaranteed. postgresql-simple only guarantees that if exceptions are thrown in a withTransaction block that the transaction will be rolled back.

However, multiple threads can be writing queries to one connection, and as far as PostgreSQL is concerned those are all the same execution context. For instance Thread 1:

withTransaction conn $ do
  execute_ conn [sql|INSERT INTO table SELECT 1;|]
  threadDelay 1000000

Thread 2:

rollback conn -- Rolls back SELECT 1 despite being in a different thread of execution.

Thread 3:

threadDelay 500000
query_ conn [sql|SELECT * FROM table|]

What the programmer meant:

BEGIN;
INSERT INTO table SELECT 1;
-- delay 1 second
COMMIT;
ROLLBACK;
-- delay 500ms
SELECT * FROM table;

What PostgreSQL sees:

BEGIN;
INSERT INTO table SELECT 1;
ROLLBACK;
-- delay 500ms
SELECT * FROM table;
-- delay another 500ms
COMMIT;
robinvd commented 6 years ago

@charles-cooper so we have to make a new connection for every thread?

charles-cooper commented 6 years ago

I ended up wrapping the Connection in an MVar. Another option is to use withResource from https://hackage.haskell.org/package/resource-pool-0.2.3.2/docs/Data-Pool.html.

lpsmith commented 6 years ago

To add to Charles's answer to Saurabh's question, postgresql ensures concurrency safety (for varying values of "safe") per transaction. postgresql-simple cannot affect that. However, postgresql-simple itself isn't entirely thread safe when using withTransaction, basically, intervening concurrent queries can be inserted into the transaction when they shouldn't in almost all (all?) cases. So postgresql will see a transaction as something other than what was probably intended.

To add to Charles's answer to Robin's question, postgresql-simple is thread safe in a looser sense; if you issue queries from multiple threads and the connection is busy, each query will get into line and wait until all the queries ahead of it have finished. This doesn't extend to withTransaction, though, as basically the withTransaction wants to issue multiple queries without going back to the end of the line after each query completes, but postgresql-simple doesn't support this at the present time. This probably could be solved with a more elaborate type of locking mechanism, which I might be amenable to, but there are a couple of slightly tricky aspects to this approach.

charles-cooper commented 6 years ago

@lpsmith I more or less agree with you that supporting a higher level of thread safety on withTransaction is complicated. Not least of which, after implementing a locking mechanism, the user experience might be equally jarring to find that running withTransaction blocks all their other operations on the connection.

Therefore it seems like good design to delegate this to higher-level libraries.

That being said, I do think the documentation should clarify the exact behavior of withTransaction as it currently stands, because the documentation underspecifies withTransaction's thread safety. Because of postgresql-simple's high importance and impact to the community - and this is extra credit - I think it would be better still if postgresql-simple additionally published 'official guidance' regarding how to handle thread safety (one connection per thread and offload transactionality to PostgreSQL, use a resource pool, or something else?).

In any case, if it seems appropriate I can submit a PR for the documentation piece and that should resolve this issue.

lpsmith commented 6 years ago

Yeah, a documentation improvement should be fine.

I actually had a thought that occurred to me shortly after writing my previous response, which I should write down now before I forget it. I think I now know how to implement a suitable locking mechanism in a way I'm satisfied with.

Basically, we change the type of withTransaction from :: Connection -> IO a -> IO a to :: Connection -> (Connection -> IO a) -> IO a.

The way you'd use it is like

withTransaction c $ \c' -> do
    query c' ...
    execute c' ...
    ...

Conceptually, withTransaction would take the MVar lock, and then (functionally!) replace the MVar in c, with a fresh MVar, creating c'. Attempting to use c while the transaction is in progress would block; attempting to use c' would have the same semantics as postgresql-simple currently has. Personally, I'm not opposed to name shadowing, and pretty strongly disagree with people who are strongly opposed to name shadowing; I think it could be pretty idiomatic to write withTransaction c $ \c -> .... Upon exit, withTransaction would mark c' as "transaction finished", which would conceptually be very similar to a "closed" connection, and then release the original MVar lock so that c becomes usable again.

The nice thing about this is you could even use c' from multiple threads, in the odd case that might actually make sense. And by tweaking the conceptual implementation a bit more, you could also solve some of the issues brought up in #179, namely, the possibility having withTransaction (and possibly several variants) deal with "nested" transactions more gracefully, either by throwing an exception, ignoring the nested transaction, or using savepoints to emulate nested transactions. It might be useful to do some isolation level checking in some of these variants, as well.

charles-cooper commented 6 years ago

Hmm. I quite like name shadowing but in the case that somebody chooses not to name shadow, doesn't that open up a not-very-subtle possibility of deadlocking? E.g.

withTransaction c $ \c' -> do
  query c' ...
  execute c ... -- whoops! darn it i hate when i leave my 0x27s at home ...

So I wonder if there is a neat way of using a phantom type / changing the execution context of withTransaction to reflect that, e.g.

withTransaction :: Connection Unlocked -> (Connection LockedDownForBusiness -> IO a) -> IO a

and then requiring two versions of every query/execute function -

-- (I'm sure the same effect could be achieved with typeclasses) 
queryUnlocked :: Connection Unlocked -> q -> IO stuff
queryLocked :: Connection LockedDownForBusiness -> q -> IO stuff

Although now that I think through my suggestion I'm not really sure what it buys over the 'dumb' way either calling withTransaction (\conn -> query conn...) or withConnectionAtomically (\conn -> query conn ...) where withConnectionAtomically has the same concurrency behavior as withTransaction it just doesn't create a transaction PG-side. If a user really wanted to use the connection from multiple threads, variants like queryButItsNotThreadSafe could be provided which use tryReadMVar/tryTakeMVar under the hood.

lpsmith commented 6 years ago

Yep, you are absolutely correct that this sort of mistake would pretty easily lead to deadlock, and I suspect that (without the use of name shadowing) this would be a fairly common sort of mistake.

(Personally, I try to use name shadowing intentionally, and somewhat idiomatically, when it makes sense. I certainly am opposed to the undisciplined use of name shadowing. Using it to document/ensure that a given variable should not be referred to in a given block is an example of when I use name shadowing, which this is a good example of when I'd shadow. But I digress.)

On the other hand, is this (somewhat advanced) issue really worth polluting a (much more basic) interface, and imposing intellectual overhead upon more novice users? I mean, everything else being equal, absolute safety (in the Rust sense of safety) is certainly preferable to provisional safety (i.e. "safety" in languages other than Rust, and to a lesser extent languages other than Haskell, ML, Scheme, Lisp, etc.) However, achieving absolute safety can be difficult and "expensive", whereas provisional safety can at times be a reasonable fallback.

(As another digression, this is very similar to Haskell's let x = x + 1 let-scoping issue; I made this sort of mistake fairly often during my first 5-10 years of Haskell programming, and it was often hard to spot, whereas these days I rarely make this sort of mistake, and if I do, it's relatively easy to spot.)

I guess in the end, I can agree that the current semantics of withTransaction needs to be fixed, whereas I don't know if this newer problem my proposed fix would introduce is really worth fretting over at the present time.

saurabhnanda commented 6 years ago

Generally speaking, sharing the same Connection across multiple threads should not the recommended approach. The recommended approach should be to use a connection-pool with each thread getting a new connection.

For more advanced use-cases, connection-sharing, across threads can be added in a cookbook, of sorts.

The semantics of withTransaction, and withConnection should be clarified in the docs, along with an example of using connection pooling.

lpsmith commented 6 years ago

Generally speaking, sharing the same Connection across multiple threads should not the recommended approach. The recommended approach should be to use a connection-pool with each thread getting a new connection.

That statement isn't overly self-consistent. In that case, you are sharing connections across threads, you are just using the connection pool to mediate the concurrency issues. And in the common sort of use case of a website back end, it's often neither necessary nor desirable to create a new connection for each thread (which is typically created to serve one request), or to reserve a single connection for the duration of the entire request.

(e.g. see withPG and withTransaction in snaplet-postgresql-simple; otherwise every database action in a single thread, can potentially go out on a different connection, and single connections can bounce around between threads.)

Now, while I agree that using a single connection from multiple threads without any other sort of concurrency mediation is a bit unusual, but it is occasionally useful, especially if you aren't dealing with something that isn't (directly) an http backend. And the postgresql-simple philosophy is to generally avoid imposing opinions, but rather allow the widest possible range of use cases.

saurabhnanda commented 6 years ago

That statement isn't overly self-consistent

Allow me to rephrase. Sharing a connection concurrently across threads should not be recommended. If sharing connections across threads in a non-concurrent manner is required (most common use-case), then connection pooling is a tried and tested solution, and it should be recommended.

... but it is occasionally useful... ... philosophy is to generally avoid imposing opinions...

"make simple things simple, complex things possible"

reite commented 6 years ago

I ran into a issue probably related to this. I'm using postgres-simple together with beam-postgres in a http server. When serving concurrent requests I was some times getting errors that looked like this (with different temp numbers):

SqlError {sqlState = "34000", sqlExecStatus = FatalError, sqlErrorMsg = "cursor \"temp20\" does not exist", sqlErrorDetail = "", sqlErrorHint = ""}

As a novice in this space it was not obvious to me that I couldn't just create a connection when starting the http server and share it between requests. Creating a new connection for each request solved the issue.

I think a warning should be added that it is not safe to use a connection concurrently.

lpsmith commented 6 years ago

But it is safe to use a connection concurrently; there just isn't any provision at the present time in cases when you want to issue multiple queries atomically (as is involved with things such as e.g. fold and withTransaction) without going back to the end of the line.

In your case, you probably want to investigate connection pooling, either via pgbouncer (which would change the deployment of your code, but not the code itself) or tweak your code to add pooling internally to your application.