Nesting a transaction should work as you have it there. Which db driver are you using?
Fluent
Sigh. And by Fluent I mean PostgreSQL
This is all it takes to cause a deadlock in the app:
```swift
func badMonkey(req: Request) throws -> EventLoopFuture<HTTPStatus> {
    req.db.transaction { db in
        Point.query(on: req.db).all().transform(to: .noContent)
    }
}
```
Notice how I used `req.db` inside the transaction.
Hmm, that's different from the original example. It makes sense that it wouldn't work, since you should only use the `db` supplied to the closure while in a transaction. Try refactoring like this:
```swift
func badMonkey(req: Request) throws -> EventLoopFuture<HTTPStatus> {
    let points = Point.query(on: req.db).all()
    return req.db.transaction { db in
        points...
    }.transform(to: .noContent)
}
```
Or this:
```swift
func badMonkey(req: Request) throws -> EventLoopFuture<HTTPStatus> {
    req.db.transaction { db in
        Point.query(on: db).all().transform(to: .noContent)
    }
}
```
To clarify, for this to work you need two connections:
```swift
req.db.transaction { db in
    Point.query(on: req.db).all().transform(to: .noContent)
}
```
One in a transaction and one normal connection. These must be separate connections, since once a connection is in a transaction, all queries on that connection become part of the transaction.
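To make that concrete, here's a minimal sketch (assuming an already open `PostgresConnection`; the `points` table name is illustrative) of why the two must be separate connections: on a single connection, every statement issued between `BEGIN` and `COMMIT` joins the transaction, so a query meant to run "outside" cannot.

```swift
import PostgresNIO

func singleConnectionSketch(_ conn: PostgresConnection) -> EventLoopFuture<[PostgresRow]> {
    conn.simpleQuery("BEGIN").flatMap { _ in
        // Intended as an independent query, but it runs inside the open
        // transaction because it shares the connection.
        conn.simpleQuery("SELECT * FROM points")
    }.flatMap { rows in
        conn.simpleQuery("COMMIT").map { _ in rows }
    }
}
```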
This is deadlocking immediately since FluentPostgresDriver currently defaults to 1 connection max per pool: https://github.com/vapor/fluent-postgres-driver/blob/master/Sources/FluentPostgresDriver/FluentPostgresConfiguration.swift#L22
Increasing that number should stop this from always deadlocking, but I'm not sure you can avoid deadlocks here. Say you up the number to 2 max connections per event loop. Two requests come in at the same time to `badMonkey` and start their transactions. Now these two requests have consumed both connections and will be deadlocked waiting for another one. To avoid deadlocks, you must never have a pooled resource require requesting another pooled resource in order to release.
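A sketch of that two-request scenario, using async-kit's `EventLoopConnectionPool` with construction elided (`...`): both handlers take a transaction connection, then each waits for a third connection that can never be granted.

```swift
let pool = EventLoopConnectionPool(..., maxConnections: 2)

for _ in 0..<2 { // two concurrent badMonkey-style requests
    _ = pool.withConnection { transactionConn in
        // Both pooled connections are now checked out, so this inner
        // request queues forever, and transactionConn is never returned.
        pool.withConnection { queryConn in
            pool.eventLoop.makeSucceededFuture(())
        }
    }
}
```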
Fluent takes care of this if you use `db` instead of `req.db` while inside the transaction. Ideally we could provide some sort of error if we detect a deadlock is possible, but I'm not sure how that would be implemented.
> Ideally we could provide some sort of error if we detect a deadlock is possible, but I'm not sure how that would be implemented.
Exactly. If you have some way to flag that it happened, awesome; otherwise close this. I know I'm supposed to always use the `db` provided by the transaction, but we run into trouble when updating a large method where we realize, "Oh, I need a transaction here," and so we wrap it, change most `req.db` to `db`, but miss one.
I'll leave this open as a reminder to try and find a way to detect deadlocks.
Why would this cause deadlocks? My understanding from @tanner0101's explanations was that each query should get a new connection. So this shouldn't cause problems as long as the pool has at least 2 connections.
Also, shouldn't the connection from the first query have been returned to the pool once the code inside the `flatMap` block is run?
On 8. May 2020, at 18:21, grosch notifications@github.com wrote:

> Something like this will hang the client, and in db log we get
> `LOG: unexpected EOF on client connection with an open transaction`
>
> ```swift
> Client.find(id, on: req.db).flatMap {
>     /// Validate we want to make updates
>     req.db.transaction { db in
>         // Do stuff
>     }
> }
> ```
>
> I don't really want to create a transaction if I don't have to. In the case where I discovered this I'm doing multiple queries against multiple tables, and only under certain conditions do I then create a transaction and update multiple different tables.
You can create a deadlock by doing something like:
```swift
let pool = EventLoopConnectionPool(..., maxConnections: 1)
pool.withConnection { a in
    pool.withConnection { b in
        pool.eventLoop.makeSucceededFuture(())
    }
}
```
Since the pool only has one connection and we're waiting for another connection from the pool before we can return the first connection, this code can never finish.
Fluent queries, when run on `req.db`, do `pool.withConnection` internally. So nesting `req.db` usage inside a transaction closure (which is inside a `pool.withConnection` closure) is similar to the above example and will require at least 2 connections.
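Roughly how `badMonkey` expands, then (a simplification for illustration, not Fluent's actual source; `pool` as above):

```swift
pool.withConnection { transactionConn in       // req.db.transaction { ... }
    // BEGIN has been issued on transactionConn; it cannot go back to the
    // pool until this future completes...
    pool.withConnection { queryConn in         // Point.query(on: req.db).all()
        // ...but completing it requires a second pooled connection.
        pool.eventLoop.makeSucceededFuture(())
    }
}
```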
> Since the pool only has one connection and we're waiting for another connection from the pool before we can return the first connection, this code can never finish.
Would that also apply if the pool has 2 or more connections? Wouldn't the second call to `pool.withConnection` just request a second connection from the pool in that case? (Personally, I run most of my pools with ~5 connections and drop "stale" connections from time to time when they have not been used for a while.)
Yes, if the pool had a max of two connections, then that code would not be a problem.
To generalize, if you have a pool with `n` max connections and you request `n + 1` connections from the pool, you will deadlock. In my example `n = 1` (`maxConnections: 1`) and the code requires 2 connections from the pool at once to complete. Thus deadlock. If you have `n = 2` (`maxConnections: 2`), then you could create a deadlock by doing:
```swift
let pool = EventLoopConnectionPool(..., maxConnections: 2)
pool.withConnection { first in
    pool.withConnection { second in
        pool.withConnection { third in
            // we have three connections at the same time
            // this is impossible since maxConnections = 2
            print(first, second, third)
            ...
        }
    }
}
```
Right, that is the way I would have expected it to work. As I said, my personal "solution" is to just make sure my connection pool is large enough...
Yup that's a good solution. I do think it would be helpful if the connection pool could detect deadlocks to assist in debugging issues around a connection pool being too small. But I'm not sure that will be possible to implement.
> I do think it would be helpful if the connection pool could detect deadlocks to assist in debugging issues around a connection pool being too small. But I'm not sure that will be possible to implement.
Agreed, to both the usefulness and (sadly) the feasibility.
From taking a quick peek at the code, wouldn't simply keeping count of `available connections - requested connections` do the trick? That way, if that number becomes 0 and yet another connection is requested, you can fail the future, correct? Though I might be oversimplifying it here :)
> From taking a quick peek at the code, wouldn't simply keeping count of `available connections - requested connections` do the trick? That way if that number becomes 0 and yet another connection is requested, you can fail the future, correct?
Yes, but then any requests received while all (e.g. 5) connections are momentarily in use would simply fail. So we would start to randomly drop requests any time all connections are "taken", rather than simply waiting until they become available again.
However, I'd say that having a timeout for this waiting could be a good idea. That way, "deadlocked" requests could e.g. be released after, say, 10 seconds. Not ideal, but better than locking indefinitely, I guess.
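Something like this generic NIO helper could implement that timeout (a sketch, not async-kit's actual API; `PoolTimeoutError` is a made-up error type): race the pending request against a scheduled failure so a deadlocked waiter is released instead of hanging forever.

```swift
import NIO

struct PoolTimeoutError: Error {}

func withTimeout<T>(
    _ future: EventLoopFuture<T>,
    after timeout: TimeAmount,
    on eventLoop: EventLoop
) -> EventLoopFuture<T> {
    let promise = eventLoop.makePromise(of: T.self)
    var completed = false // only touched on eventLoop, so no lock needed
    let reaper = eventLoop.scheduleTask(in: timeout) {
        if !completed {
            completed = true
            promise.fail(PoolTimeoutError())
        }
    }
    future.hop(to: eventLoop).whenComplete { result in
        reaper.cancel()
        if !completed {
            completed = true
            promise.completeWith(result)
        }
    }
    return promise.futureResult
}
```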
> Yes, but then any requests received while all (e.g. 5) connections are momentarily in use would simply fail.
Very good point, totally overlooked that. In that case a timeout does sound better than deadlocking. This would at least free up the connections after x amount of time, instead of at some point deadlocking everything because there are no more connections that can be freed.
I feel it should not be possible to deadlock Vapor apps if I or my developers accidentally use `rq.db` rather than `db`, a common and easy mistake to make; usually we are doing `rq.db`.
I wonder if it's possible that once a transaction is started, the value of `db` on `rq` could be changed by Vapor.
I understand some mitigation has been done in #67, but I'd rather see Vapor make design choices that prevent these kinds of scenarios in the first place. To err is human.
It's a difficult balancing act. Vapor should try to prevent common errors, but it should also not get in the way of the developer. If we silently swap out `req.db`, for example, that would prevent you from being able to make any requests outside of the transaction even if you wanted to. I can imagine a few cases where that would be useful.
At least #67 has prevented applications from hanging forever if deadlocks are created. But I agree we can do better.
A few ideas:

- **"leaky" pooling**: I've opened an issue for this here: https://github.com/vapor/async-kit/issues/72. This allows the connection pool to temporarily open new connections beyond its maximum if needed (see the conceptual sketch after this list).
- **detect deadlocks**: Maybe there is some way for `withConnection` closures to share context and detect / prevent impending deadlocks. I'm less sure this is possible, but it's worth looking into.
- **researching other frameworks**: How have other web frameworks dealt with this? Node.js / Java would probably be good places to look.
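As a conceptual sketch of "leaky" pooling (toy, single-threaded types for illustration; not async-kit's real implementation): when all pooled connections are taken, hand out a temporary overflow connection instead of queueing, and discard it on release.

```swift
final class ToyConnection {
    var isOverflow = false
    func close() { /* tear down the underlying socket */ }
}

final class LeakyPool {
    private let maxConnections: Int
    private var idle: [ToyConnection] = []
    private var checkedOut = 0

    init(maxConnections: Int) {
        self.maxConnections = maxConnections
    }

    func request() -> ToyConnection {
        checkedOut += 1
        if let conn = idle.popLast() {
            return conn
        }
        let conn = ToyConnection()
        // Beyond the cap we still hand out a connection, but mark it so
        // it is discarded afterwards instead of being pooled.
        conn.isOverflow = checkedOut > maxConnections
        return conn
    }

    func release(_ conn: ToyConnection) {
        checkedOut -= 1
        if conn.isOverflow {
            conn.close() // overflow connections "leak away" rather than pool
        } else {
            idle.append(conn)
        }
    }
}
```

The trade-off: overflow connections avoid the deadlock but bypass the cap, so the database's own connection limit becomes the real backstop.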
I think with the async nature of our connection pool this is going to be a tricky one, mostly because requesting `n + 1` connections doesn't always mean a deadlock. It could also mean you'd get the `n + 1`th connection once the 1st is done.

Did a quick peek into Python's Peewee ORM, and they either block for a certain time, block indefinitely, or raise an error when requesting `n + 1` connections. I don't think always raising when `n + 1` connections are requested is a good idea (see above). The timeout measures are something we have in place too (except the option to block indefinitely, though you could set the timeout to 24 hours or something stupidly large like that).
Also looked at Node's Sequelize ORM, and they accept timeouts for acquiring a new connection, plus a min and max pool size. For Java I found the Hibernate ORM and the following documentation on connection pooling. The documentation mentions two options here: c3p0 provides a min and max size, as well as a timeout for getting a connection; Proxool accepts a max and minimum connection count.
Overall it seems none of the ORMs fully prevent deadlocks; they just offer timeouts for when you are in one, and I think this is the approach we should follow too.
I don’t agree that this is an acceptable compromise, but this is not my project, you have your own list of priorities that you adhere to.
In our app I wrote a wrapper that prevents this from happening and am ok with that solution.
@mxcl is the wrapper in your app following a similar approach to replacing `req.db`? Also interested in your thoughts on #72 and if that seems like something that would benefit you.
We're using https://github.com/candor/sublimate, which is the "synchronous" solution I was talking about ~6 months ago, inside of which the `db` is swapped out once a transaction begins.
Overall I aim for reliability above all, so this is why the set of trade-offs for me ends up there, but I certainly understand every library and framework has different base policies that affect everything above them. We're all experienced in releasing such tooling here and I have much respect for this project.