weiznich / diesel_async

Diesel async connection implementation
Apache License 2.0
656 stars 81 forks source link

Connection from pool is reused while transaction is still open #198

Open lsunsi opened 5 days ago

lsunsi commented 5 days ago

Setup

The setup can be found on this repo. https://github.com/lsunsi/diesel-async-dirty-conn-bug

Versions

Feature Flags

Problem Description

The future generated for a transaction is not cancel safe, as it's documented.

That said, depending of how many times you poll it, the behavior changes and one particular behavior seems to be a bug to me because it's highly dangerous.

In general, what happens is that the connection used on the dropped future is closed as per documentation. But sometimes it gets returned to poll and reuses, because it passes the is_broken test and select 1 test while having the transaction open.

This happens, from what I understand, because you can cancel the future just between the transaction start and it registering that the transaction started. (Ref. src/transaction_manager.rs:280)

What are you trying to accomplish?

Having a clear non-corrupt behavior for dropping futures from transactions (I found this while using diesel-async transactions on tonic/axum handlers which can be canceled at any time).

What is the expected output?

I think the intended behavior here is that the future being dropped causes connection drop (and implied rollback) cleanly. Which means cancelling futures of this kind would lead to bad performance of the pool, but not serious corruption of reusing connections with open transactions on it.

What is the actual output?

Sometimes (less than 1% of times) I get a scary DatabaseError(Unknown, "SET TRANSACTION [NOT] DEFERRABLE must be called before any query") (reproduced on the repo).

Are you seeing any additional errors?

Yes, tokio_postgres prints WARNING: there is already a transaction in progress as a log too.

Steps to reproduce

The setup can be found on this repo. https://github.com/lsunsi/diesel-async-dirty-conn-bug RUST_LOG=debug DATABASE_URL=... cargo run

Checklist

weiznich commented 5 days ago

Thanks for filling this bug report and providing and example that triggers this behaviour. Now the bad news is that I fear we won't be able to fundamentally solve this problem, due to limiteds how async rust works. There is just too much that can go wrong around cancellation. After all that's the reason why this crate still is in the 0.x version range. That also affects any other async rust database crate. I have written a longer blog post on that topic here

What is likely possible ties to review the sequence of emitting transaction related sql commands and the points where we increase/decrease the internal transaction depth counter here: https://github.com/weiznich/diesel_async/blob/main/src/transaction_manager.rs#L297

For fixing this particular problem it might be meaningful to first increase the counter and after that execute, although that might need special handling for the cases where the BEGIN command fails.

lsunsi commented 4 days ago

Hey @weiznich , thanks for your answer. I'll answer it going deeper to test how well I understand the issue, so feel free to correct me because I'm not as versed in the Rust shenanigans as you. Also I've read your post a couple times and I recommended it internally in my company a couple times too.

So, firstly, yeah, that's what I thought and I came as far as to relate to your post. From my perspective, the fundamental issue is that when the future is dropped the connection has to be closed and recycled (not tried to be rolled back or fixed in any way). The reason I filed this bug is because I did expect a lot of connection closing and re-opening but not the behavior I reproduced.

My intention would be to just close this gap, so I'd like to all future cancellations to yield the same behavior that can't be avoided and not for some cancellations to be more dangerous than others. I think current Rust might be sufficient to achieve this behavior, am I wrong in this assumption? You mentioned the "marking of transaction depth" could be flipped (done before attempted). In my mind, it'd be even better if we could have a boolean flag to mark "critical paths" of the procedure that would allow "interruptions during critical path" to be detected in the "is_broken" function. In my mind, any critical path that gets interrupted yield in broken connection and it'd fix this (and potentially some other issues).

In this way we could phrase that "Rust is not there yet to allow for perfect connection handling" but the cost associated with this shortcoming is not corruption but just maybe a needless connection closing and re-opening.

Does that track for your? If you think this line of thought makes sense, would you consider a PR in that direction?

Ok, moving past that, second topic. For my production code using diesel and diesel-async, I'm proposing a workaround with my team using the "CustomQuery" you already proposed using in other issues. The idea is that the underlying database is able to know if the critical path got interrupted, so this could even be an alternative way of achieving the same result proposed above. I added this fix to a different branch on the reproduction repository, would you mind taking a look? Do you see any downsides with this approach? From my testing this fixes the issue because the query considers the connection as broken, as it should.

weiznich commented 4 days ago

Using a boolean to detect such cancellation sounds like a good idea. I've put together a PR here: https://github.com/weiznich/diesel_async/pull/201

It would be great if you could test this with your reproduction, as I'm ran out of time for that.

lsunsi commented 3 days ago

Added a comment to the PR but yeah this is to the point what I had in mind, even better even!