yoshidan / google-cloud-rust

Google Cloud Client Libraries for Rust.
MIT License
222 stars, 80 forks

Session handling leaks sessions #164

Closed: ivankelly closed this 1 year ago

ivankelly commented 1 year ago

I came across this issue while running a heavy load against Spanner, with multiple tasks sharing the same client. The root cause is a race between the acquire timeout and the handoff of a freed session to a waiter through a oneshot channel.

`acquire` creates a oneshot channel, adds its sender to the waiters, and then receives from the oneshot under a timeout. When a session becomes free, it is sent through the oneshot and `acquire` returns it.
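To make the handoff concrete, here is a minimal synchronous sketch of the pattern described above. It is not the crate's code: the real client is async, while this sketch assumes `std::sync::mpsc::sync_channel(1)` as a stand-in for the oneshot and `recv_timeout` as a stand-in for the timeout. `Session`, `PoolState`, `acquire` and `release` are hypothetical names; `available_sessions`, `in_use` and the waiter list mirror the terms used in this issue.

```rust
use std::collections::VecDeque;
use std::sync::mpsc::{sync_channel, SendError, SyncSender};
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for a pooled Spanner session.
#[derive(Debug, PartialEq)]
struct Session(u32);

/// Hypothetical pool state; field names follow the issue text.
#[derive(Default)]
struct PoolState {
    available_sessions: VecDeque<Session>,
    waiters: VecDeque<SyncSender<Session>>,
    in_use: usize,
}

/// Original-style acquire: register a waiter channel, then receive the
/// session itself under a timeout.
fn acquire(pool: &Mutex<PoolState>, timeout: Duration) -> Option<Session> {
    let rx = {
        let mut state = pool.lock().unwrap();
        if let Some(s) = state.available_sessions.pop_front() {
            state.in_use += 1;
            return Some(s);
        }
        // Capacity 1, so a releaser's send never blocks (oneshot stand-in).
        let (tx, rx) = sync_channel(1);
        state.waiters.push_back(tx);
        rx
    }; // lock released here, before waiting
    rx.recv_timeout(timeout).ok()
}

/// Original-style release: hand the session straight to the first live waiter.
fn release(pool: &Mutex<PoolState>, mut session: Session) {
    let mut state = pool.lock().unwrap();
    while let Some(waiter) = state.waiters.pop_front() {
        match waiter.send(session) {
            // Ownership moves to the waiter, so `in_use` is left unchanged.
            Ok(()) => return,
            // Receiver already dropped: take the session back, try the next waiter.
            Err(SendError(s)) => session = s,
        }
    }
    state.in_use -= 1;
    state.available_sessions.push_back(session);
}

fn main() {
    // One session is checked out and none are idle.
    let pool = Arc::new(Mutex::new(PoolState { in_use: 1, ..Default::default() }));
    let releaser = {
        let pool = Arc::clone(&pool);
        thread::spawn(move || {
            // Wait until the acquirer has registered as a waiter, then free a session.
            while pool.lock().unwrap().waiters.is_empty() {
                thread::yield_now();
            }
            release(&pool, Session(7));
        })
    };
    let got = acquire(&pool, Duration::from_secs(5));
    releaser.join().unwrap();
    println!("{:?}, in_use = {}", got, pool.lock().unwrap().in_use);
}
```

Run as written, the waiter receives `Session(7)` directly from the releaser and `in_use` stays at 1, because ownership moves from one task to another without passing back through the pool.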

The problem is a race with the timeout: the timeout can complete at the same instant a session becomes free and is placed in the oneshot. Because the timeout has already completed, nothing ever receives the session, so it is leaked. Most importantly, `in_use` is never updated, so the session still appears to be in use even though it has been dropped. Once this happens enough times, every session leaks, and no new ones are created because the pool has hit its max-open limit.
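The losing interleaving can be replayed deterministically on a single thread with the same hypothetical stand-ins (`sync_channel(1)` for the oneshot, `recv_timeout` for the timeout). `replay_race` is a hypothetical helper that performs the four steps in order; this is a sketch of the race under those assumptions, not a reproduction of the crate's code.

```rust
use std::collections::VecDeque;
use std::sync::mpsc::{sync_channel, RecvTimeoutError, SendError, SyncSender};
use std::time::Duration;

/// Hypothetical stand-in for a pooled Spanner session.
#[derive(Debug)]
struct Session(u32);

/// Hypothetical pool state; field names follow the issue text.
#[derive(Default)]
struct PoolState {
    available_sessions: VecDeque<Session>,
    waiters: VecDeque<SyncSender<Session>>,
    in_use: usize,
}

impl PoolState {
    /// Original-style release: hand the session straight to the first live waiter.
    fn release(&mut self, mut session: Session) {
        while let Some(waiter) = self.waiters.pop_front() {
            match waiter.send(session) {
                Ok(()) => return, // still counted in `in_use`
                Err(SendError(s)) => session = s,
            }
        }
        self.in_use -= 1;
        self.available_sessions.push_back(session);
    }
}

/// Replays the losing interleaving on one thread and returns
/// `(in_use, available_sessions.len())` afterwards.
fn replay_race() -> (usize, usize) {
    // One session is checked out by another task; none are idle.
    let mut pool = PoolState { in_use: 1, ..Default::default() };

    // 1. acquire: nothing is free, so it registers as a waiter.
    let (tx, rx) = sync_channel(1);
    pool.waiters.push_back(tx);

    // 2. acquire: its timeout fires before anything arrives.
    assert!(matches!(
        rx.recv_timeout(Duration::from_millis(1)),
        Err(RecvTimeoutError::Timeout)
    ));

    // 3. The other task frees its session at that same instant. The receiver
    //    still exists, so the send succeeds and the session sits in the channel.
    pool.release(Session(7));

    // 4. acquire returns its timeout error and drops the receiver,
    //    dropping the buffered session with it.
    drop(rx);

    (pool.in_use, pool.available_sessions.len())
}

fn main() {
    let (in_use, available) = replay_race();
    // No task holds the session, yet it is still counted as in use.
    println!("in_use = {in_use}, available = {available}");
}
```

This should print `in_use = 1, available = 0`: nothing holds the session any more, yet the pool still counts it, and that slot is never reclaimed.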

The fix is to stop sending the session through the oneshot. Instead, the oneshot only notifies the waiter that a session is available in `available_sessions`, and the waiter then tries to take it from there. This is wrapped in a loop because other concurrent `acquire` calls may take the session first.
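A minimal sketch of the fix, again assuming synchronous `std::sync::mpsc` stand-ins and hypothetical `Session`, `PoolState`, `acquire` and `release` names. The channel now carries only `()`, `release` always returns the session to `available_sessions` and decrements `in_use` before notifying, and `acquire` loops until it takes a session or its deadline passes.

```rust
use std::collections::VecDeque;
use std::sync::mpsc::{sync_channel, SyncSender};
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a pooled Spanner session.
#[derive(Debug, PartialEq)]
struct Session(u32);

/// Hypothetical pool state. Waiters are only ever notified; sessions never
/// travel through their channels.
#[derive(Default)]
struct PoolState {
    available_sessions: VecDeque<Session>,
    waiters: VecDeque<SyncSender<()>>,
    in_use: usize,
}

/// Fixed-style acquire: a notification means "check `available_sessions`";
/// the loop handles losing that check to another acquirer.
fn acquire(pool: &Mutex<PoolState>, timeout: Duration) -> Option<Session> {
    let deadline = Instant::now() + timeout;
    loop {
        let rx = {
            let mut state = pool.lock().unwrap();
            if let Some(s) = state.available_sessions.pop_front() {
                state.in_use += 1;
                return Some(s);
            }
            let (tx, rx) = sync_channel(1);
            state.waiters.push_back(tx);
            rx
        }; // lock released before waiting
        let remaining = deadline.saturating_duration_since(Instant::now());
        if rx.recv_timeout(remaining).is_err() {
            // Timed out. A notification racing with this is simply wasted:
            // the session itself is still in `available_sessions`.
            return None;
        }
        // Notified: loop back and try to take a session from the pool.
    }
}

/// Fixed-style release: always return the session to the pool and update
/// `in_use` first, then wake one waiter whose receiver is still alive.
fn release(pool: &Mutex<PoolState>, session: Session) {
    let mut state = pool.lock().unwrap();
    state.in_use -= 1;
    state.available_sessions.push_back(session);
    while let Some(waiter) = state.waiters.pop_front() {
        if waiter.send(()).is_ok() {
            break;
        }
    }
}

fn main() {
    // Replay the same interleaving that leaked a session in the original design.
    let pool = Mutex::new(PoolState { in_use: 1, ..Default::default() });
    let (tx, rx) = sync_channel(1);
    pool.lock().unwrap().waiters.push_back(tx); // acquire registers as a waiter
    assert!(rx.recv_timeout(Duration::from_millis(1)).is_err()); // its timeout fires
    release(&pool, Session(7)); // a session is freed at the same instant
    drop(rx); // acquire gives up; only a `()` notification is lost
    let state = pool.lock().unwrap();
    println!("in_use = {}, available = {}", state.in_use, state.available_sessions.len());
}
```

Replaying the same interleaving now prints `in_use = 0, available = 1`. In this sketch a notification that lands on a waiter that has just timed out is wasted, so another waiter may keep waiting until its own timeout or the next release, but the session stays in the pool rather than being lost.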

yoshidan commented 1 year ago

Could you please run `cargo fmt` to fix CI?

yoshidan commented 1 year ago

Thanks!