Closed aholmes closed 3 months ago
Looks good to me. If I'm understanding correctly, this was the cause of the exception, which then causes the invalid connection. Am I correct that if there were to be another unrelated (but similar) issue in a different SQL query, we'd have the same connection issue?
Looks good to me. If I'm understanding correctly, this was the cause of the exception, which then causes the invalid connection. Am I correct that if there were to be another unrelated (but similar) issue in a different SQL query, we'd have the same connection issue?
I think regardless of an exception, this can still end up in a weird state. Any queries issued through this session may begin a transaction on PSQL, return data to Python and, when the Python function returns, the session is still connected in the connection pool, and the transaction is still open - but we've now lost the function-scoped reference to the session that opened it. I'm not entirely clear on what's happening under the hood with regards to what SQLAlchemy does when references are lost from scopes exiting. I assume it doesn't do anything, which is why the transaction remains.
Secondarily, we don't specify any idle session or in-transaction timeouts for our connections, so the problem is compounded by the fact that the transaction never actually ends. Then, later, I believe the connection is pulled from the connection pool and a new query tries to start a transaction but cannot because of the transaction/connection in a weird state. I am making a lot of assumptions here, but this seems to be the gist of what's happening.
This fix ensures that this particular session usage cleans up when it exits the with
scope. It is possible we can make this mistake in other places and have the exact same issue. It occurs often for us because we almost always query user-related information some time during the lifetime of the application, which means this code almost always runs. We can make things more robust by:
idle_session_timeout
when SQLAlchemy connectsidle_in_transaction_session_timeout
when SQLAlchemy connectsThese options will at least ensure that any future issues caused by similar code will either be limited in scope, or will prevent the issue by masking that some piece of code is not handling the session correctly.
These options will at least ensure that any future issues caused by similar code will either be limited in scope, or will prevent the issue by masking that some piece of code is not handling the session correctly.
Sounds good. This is definitely an improvement. Some mitigation of future, unforeseen errors is great.
These options will at least ensure that any future issues caused by similar code will either be limited in scope, or will prevent the issue by masking that some piece of code is not handling the session correctly.
Sounds good. This is definitely an improvement. Some mitigation of future, unforeseen errors is great.
To be clear, these options are not part of this PR; they are options we can explore later.
This may be the cause of errors in CAP when SQLAlchemy tries to user a connection in the pool that has been invalidated due to timeout.
The issue here is that the session created is not closed, and the transaction is not rolled back or committed.