**intgr** opened this issue 4 months ago (Open)
Are you able to reproduce it consistently in your environment? It is possible that multiple concurrent attempts to close the connection are being made -- but my attempts to reproduce it have been unsuccessful, and a close look at the code suggests that is not the source of the issue either. The call stack shows that it is failing during the actual sending of the LogOff message. If you are able to reproduce this consistently, can you make the following changes and build from source?
```diff
index b9c7e52..14660cb 100644
--- a/src/oracledb/impl/thin/protocol.pyx
+++ b/src/oracledb/impl/thin/protocol.pyx
@@ -126,6 +126,10 @@ cdef class Protocol(BaseProtocol):
         if self._read_buf._transport is None:
             self._transport = None
+        if self._transport is not None \
+                and self._transport._transport is None:
+            self._force_close()
+
         # if the session was marked as needing to be closed, force it
         # closed immediately (unless it was already closed)
         if self._read_buf._session_needs_to_be_closed \
```
Now that I have more log data to work with, I have a better idea of what's going on. Unfortunately it only happens in one production environment, and I would prefer to avoid experimenting there unless absolutely necessary.
> It is possible that multiple concurrent attempts to close the connection are being made
Django database connections are per-thread, and I don't believe my application is sharing connections between threads.
But I noticed that this error only occurs in Celery processes. We're using Celery's default worker pool, which launches new worker processes with Unix `fork()`. This means that after the fork, two processes share the same connection file descriptor. That's a recipe for surprises, for sure.
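The shared-descriptor situation is easy to demonstrate with a stdlib-only sketch (no oracledb involved; `socketpair()` stands in for the database socket):

```python
import os
import socket

# Minimal demonstration that fork() shares a socket's file descriptor:
# the "database" socket is created before the fork, so both parent and
# child hold the same underlying connection afterwards.
conn, db_side = socket.socketpair()  # stand-in for a DB connection

pid = os.fork()
if pid == 0:
    # Child worker: inherits the same descriptor and can send bytes
    # on what the parent still considers its private connection.
    conn.send(b"child-bytes")
    os._exit(0)

os.waitpid(pid, 0)
data = db_side.recv(32)  # the child's bytes arrive on the parent's connection
```

Anything the child writes (or any close it performs) is visible on the parent's session, which is exactly the hazard here.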
Celery has some logic that tries to solve this: it attempts to directly close DB file descriptors in the child process, to prevent multiple processes from trying to send on the same socket.
However, since the oracledb `Connection` does not implement a `fileno()` method, this part is skipped, and Celery calls the underlying `Connection.close()` method next.
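As I understand that cleanup path, it amounts to something like the following (a paraphrase of the behavior described above, not Celery's actual code; `close_db_fd` is my naming):

```python
import os

def close_db_fd(connection):
    """Hypothetical sketch of Celery-style cleanup in a forked child."""
    try:
        fd = connection.fileno()
    except AttributeError:
        # No fileno() -- the oracledb case: fall back to a full
        # protocol-level close(), which sends LogOff on the socket that
        # is still shared with the parent process.
        connection.close()
    else:
        # fileno() available: close only the descriptor in this process.
        # No bytes are written, so the parent's session is undisturbed.
        os.close(fd)
```

With `fileno()` available the child silently drops its copy of the descriptor; without it, the protocol-level `close()` races with the parent, which matches the crash in the LogOff path.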
I'm not exactly sure what the right solution here would be. Maybe oracledb can implement `fileno()` so that users have a chance of working around this issue. Or oracledb could somehow detect forking when the process PID changes.
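A PID-change guard could look roughly like this (a sketch of the idea only; `ForkSafeConnection` is a hypothetical wrapper, not part of oracledb's API):

```python
import os
import socket

class ForkSafeConnection:
    """Hypothetical wrapper sketching both ideas above: expose fileno()
    and refuse to run the protocol-level close in a forked child."""

    def __init__(self, sock):
        self._sock = sock
        self._owner_pid = os.getpid()  # remember the creating process

    def fileno(self):
        # Exposing fileno() would let Celery close the raw descriptor
        # itself instead of calling close().
        return self._sock.fileno()

    def close(self):
        if os.getpid() != self._owner_pid:
            # PID changed: we are in a forked child. Detach and close
            # only the raw fd, so no LogOff traffic is sent on the
            # socket shared with the parent.
            os.close(self._sock.detach())
            return
        self._sock.close()  # normal close in the owning process
```

The guard only skips the wire protocol; the parent process still performs a proper close when it shuts down its own connection.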
Ah, in that case can you put together a test case using Celery that demonstrates the problem?
It seems to me, though, that it would be better for Django to close the connections before the process is forked. That would avoid these sorts of issues. I know that there is no support (in Oracle) for attempting to do what you are doing here. That said, if you can put together a test case that replicates the issue, I can see what can be done to avoid the situation, if possible.
@intgr is there any update on this?
@intgr Any update on this issue? I'm facing the same issue. Is there any workaround?
@vigneshwarrvenkat, are you able to put together a test case that demonstrates the issue? Are you using the latest version of python-oracledb (2.2)?
I recently upgraded to Django 5.0, which also means switching from `cx_Oracle` to `oracledb`. Sometimes when Django attempts to close the database connection, it can fail with the following error. Stack trace below. I've seen this pop up in two different contexts so far. This example is when running a custom Django `manage.py` command.

Crash
I am using thin mode.
I haven't been able to reproduce this in isolation. If the above stack trace isn't enough to troubleshoot the issue, please let me know and I'll see if I can figure out a reproduction.