Closed zzzeek closed 5 months ago
Running from windows with oracle on docker (podman on fedora over wsl2 actually) I can't reproduce (stopped with loop over 1700) with 23 (image gvenzl/oracle-free:23-slim
) and 21 (image gvenzl/oracle-xe:21-slim
)
With 18 (image gvenzl/oracle-xe:18-slim
) it tripped at 534. Stack trace below
Maybe it has something to do with version 18 of oracle?
@zzzeek maybe the best solution is to update the ci to 21 or 23? I'll leave the script running a bit more on 21 to see if it happens
Traceback (most recent call last):
File "C:\Users\cfede\scoop\apps\python\current\Lib\asyncio\runners.py", line 118, in run
return self._loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\cfede\scoop\apps\python\current\Lib\asyncio\base_events.py", line 684, in run_until_complete
return future.result()
^^^^^^^^^^^^^^^
File "C:\Users\cfede\Dev\GitHub\sqlalchemy\test4.py", line 25, in main
await run_async_thing(conn)
File "C:\Users\cfede\Dev\GitHub\sqlalchemy\test4.py", line 64, in run_async_thing
await cursor.executemany(
File "C:\Users\cfede\venvs\sa\Lib\site-packages\oracledb\cursor.py", line 1067, in executemany
await self._impl.executemany(
File "src\\oracledb\\impl/thin/cursor.pyx", line 297, in executemany
File "src\\oracledb\\impl/thin/protocol.pyx", line 817, in _process_single_message
File "src\\oracledb\\impl/thin/protocol.pyx", line 818, in oracledb.thin_impl.BaseAsyncProtocol._process_single_message
File "src\\oracledb\\impl/thin/protocol.pyx", line 769, in _process_message
File "src\\oracledb\\impl/thin/protocol.pyx", line 872, in _reset
File "src\\oracledb\\impl/thin/packet.pyx", line 571, in wait_for_packets_async
asyncio.exceptions.CancelledError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\cfede\Dev\GitHub\sqlalchemy\test4.py", line 79, in <module>
asyncio.run(main())
File "C:\Users\cfede\scoop\apps\python\current\Lib\asyncio\runners.py", line 194, in run
return runner.run(main)
^^^^^^^^^^^^^^^^
File "C:\Users\cfede\scoop\apps\python\current\Lib\asyncio\runners.py", line 123, in run
raise KeyboardInterrupt()
KeyboardInterrupt
Just to add that the same suite has run for years using oracledb and cx-oracle with the sync version and it never hanged
well we can update to 21/23 sure, for now I just have the test marked to skip on oracledb async. great news it repros!
I'll leave the script running a bit more on 21 to see if it happens
over 25k and it was still running. By this limited test it seems this has something to do with 18 only.
interestingly when I went to stop the 18 container (gvenzl/oracle-xe:18-slim
) after the script hanged the db did not close cleanly, and got this warning
WARN[0010] StopSignal SIGTERM failed to stop container oracle in 10 seconds, resorting to SIGKILL
I noticed with #277 that Python 3.11 and higher do not experience the hang but Python 3.10 and lower do. I also noticed that the uvloop alternative implementation works fine with Python 3.10 and lower. Which Python version are you using? Are you able to try with the uvloop implementation? It may be something specific to the asyncio implementation.
I get the hang here with python 3.12 and we see it on 3.11 as well
Ok. Thanks for confirming that. That suggests the issues are unrelated. It was worth checking though!
I was able to replicate with Oracle Database 18c and discovered that it was a problem with the break/reset mechanism timing. I have pushed a patch that corrects this for me. If you are able to build from source you can verify that it works for you, too.
I've tried with 18c on docker installing from source locally and I could no longer reproduce, stopped when the counter was over 26k. (this is with python on windows to oracle on podman on wsl)
Seems to be fine here.
yup seems fine now, I confirmed I can still reproduce w/ the old version and the main version fixes the issue.
Thanks for the confirmation!
The patch has been included in version 2.0.1 which was just released.
@zzzeek, @CaselIT, I have just pushed this change which pretty much reverts the patch that was originally made to resolve this issue. I was able to confirm that the issue remains in the original code prior to the patch but no longer does in the code I just pushed. Just to be safe, if you could run the code as well with this commit and verify that it is not an issue for you, that would be appreciated. I believe other changes have eliminated the original issue, but I'd like to be sure!
is there any quick way to build just the thin client without it looking for OCI for the thick client? or can you put up a wheel file somewhere
The build of the driver doesn't require OCI or the thick client in any way. You can simply build from source. That said, I believe I was able to confirm that the original issue reported here has been corrected a different way -- so this is just paranoia, I think!
OK I can confirm the latest main branch does not have the issue, after I re-confirmed that 2.0.0 still reproduces on this machine, so we're good
Thanks, @zzzeek!
hi gang -
our sqlalchemy test suite for the new async oracledb driver we are developing hangs semi-reliably on one particular series of tests that are fielding an intentional IntegrityError from a cursor.executemany() call.
the condition is extremely difficult to reproduce, and I've come up with a script that seems to reproduce it a bit more portably when run on a linux machine, but it still takes many iterations for the condition to hit. The script below runs the same set of operations over and over again - running it against an Oracle 18c express database here, it will hang usually within 200 iterations. but i have no idea if it will do this in other circumstances. if i move the create table step out of the loop, it doesnt hang at all.
so if I run it here it looks like:
then it's hung. press ctrl-C and stack shows as follows:
that's what I got so far. Will see if @caselit can reproduce on his environment.