oracle / python-oracledb

Python driver for Oracle Database conforming to the Python DB API 2.0 specification. This is the renamed, new major release of cx_Oracle
https://oracle.github.io/python-oracledb
Other
308 stars 61 forks source link

oracledb2 hangs when getting connection from pool #275

Closed ThiefMaster closed 5 months ago

ThiefMaster commented 6 months ago

The same code worked just fine on oracledb<2, but after updating to oracledb==2.0.0, it just hangs during pool.acquire()

import oracledb
dsn = oracledb.makedsn('XXX', 10121, service_name='XXX')
print(dsn)
pool = oracledb.create_pool(dsn=dsn, user='XXX', password='XXX', min=1, max=5, increment=1, ping_interval=2)
print('created pool')
with pool.acquire() as conn:
    print('connected!')

When I set the pool's getmode to nowait, it fails with "DPY-4005: timed out waiting for the connection pool to return a connection" instead of freezing.

A normal connection using oracledb.connect(dns=dns, user=..., password=...) works fine.

Version information:

platform.platform: Linux-5.15.88-gentoo-x86_64-AMD_Ryzen_5_3600X_6-Core_Processor-with-glibc2.37
sys.maxsize > 2**32: True
platform.python_version: 3.12.1
oracledb.__version__: 2.0.0
cjbj commented 6 months ago

Can you triple check the old version is still OK? What's the DB version?

ThiefMaster commented 6 months ago

SELECT * FROM v$version:

('Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production', 'Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production\nVersion 19.17.0.0.0', 'Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production', 0)

Any yes, after a pip install oracledb<2 it works fine, while after updating it breaks again.

anthony-tuininga commented 6 months ago

That seems strange. The code you provided works just fine for me. The test suite also has very similar code and the entire test suite passes, too. That suggests something very specific to your configuration. If you break while the hang is occurring, can you provide the traceback? That may prove enlightening.

ThiefMaster commented 6 months ago
$ python /tmp/oracletest.py
^CTraceback (most recent call last):
  File "/tmp/oracletest.py", line 4, in <module>
    with pool.acquire() as conn:
         ^^^^^^^^^^^^^^
  File ".../env-312/lib/python3.12/site-packages/oracledb/pool.py", line 148, in acquire
    return self._connection_method(
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../env-312/lib/python3.12/site-packages/oracledb/connection.py", line 1134, in connect
    return conn_class(dsn=dsn, pool=pool, params=params, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../env-312/lib/python3.12/site-packages/oracledb/connection.py", line 520, in __init__
    impl = pool_impl.acquire(params_impl)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "src/oracledb/impl/thin/pool.pyx", line 594, in oracledb.thin_impl.ThinPoolImpl.acquire
  File "src/oracledb/impl/thin/pool.pyx", line 609, in oracledb.thin_impl.ThinPoolImpl.acquire
  File "/home/adrian/.pyenv/versions/3.12.1/lib/python3.12/threading.py", line 390, in wait_for
    self.wait(waittime)
  File "/home/adrian/.pyenv/versions/3.12.1/lib/python3.12/threading.py", line 355, in wait
    waiter.acquire()
KeyboardInterrupt
anthony-tuininga commented 6 months ago

That is, unfortunately, not enlightening. :-( Can you add a time.sleep() call immediately after the pool creation? A second should be enough unless your database is very remote? Let me know if that "resolves" the issue for you. I am still unable to replicate.

ThiefMaster commented 6 months ago

same problem :/

anthony-tuininga commented 6 months ago

Ok. Thanks for checking. Is this a regular local database? Or is it remote? Any special configuration?

I note that the output doesn't match your script above (which printed some progress)? Does it work ok if you use a standalone connection? Does it work if you increase the minimum number of connections to 2 and then wait long enough for both of them to be created by the background thread? Can you set the environment variable PYO_DEBUG_PACKETS to some value and ensure that the connections are in fact being created? Anything else you can think of that might pinpoint the source of the issue for you? If you are able and willing to build from source I can suggest a few changes to see where things are going awry. You are the only one to note this so far and pooling is quite heavily used -- and our testing team tried on all platforms (macOS, Windows, Linux) and all database versions before we released this version -- which all suggests something very specific to your configuration. I would very much like to know what it is that is causing the problem!

ThiefMaster commented 6 months ago
import oracledb
import time
dsn = oracledb.makedsn('pdb-*.cern.ch', 10121, service_name='PDB_******.cern.ch')
pool = oracledb.create_pool(dsn=dsn, user='***', password='***', min=1, max=5, increment=1, ping_interval=2, getmode=oracledb.POOL_GETMODE_FORCEGET)
print('created')
time.sleep(1)
print('acquiring')
with pool.acquire() as conn:
    print('connected!')

this is the script I've used the last time (just censored the full hostname and credentials).

The issue happens both on my alma9 production system and on my gentoo dev system

If you are able and willing to build from source I can suggest a few changes to see where things are going awry

Sure! (if you provide me with wheels I can also test those, might be even faster)

anthony-tuininga commented 6 months ago

Thanks. The output you provided earlier doesn't match, however, as it doesn't show any output. Can you verify that you are able to create a standalone connection to this database? Can you set the environment variable PYO_DEBUG_PACKETS to verify that the pool is creating the connection successfully? I'll look into some changes to help debug this situation for you.

ThiefMaster commented 6 months ago

Yes, when using this it works fine:

conn = oracledb.connect(dsn=dsn, user='***', password='***')
with conn:
    ...

--

PYO_DEBUG_PACKETS=1 shows me that there's lots of network activity while it keeps trying to connect. Not sure what exactly to look for in that data though and I assume it contains some sensitive data from authentication so I'd rather not post it publicly here. However, if you have matrix (or an email address) I could share it there.

anthony-tuininga commented 6 months ago

Thanks for confirming that. My e-mail address is posted in the setup configuration for the project: https://github.com/oracle/python-oracledb/blob/main/setup.cfg. Does increasing the minimum to 2 and increasing the number of seconds to wait to 2 help any?

ThiefMaster commented 6 months ago

min=2 doesn't help, neither does sleeping longer (or doing both). I emailed you the log.

anthony-tuininga commented 5 months ago

Thanks for your help with getting to the bottom of this issue. I was able to replicate and have verified the fix. I hope you are able to confirm that yourself!

ThiefMaster commented 5 months ago

I'll test the patch later once my D&D session finished ;)

ThiefMaster commented 5 months ago

The crash and sync problems are gone, async still randomly fails with "invalid redirect data" though

anthony-tuininga commented 5 months ago

I was able to replicate the "invalid redirect data" issue as well. In my configuration it occurs rather rarely but often enough for me to discover the source of the issue: the redirect packet comes in two pieces and the first part can be completely processed before the second part arrives, which the code didn't handle correctly. It now does. Let me know if you are able to verify that, too. Thanks again for your help!

anthony-tuininga commented 5 months ago

The patch has been included in version 2.0.1 which was just released.