oracle / python-oracledb

Python driver for Oracle Database conforming to the Python DB API 2.0 specification. This is the renamed, new major release of cx_Oracle
https://oracle.github.io/python-oracledb
Other
342 stars 69 forks source link

Retry count has no effect in thin mode when using an address list and running into a timeout #420

Closed quantuminternet closed 1 week ago

quantuminternet commented 1 week ago
  1. What versions are you using?

Database: 19.0.0.0.0 platform.platform: Linux-5.4.0-148-generic-x86_64-with-glibc2.31 sys.maxsize > 2**32: True platform.python_version: 3.12.6 oracledb.version: 2.4.1

I have not seen anything in the changelog that suggests that this has been addressed in 2.5.0

  1. Is it an error or a hang or a crash? error

  2. What error(s) or behavior you are seeing? We have a list of alternative IP addresses for an Oracle database. When one of them fails on connect with a timeout and retry_count is set to something greater than zero, we expect that the connection is attempted again with one of the other addresses. This works as expected in thick mode, but fails in thin mode: In thin mode, an exception is raised after the first attempt.

import oracledb
import time

# connection details have been redacted
working_ip = '[IP of Oracle database]'
broken_ip = '[IP that produces a timeout]'
service_name = '[Oracle service name]'
user = '[my user]'
password = '[my_password]'

dsn = f'(DESCRIPTION=(RETRY_COUNT=3)(CONNECT_TIMEOUT=2)(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST={broken_ip})(PORT=33001))(ADDRESS=(PROTOCOL=TCP)(HOST={working_ip})(PORT=33001))(LOAD_BALANCE=yes))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME={service_name})))'

for i in range(10):
    start = time.monotonic()
    success = False
    try:
        with oracledb.connect(dsn=dsn, user=user, password=password, tcp_connect_timeout=2, retry_count=3) as con:
            cursor = con.cursor()
            cursor.execute('select * from dual')
            r = cursor.fetchall()
            success = True
    except Exception as ex:
        print(ex)
    print(f'success: {success}, time: {time.monotonic() - start}')

This gives the following result:

DPY-6005: cannot connect to database (CONNECTION_ID=zcENDxX4Xx2538CUve3T2Q==).
timed out
success: False, time: 2.003529854118824
success: True, time: 0.15504495054483414
success: True, time: 0.15987224131822586
DPY-6005: cannot connect to database (CONNECTION_ID=MF8hAFPk2DrulkI9T3Xf1Q==).
timed out
success: False, time: 2.001703642308712
DPY-6005: cannot connect to database (CONNECTION_ID=v6Zs3XftGMih6hB45cJklA==).
timed out
success: False, time: 2.0028985515236855
success: True, time: 0.1483408883213997
DPY-6005: cannot connect to database (CONNECTION_ID=pa6I5LldNbNobwc2RXNuUg==).
timed out
success: False, time: 2.0020845904946327
DPY-6005: cannot connect to database (CONNECTION_ID=FE43mMP9hjofXzAcIm5P+g==).
timed out
success: False, time: 2.0028276592493057
DPY-6005: cannot connect to database (CONNECTION_ID=jJKdT0p8T+nOOH5j0Qu0ww==).
timed out
success: False, time: 2.002876453101635
success: True, time: 0.40784119069576263

So it looks like sometimes it suceeds on first try, but sometimes it fails on first try and immediately raises an exception, instead of trying a second time. Since it takes 2 seconds until the exception is raised (instead of 8), we can exclude that the first IP is tried multiple times

Running the same code in thick mode gives the expected result: Sometimes it succeeds on first try, sometimes it runs into a timeout after 2 seconds, but succeeds on second try

success: True, time: 0.18361841142177582
success: True, time: 0.19042250514030457
success: True, time: 2.1918497011065483
success: True, time: 2.1891980096697807
success: True, time: 0.18908774107694626
success: True, time: 2.182748392224312
success: True, time: 0.18147382885217667
success: True, time: 0.18108884245157242
success: True, time: 2.1901468336582184
success: True, time: 0.17620021849870682

Removing (LOAD_BALANCE=yes) results in the connection always failing after 2 seconds in thin mode and the connection always succeeding after 2 seconds in thick mode.

Putting (RETRY_COUNT=3) in the DSN or passing it directly as parameter to connect() does not seem to make a difference

  1. Does your application call init_oracle_client()?

The problem occurs in thin mode. If we call init_oracle_client, it works as expected

anthony-tuininga commented 1 week ago

I can confirm the issue and will work on a solution. Thanks for reporting it!

anthony-tuininga commented 1 week ago

I have pushed a patch that corrects this issue and have initated a build from which you can download pre-built development wheels once it completes. You can also build from source if you prefer.

quantuminternet commented 1 week ago

Great, thank you!

I have tested the wheel on the system we found the issue on and I can confirm that this fixes the issue.