oracle / python-oracledb

Python driver for Oracle Database conforming to the Python DB API 2.0 specification. This is the renamed, new major release of cx_Oracle
https://oracle.github.io/python-oracledb
Other
342 stars 69 forks source link

New two phase transaction in thin mode has different default timeout than thick mode #376

Open CaselIT opened 3 months ago

CaselIT commented 3 months ago

Hi,

With the 2.3 release I've added support for oracledb two phase commit in sqlalchemy. https://github.com/sqlalchemy/sqlalchemy/commit/a9c0487c024410d446b8be3f528e051318dd150e In the ci we noticed that a test was unreliable and decided to skip it in the ci. The test is this one https://github.com/sqlalchemy/sqlalchemy/blob/6cf5e2a188fc5e337d22a098a5fe9a9fe10cc7e7/test/engine/test_transaction.py#L476-L527 when commit=True. The same code using only oracledb is reported below.

Initially the idea was to ask guidance since it seemed there was some kind of timeout, but trying the standalone example with different timeout and with and without the oracle client, seem to have surfaced a bug in the thin mode implementation.

  1. What versions are you using?

2.3.0

  1. Is it an error or a hang or a crash?

error

  1. What error(s) or behavior you are seeing?

    oracledb.exceptions.DatabaseError: ORA-24756: transaction with identifier gti: 0x000c371f.666F6F bid: 0x626172
    does not exist
  2. Does your application call init_oracle_client()?

no, calling it fixes it

  1. Include a runnable Python script that shows the problem.
import time
import oracledb
import random
import sys
import platform

print("platform.platform:", platform.platform())
print("sys.maxsize > 2**32:", sys.maxsize > 2**32)
print("platform.python_version:", platform.python_version())
print("oracledb.__version__:", oracledb.__version__)

# oracledb.init_oracle_client() # calling this fixes it
url = dict(user="scott", password="tiger", host="127.0.0.1", port=1521, service_name="FREEPDB1")

def setup():
    with oracledb.connect(**url) as connection, connection.cursor() as c:
        c.execute("CREATE TABLE tbl(id integer, name varchar2(42))")
        connection.commit()

def teardown():
    with oracledb.connect(**url) as connection, connection.cursor() as c:
        c.execute("drop table tbl")
        connection.commit()
        for x in connection.tpc_recover():
            try:
                connection.tpc_forget(x)
            except:
                pass

def using_oracledb():
    connection = oracledb.connect(**url)
    xid_tuple = (base + i, b"foo", b"bar")
    xid = connection.xid(*xid_tuple)
    connection.tpc_begin(xid)
    with connection.cursor() as c:
        c.execute(
            "insert into tbl(id, name) values (:id, :name)",
            dict(id=1, name="user1"),
        )
    assert connection.tpc_prepare()
    connection.close()

    with oracledb.connect(**url) as connection2, connection2.cursor() as c:
        assert c.execute("select id from tbl").fetchall() == []

    with oracledb.connect(**url) as connection3:
        recoverables = connection3.tpc_recover()
        assert xid in recoverables
        # time.sleep(2) # adding a timeout makes it error in very few tries (1 or 2)
        connection3.tpc_commit(xid)

        with connection3.cursor() as c:
            assert c.execute("select id from tbl").fetchall() == [(1,)]
        print("ok", xid)

    with oracledb.connect(**url) as connection4, connection4.cursor() as c:
        c.execute("delete from tbl")
        connection4.commit()

# teardown()
setup()
try:
    base = random.randint(0, 1_000_000)
    i = 0
    while True:
        print("try", i + 1, '\n')
        using_oracledb()
        i += 1
        if i % 20 == 0 and input(f"Run {i}. Exit? (y)") == "y":
            break
finally:
    teardown()

The above script errors usually after 10 to 30 runs:

platform.platform: Windows-11-10.0.22631-SP0
sys.maxsize > 2**32: True
platform.python_version: 3.12.4
oracledb.__version__: 2.3.0
try 1

ok Xid(format_id=375793, global_transaction_id=b'foo', branch_qualifier=b'bar')
try 2

ok Xid(format_id=375794, global_transaction_id=b'foo', branch_qualifier=b'bar')
...
try 15

Traceback (most recent call last):
  File "path\to\test4.py", line 77, in <module>
    using_oracledb()
  File "path\to\test4.py", line 59, in using_oracledb
    connection3.tpc_commit(xid)
  File "path\to\oracledb\connection.py", line 1057, in tpc_commit
    self._impl.tpc_commit(xid, one_phase)
  File "src\\oracledb\\impl/thin/connection.pyx", line 470, in oracledb.thin_impl.ThinConnImpl.tpc_commit
  File "src\\oracledb\\impl/thin/protocol.pyx", line 434, in oracledb.thin_impl.Protocol._process_single_message
  File "src\\oracledb\\impl/thin/protocol.pyx", line 435, in oracledb.thin_impl.Protocol._process_single_message
  File "src\\oracledb\\impl/thin/protocol.pyx", line 428, in oracledb.thin_impl.Protocol._process_message
  File "src\\oracledb\\impl/thin/messages.pyx", line 72, in oracledb.thin_impl.Message._check_and_raise_exception
oracledb.exceptions.DatabaseError: ORA-24756: transaction with identifier gti: 0x0005bbff.666F6F bid: 0x626172
 does not exist
Help: https://docs.oracle.com/error-help/db/ora-24756/

uncommenting the timeout makes the fail even quicker:

platform.platform: Windows-11-10.0.22631-SP0
sys.maxsize > 2**32: True
platform.python_version: 3.12.4
oracledb.__version__: 2.3.0
try 1

Traceback (most recent call last):
  File "path\to\test4.py", line 77, in <module>
    using_oracledb()
  File "path\to\test4.py", line 59, in using_oracledb
    connection3.tpc_commit(xid)
  File "path\to\oracledb\connection.py", line 1057, in tpc_commit
    self._impl.tpc_commit(xid, one_phase)
  File "src\\oracledb\\impl/thin/connection.pyx", line 470, in oracledb.thin_impl.ThinConnImpl.tpc_commit
  File "src\\oracledb\\impl/thin/protocol.pyx", line 434, in oracledb.thin_impl.Protocol._process_single_message
  File "src\\oracledb\\impl/thin/protocol.pyx", line 435, in oracledb.thin_impl.Protocol._process_single_message
  File "src\\oracledb\\impl/thin/protocol.pyx", line 428, in oracledb.thin_impl.Protocol._process_message
  File "src\\oracledb\\impl/thin/messages.pyx", line 72, in oracledb.thin_impl.Message._check_and_raise_exception
oracledb.exceptions.DatabaseError: ORA-24756: transaction with identifier gti: 0x00064460.666F6F bid: 0x626172
 does not exist
Help: https://docs.oracle.com/error-help/db/ora-24756/

With or without timeout using the oracle client results in no error (run about 100 try without timeout and 60 with). I've tried only on windows, but @zzzeek can reproduce locally on linux (fedora I think) using a different oracle version and config

anthony-tuininga commented 3 months ago

Ok, I looked into this and technically this isn't a bug. The problem is that if you call tpc_begin() without specifying a value for the timeout parameter, the default is 0 and that means you are telling the system that as soon as a transaction becomes inactive, the system can clean it up. So whether it actually gets around to it or not defines whether or not you get the error you are seeing. If you ensure that you set a value for the timeout parameter that covers the length of time that the transaction will be inactive, then the problem goes away completely. The fact that thick mode doesn't exhibit this behavior suggests to me that the default is not actually 0 but some other value. I'll ask internally and find out what it is and change thin mode (and the documentation) to whatever I discover. In the meantime, you can simply set the timeout parameter yourself and verify that the issue is resolved.

CaselIT commented 3 months ago

Thanks for checking. So a timeout was invited!

I failed to notice that timeout parameter in tpc_begin, but the behaviour seems consistent with it.

I'll retry the example with a timeout set, and see if it's settable using the sqlalchemy api

CaselIT commented 3 months ago

it does indeed seem the timeout fault. I think it would make sense from an usage point of view to have the same timeout as the thick mode, or at least document the difference.

see if it's settable using the sqlalchemy api

seems not. Will coordinate with mike about accepting kw-args there

anthony-tuininga commented 3 months ago

it does indeed seem the timeout fault. I think it would make sense from an usage point of view to have the same timeout as the thick mode, or at least document the difference.

I agree and intend to make them the same once I find out what thick mode does. Stay tuned!