oracle / python-oracledb

Python driver for Oracle Database conforming to the Python DB API 2.0 specification. This is the renamed, new major release of cx_Oracle
https://oracle.github.io/python-oracledb
Other
307 stars 59 forks source link

DbObject.__setattr__ performance issue. #297

Closed mkmoisen closed 1 month ago

mkmoisen commented 4 months ago
r_foo_type = conn.gettype('HELLO.XXTEST_TEST_PKG.R_FOO')

r_foo_obj = r_foo_type.newobject()

# This takes 0.2 seconds from my local host to a remote database
r_foo_obj.BAR = 'a'  
# This takes another 0.2 seconds
r_foo_obj.BAZ = 'a' * 4000

Running that for a record with 15 attributes, with hundreds of records in a collection, results in a significant performance hit.

My guess here is that each time you call DbObject.__setattr__ it results in a network call to the database.

I think that should either happen all at once during curr.callproc/curr.callfunc, or if there needs to be a call, it would be better if we could set all the attributes at once:


  1. What versions are you using?

oracledb==2.0.1

Give your database version.

19c

>>> import sys
>>> import platform
>>>
>>> print("platform.platform:", platform.platform())
platform.platform: Linux-5.15.133.1-microsoft-standard-WSL2-x86_64-with-glibc2.28
>>> print("sys.maxsize > 2**32:", sys.maxsize > 2**32)
sys.maxsize > 2**32: True
>>> print("platform.python_version:", platform.python_version())
platform.python_version: 3.12.2

And:

>>> import oracledb
>>> print("oracledb.__version__:", oracledb.__version__)
oracledb.__version__: 2.0.1
  1. Is it an error or a hang or a crash?

Performance Issue

  1. What error(s) or behavior you are seeing?

Calling DbObject.__setattr__ takes a non-negligible amount of time.

When populating a plsql record with 15 attributes, and doing this in a loop over many records in a collection, this takes significant time.

I believe that what is happening is that each time DbObject.__setattr__ is called, oracledb is actually making a network call. I infered this because when I run this code on my localhost connecting to a remote database, it takes 0.2 seconds per each dbObject.__setattr__ call. Calling this code from a server in the same data center as the remote database is 100x faster, 0.002 seconds. However that is still relatively slow.

Assuming I am correct, what is the point of making a network call to the database when we set an attribute? Can't it just be sent all at once when we call callfunc or callproc ?

If a network call is required for some reason, it might help if there was an API where we can set all the attributes at once, and make one network call per record instead of per attribute.

  1. Does your application call init_oracle_client()?

Yes, thick mode.

  1. Include a runnable Python script that shows the problem.

import oracledb
oracledb.init_oracle_client()

import contextlib
from datetime import datetime

@contextlib.contextmanager
def timeit():
    s = datetime.now()
    yield
    e = datetime.now()
    print((e-s).total_seconds())

conn = oracledb.connect('hello/password@foo')

r_foo_type = conn.gettype('HELLO.XXTEST_TEST_PKG.R_FOO')

r_foo_obj = r_foo_type.newobject()  # No way to set all the attributes at once

# Both of these take about 0.2 seconds from my localhost to remote database, or 0.002 seconds if from the same datacenter
with timeit():
    r_foo_obj.BAR = 'a'

with timeit():
    r_foo_obj.BAZ = 'a' * 4000

Include all SQL needed to create the database schema.

CREATE OR REPLACE PACKAGE xxtest_test_pkg
IS
    TYPE r_foo IS RECORD (
        bar VARCHAR2(4000),
        baz VARCHAR2(4000)
    );
END xxtest_test_pkg;
/

CREATE OR REPLACE PACKAGE BODY xxtest_test_pkg
IS

END xxtest_test_pkg;
/
anthony-tuininga commented 4 months ago

This is known behavior in thick mode. There is no round trip for each call to setattr() but there is a poor implementation in place in the Oracle Client libraries. I would recommend trying with thin mode.

anthony-tuininga commented 4 months ago

I created this script instead to use the timeit module:

import timeit

import oracledb
# oracledb.init_oracle_client()

conn = oracledb.connect('hello/password@foo')

r_foo_type = conn.gettype('HELLO.XXTEST_TEST_PKG.R_FOO')

r_foo_obj = r_foo_type.newobject()  # No way to set all the attributes at once

result = timeit.timeit("value.BAR = 'a'", number=1000,
                       globals=dict(value=r_foo_obj))
print("result 1:", result)

result = timeit.timeit("value.BAR = 'a' * 4000", number=1000,
                       globals=dict(value=r_foo_obj))
print("result 2:", result)

On my machine, 1000 iterations takes about 100 ms in thick mode and < 1 ms in thin mode. No round trip to the database occurs in either mode.

mkmoisen commented 4 months ago

Hi @anthony-tuininga

When I run from inside of my docker on my localhost, 1000 iterations took 141 seconds.

However running the same code on my normal computer, outside of docker, 1000 iterations took 0.076 seconds.

Running from inside of my docker on my remote server using the identical container image takes 0.70 seconds.

Seems like the combination of localhost and docker is the only place where there is any noticeable performance issue.


Any idea what could be causing this, or how I can debug it further? ldap.ora, sqlnet.ora etc. are all identical.

I've tried the following from both my local non-docker and my local docker, all perform about the same: opening and closing connections, select 1 from dual, callproc, callfunc,

anthony-tuininga commented 4 months ago

Are you able to use thin mode? Do you see the same behavior there?

anthony-tuininga commented 4 months ago

I just did a packet trace with thick mode and I am seeing a round trip for each attribute that is being set -- which should not be happening. I'll do some more investigation and get back to you.

mkmoisen commented 4 months ago

@anthony-tuininga

Unfortunately I cannot use thin mode because my database requires ldap to connect, which is not supported in oracledb's thin mode.

However, using a hack which I'm afraid to use in production, I cannot replicate this issue in thin mode.

Do you think something like this acceptable for me to use if I wanted to use thin mode? I got this from here.

import ldap3
import oracledb

def connect(schema, password, database):
    with ldap3.Connection('my-ldap.com') as conn:
        conn.search('cn=OracleContext,dc=my,dc=com', f"(cn={database})", attributes=['orclNetDescString'])
        tns_string = conn.entries[0].orclNetDescString.value
        print(tns_string)

    return oracledb.connect(f'{schema}/{password}@{tns_string}')

I'm not sure if oracle thick mode handles ldap exactly like this under the hood or if it has a bunch of additional features.

anthony-tuininga commented 4 months ago

How the database connect string is looked up shouldn't matter, so I don't see there being any problem with using that approach. There are definitely a bunch of additional features but as noted in the issue you referenced very few people seem to use them -- and you seem to be yet another one!

I would recommend using this:

return oracledb.connect(user=schema, password=password, dsn=tns_string)

That avoids the necessity of having the driver parse the string you just created back into its constituent parts again!

The ability to support LDAP is on the (rather long) list of enhancements.

cjbj commented 2 months ago

Closing - no activity.

mkmoisen commented 2 months ago

Hi @cjbj and @anthony-tuininga

Anthony mentioned here:

I just did a packet trace with thick mode and I am seeing a round trip for each attribute that is being set -- which should not be happening. I'll do some more investigation and get back to you.

May we please keep this issue open until this is fixed?

cjbj commented 2 months ago

Sure

cjbj commented 2 months ago

@mkmoisen Anthony pointed me at the Oracle bug he logged for you. It is already fixed in a future version of Oracle Database. If you need a backport, contact Oracle Support and quote bug number 36299901.

anthony-tuininga commented 1 month ago

This has been corrected in 23ai which was just released today.