oracle / python-cx_Oracle

Python interface to Oracle Database now superseded by python-oracledb
https://oracle.github.io/python-cx_Oracle
Other
887 stars 361 forks source link

outputtypehandler isn't work for AQ clob attributes? #558

Closed sinys15 closed 3 years ago

sinys15 commented 3 years ago
  1. What versions are you using?

cx_Oracle 8.1

Give your database version.

19.3

Also run Python and show the output of:

platform.platform: Linux-5.8.0-50-generic-x86_64-with-glibc2.32
sys.maxsize > 2**32: True)
platform.python_version: 3.8.6

cx_Oracle.version: 8.1.0
cx_Oracle.clientversion: (19, 6, 0, 0, 0)
  1. Describe the problem

I have AQ with UDT include 2 CLOB attributes. I read a message by queue.deqMany(10) and it takes about 12 seconds for about 3,1 Mb message (first CLOB attribute about 3Mb + second CLOB attribute about 1kb). I found out slow code line message.payload.ATTRCLOB1.read()

I have found this link https://cx-oracle.readthedocs.io/en/latest/user_guide/lob_data.html#directlobs and have tried to use it like:

 def output_type_handler(cursor, name, defaultType, length, precision, scale):
     print('123')
     if defaultType == cx_Oracle.DB_TYPE_CLOB:
         return cursor.var(cx_Oracle.DB_TYPE_LONG, arraysize=cursor.arraysize)
     if defaultType == cx_Oracle.DB_TYPE_BLOB:
         return cursor.var(cx_Oracle.DB_TYPE_LONG_RAW, arraysize=cursor.arraysize)
...
connection.outputtypehandler = output_type_handler

and I can't see that it speeded up and I can't see that method was called (I have expected '123' message). Is it a bug or what do I do wrong?

cjbj commented 3 years ago

To help us get to this quickly, can you share a runnable testcase?

sinys15 commented 3 years ago

Oracle script:

create table test_clob(clob_col clob)
/

declare
  v_json json_object_t := json_object_t();
  v_clob clob;
begin
  for i in 1..150e3
  loop
    v_json.put(dbms_random.string('x',10), dbms_random.string('x',10));
  end loop;

  v_clob := v_json.to_clob();

  dbms_output.put_line(dbms_lob.getlength(v_clob));

  insert into test_clob(clob_col) values(v_clob);
end;
/

commit
/

create or replace type T_MESSAGE as object(
  data clob,
  info clob
)
/

declare
  vQueueName varchar2(30) := 'QUEUE_TEST';
  vQueueTableName varchar2(30) := 'QUEUE_TABLE_TEST';
begin
  dbms_aqadm.create_queue_table(vQueueTableName, 'T_MESSAGE');
  dbms_aqadm.create_queue(vQueueName, vQueueTableName);
  dbms_aqadm.start_queue(vQueueName);
end;
/

declare
  v_enq_opt DBMS_AQ.enqueue_options_t;
  v_msg_prp DBMS_AQ.message_properties_t;
  v_msg_id  raw(16);
  v_clob clob;
begin
  select clob_col
    into v_clob
    from test_clob
   where rownum = 1;

  for i in 1..10
  loop
    dbms_aq.enqueue (
       queue_name         => 'QUEUE_TEST',
       enqueue_options    => v_enq_opt,
       message_properties => v_msg_prp,
       payload            => t_message(data => v_clob,
                                       info => '{"exchange_name":"my_exchange","routing_key":"my_routing_key"}'),
       msgid              => v_msg_id);
  end loop;
end;
/

commit
/

Python script:

import cx_Oracle
import time

connection = cx_Oracle.connect("user", "password", "host:port/service")
type_message = connection.gettype("T_MESSAGE")
queue = connection.queue("QUEUE_TEST", type_message)

def OutputTypeHandler(cursor, name, defaultType, size, precision, scale):
    if defaultType == cx_Oracle.DB_TYPE_CLOB:
        print('OutputTypeHandler clob raised')
        return cursor.var(cx_Oracle.DB_TYPE_LONG, arraysize=cursor.arraysize)

def read_from_table(patched: bool):
    cursor = connection.cursor()
    time_before = time.time()
    cursor.execute("select clob_col from test_clob where rownum = 1", [])
    clob_column = cursor.fetchone()
    if patched:
        clob_data = clob_column[0]
    else:
        clob_data = clob_column[0].read()
    # print("CLOB data:", clob_data)
    print("CLOB length:", len(clob_data))
    time_after = time.time()
    print("time:", str(time_after - time_before))

def read_from_aq():
    time_before = time.time()
    clob_data = ""
    messages = queue.deqMany(1)
    for message in messages:
        clob_data = message.payload.DATA.read()
    print("CLOB length:", len(clob_data))
    time_after = time.time()
    print("time:", str(time_after - time_before))

print('DEFAULT\nRead clob column from table')
print('first')
read_from_table(patched=False)
print('second')
read_from_table(patched=False)

print('\nRead clob from AQ')
print('first')
read_from_aq()
print('second')
read_from_aq()

connection.outputtypehandler = OutputTypeHandler
print('\nOutputTypeHandler patched.\nRead clob column from table')
print('first')
read_from_table(patched=True)
print('second')
read_from_table(patched=True)

print('\nRead clob from AQ')
print('first')
read_from_aq()
print('second')
read_from_aq()

my result:

DEFAULT
Read clob column from table
first
CLOB length: 3900001
time: 0.9043450355529785
second
CLOB length: 3900001
time: 0.6573898792266846

Read clob from AQ
first
CLOB length: 3900001
time: 0.8455414772033691
second
CLOB length: 3900001
time: 0.7538843154907227

OutputTypeHandler patched.
Read clob column from table
first
OutputTypeHandler clob raised
CLOB length: 3900001
time: 0.6785240173339844
second
OutputTypeHandler clob raised
CLOB length: 3900001
time: 0.7992420196533203

Read clob from AQ
first
CLOB length: 3900001
time: 0.8716306686401367
second
CLOB length: 3900001
time: 0.8395915031433105

conclusion: 1) OutputTypeHandler was called for cursor.fetch but it hasn't tangible speed effect of clob read. 2) OutputTypeHandler was not called for reads from AQ.

cjbj commented 3 years ago

Those times are too small to be comparable; you'd need to loop a few times. Many people have seen big performance improvements when fetching as a String.

Output type handlers are documented as:

This read-write attribute specifies a method called for each column that is going to be fetched from any cursor associated with this connection.

so I can understand why they don't affect AQ, since this is not a query that returns columns.

I don't think there is anyway to tell AQ to return a string instead of a 'LOB locator'. Perhaps @anthony-tuininga knows.

sinys15 commented 3 years ago

I have checked the test. It make improvements about 25-30% for your test. But still it isn't work for my test case:

import cx_Oracle
import time

connection = cx_Oracle.connect(...)

def OutputTypeHandler(cursor, name, defaultType, size, precision, scale):
    if defaultType == cx_Oracle.DB_TYPE_CLOB:
        print('OutputTypeHandler clob raised')
        return cursor.var(cx_Oracle.LONG_STRING, arraysize=cursor.arraysize)

def read_from_table(patched: bool):
    cursor = connection.cursor()
    time_before = time.time()
    cursor.execute("select clob_col from test_clob where rownum = 1", [])
    clob_column = cursor.fetchone()
    if patched:
        clob_data = clob_column[0]
    else:
        clob_data = clob_column[0].read()
    # print("CLOB data:", clob_data)
    # print("CLOB length:", len(clob_data))
    time_after = time.time()
    print("time:", str(time_after - time_before))

print('DEFAULT\nRead clob column from table')
for i in range(10):
    read_from_table(patched=False)

connection.outputtypehandler = OutputTypeHandler
print('\nOutputTypeHandler patched.\nRead clob column from table')
for i in range(10):
    read_from_table(patched=True)
DEFAULT
Read clob column from table
time: 1.3364217281341553
time: 0.8301796913146973
time: 0.9486052989959717
time: 0.9425868988037109
time: 0.9459102153778076
time: 0.9294209480285645
time: 0.9359176158905029
time: 0.8553953170776367
time: 1.0000009536743164
time: 1.228334665298462

OutputTypeHandler patched.
Read clob column from table
OutputTypeHandler clob raised
time: 1.0885541439056396
OutputTypeHandler clob raised
time: 1.1074016094207764
OutputTypeHandler clob raised
time: 1.060746669769287
OutputTypeHandler clob raised
time: 1.0482861995697021
OutputTypeHandler clob raised
time: 1.0315427780151367
OutputTypeHandler clob raised
time: 1.0293917655944824
OutputTypeHandler clob raised
time: 0.9692401885986328
OutputTypeHandler clob raised
time: 0.929304838180542
OutputTypeHandler clob raised
time: 0.9637398719787598
OutputTypeHandler clob raised
time: 1.004173994064331

Is it data or data length dependent?

anthony-tuininga commented 3 years ago

I don't think there is anyway to tell AQ to return a string instead of a 'LOB locator'. Perhaps @anthony-tuininga knows.

There is indeed no way to have database object attributes be returned as strings instead of LOBs. PL/SQL is limited to 32,767 bytes so anything larger than that must be transported as a LOB.

I'm looking further into the performance differences for queries as I am not seeing now what I was before. I'll get back to you on that...

anthony-tuininga commented 3 years ago

I just discovered the source of my slowdown. I had tracing enabled in my SQLNET.ORA configuration file! Without that in place I see that the fetch as LOB is significantly slower than the fetch as STRING (with the output type handler). The difference is more dramatic for smaller LOB sizes but it is still present even at large sizes. Maybe check to see if tracing is enabled for you, too?

sinys15 commented 3 years ago

I don't have SQLNET.ORA on client (I use instant client) and SQLNET.ORA on the server don't have any TRACE parameters.

anthony-tuininga commented 3 years ago

Ok. I reran your exact test and I am seeing approximately twice the performance when using the output type handler (using a database that is remote to the client but on the same network, as well as when using a database on the same machine as the client). I'm not sure why you're not seeing the same behavior. Its worth noting that on my machine I am seeing times of approximately 90 ms for reading the LOB and about 40 ms for using the output type handler (when using the remote database) and about 40 ms for reading the LOB and 10-15 ms for using the output type handler (when using a local database). Your numbers indicate your database/client is 10-15x slower! It might be worth setting the environment variable DPI_DEBUG_LEVEL to the value 4 before running your test cases and examining the output to see if there is some call in particular that is using a lot of time. If the call to dpiStmt_execute() or dpiStmt_fetchRows() is the one taking a lot of time, you can also set DPI_DEBUG_LEVEL to 36. This will take note of the memory allocations. When tracing was enabled for me I was seeing about 15-20 ms between successive memory allocations (when using the output type handler). There should be no (noticeable) time between successive memory allocations.

stale[bot] commented 3 years ago

This issue has been automatically marked as inactive because it has not been updated recently. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] commented 3 years ago

This issue has been automatically closed because it has not been updated for a month.