oracle / python-cx_Oracle

Python interface to Oracle Database, now superseded by python-oracledb
https://oracle.github.io/python-cx_Oracle

Throttling INSERT function for files over 750KB #616

Closed sme31 closed 2 years ago

sme31 commented 2 years ago
  1. What versions are you using?

cx_Oracle 8.3.0 Python 3.7 Ubuntu 16.04 Oracle 12c Enterprise Edition

Also run Python and show the output of:

    print("platform.platform:", platform.platform())
    print("sys.maxsize > 2**32:", sys.maxsize > 2**32)
    print("platform.python_version:", platform.python_version())

    platform.platform: Linux-4.4.0-210-generic-x86_64-with-Ubuntu-16.04-xenial
    sys.maxsize > 2**32: True
    platform.python_version: 3.7.10

And:

    print("cx_Oracle.version:", cx_Oracle.version)
    print("cx_Oracle.clientversion:", cx_Oracle.clientversion())

which gives:

    cx_Oracle.version: 8.3.0
    cx_Oracle.clientversion: (21, 5, 0, 0, 0)

  2. Describe the problem

When I try to INSERT a BLOB it takes less than 1 second for files up to 750KB in size. For files between 750KB and 1.5MB it takes 26 seconds. For files between 1.5MB and 2.25MB it takes 52 seconds, and so on. It seems that I am being throttled by 26 seconds for every 750KB of data I am writing.

  3. Include a runnable Python script that shows the problem.

    def write_doc(doc, blob, conn):
        sql = "INSERT INTO my_data (doc_id, doc_content) VALUES (:dociddata, :blobdata)"
        cursor = conn.cursor()
        cursor.execute(sql, dociddata=doc, blobdata=blob)
        conn.commit()
        cursor.close()

    conn = cx_Oracle.connect(user="Myuser", password="MyPassword",
                             dsn="ocm.server.here.com:1527/some_name",
                             encoding="UTF-8")
    doc_csv = "/tmp/document_list.csv"
    csv_file = open(doc_csv, 'r')

    for line in csv_file:
        splitLineArray = line.split(',')
        documentId = splitLineArray[17]

        # Pull document down from SOAP API
        documentData = client.service.getDocument(int(documentId))
        write_doc(documentId, documentData, conn)


  4. Include all SQL needed to create the database schema.

CREATE TABLE  my_data (doc_id varchar(40), doc_content blob);

Is this a cx_Oracle problem or an Oracle DB server problem? Any clues on how to resolve this?
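The timings above follow a step pattern: roughly 26 seconds of extra latency for every additional 750 KB chunk. A minimal sketch of that reported pattern (the chunk size and penalty are the numbers observed above, not any documented cx_Oracle behaviour, and the function name is illustrative):

```python
import math

def observed_delay_seconds(size_kb, chunk_kb=750, penalty_s=26):
    # Reported symptom: ~26 s of extra latency for every additional
    # 750 KB chunk beyond the first. This models the observation only,
    # not the underlying cause.
    extra_chunks = max(0, math.ceil(size_kb / chunk_kb) - 1)
    return extra_chunks * penalty_s

# 500 KB fits in the first chunk: no penalty
# 1000 KB spills into a second chunk: ~26 s
# 2000 KB spills into a third chunk: ~52 s
```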
cjbj commented 2 years ago

Duplicate of https://stackoverflow.com/questions/71398255/very-slow-writing-blob-to-oracle-db-using-python-cx-oracle

sme31 commented 2 years ago

> Duplicate of https://stackoverflow.com/questions/71398255/very-slow-writing-blob-to-oracle-db-using-python-cx-oracle

Yep. Didn't get an answer there either.

It's also a duplicate of https://community.oracle.com/tech/developers/discussion/4495704/insering-a-blob-using-python-and-cx-oracle-library-takes-a-long-time-for-big-files#latest

No answer there either. Looks like I've stumped the experts with this one!

anthony-tuininga commented 2 years ago

Can you provide a complete but simplified runnable test case that demonstrates the issue? In other words, something that does this:

  1. Connects to the database
  2. Generates data of length 500k -- the simplest is to just do "A" * (500 * 1024)
  3. Inserts the data into the database (and provides the time)
  4. Generates data of length 1500k
  5. Inserts that data into the database (and provides the time)

Also include the create table statement you used to create the table you are testing with.

This has the advantage of ensuring that there isn't something else contributing to the problem and it also means that we can run the test case ourselves.

sme31 commented 2 years ago

Thanks Anthony - hopefully this will demonstrate the problem :

import cx_Oracle
import datetime

conn = cx_Oracle.connect(user="myusername", password="mypassword",dsn="ocm.myserverproxy.here.uk:1527/some_name",encoding="UTF-8")
cursor=conn.cursor()

cursor.execute("CREATE TABLE blob_test (blob_data blob)")

small_blob=b"A" * (500*1024)
print("Writing the small blob started at - " +  (datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")))
cursor.execute("INSERT INTO blob_test VALUES(:1)", [small_blob])
conn.commit()
print("Writing the small blob finished at - " +  (datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")))

large_blob=b"A" * (1500*1024)
print("Writing the large blob started at - " +  (datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")))
cursor.execute("INSERT INTO blob_test VALUES(:1)", [large_blob])
conn.commit()
print("Writing the large blob finished at - " +  (datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")))

cursor.execute("DROP TABLE blob_test")
cursor.close()
conn.close()

This gives me an output of -

Writing the small blob started at - 2022-03-11 15:20:28
Writing the small blob finished at - 2022-03-11 15:20:28
Writing the large blob started at - 2022-03-11 15:20:28
Writing the large blob finished at - 2022-03-11 15:21:20
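Whole-second strftime timestamps round away sub-second differences; elapsed time can also be captured directly with time.perf_counter. A small sketch (the helper name and label are illustrative, not part of the script above):

```python
import time

def timed(label, fn):
    # Run fn() and report elapsed wall-clock time with sub-second
    # resolution, which whole-second strftime timestamps round away.
    start = time.perf_counter()
    result = fn()
    elapsed = time.perf_counter() - start
    print(f"{label} took {elapsed:.3f} s")
    return result, elapsed

# Usage against an insert like the one above (hypothetical wrapper):
#   timed("large blob insert",
#         lambda: cursor.execute("INSERT INTO blob_test VALUES(:1)", [large_blob]))
```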
anthony-tuininga commented 2 years ago

I just tried that code. This is my output:

Writing the small blob started at - 2022-03-11 08:33:49
Writing the small blob finished at - 2022-03-11 08:33:49
Writing the large blob started at - 2022-03-11 08:33:49
Writing the large blob finished at - 2022-03-11 08:33:49

There is only a small delay for the 1.5 MB case. My database is on the same network as the client.

I then tried a cloud database and got the following results:

Writing the small blob started at - 2022-03-11 08:40:55
Writing the small blob finished at - 2022-03-11 08:40:56
Writing the large blob started at - 2022-03-11 08:40:56
Writing the large blob finished at - 2022-03-11 08:40:58

The second case is definitely slower, but not by a significant amount.

That leads me to suspect some configuration issue. The name of your server suggests some sort of proxy is in use? That might be the source of the slowdowns?

sme31 commented 2 years ago

That's very reassuring, in that it rules out a cx_Oracle library issue. I am indeed connecting via Oracle Connection Manager, which is a potential issue. My database is also on a different network from the client, so again, there could be issues there.

But at least I've narrowed down where I need to look which is very helpful. Thank you.

sme31 commented 2 years ago

As an update, I have now connected to my Database directly, rather than through the Oracle Connection Manager, and I am able to write large blobs in a second or so.

So, it is definitely an issue with the OCM, which I will pass back to the DBAs.
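A quick way to A/B the two paths is to keep the DSN a parameter and point it either at the Connection Manager or straight at the database listener, with otherwise identical code. A minimal sketch (hostnames and ports are placeholders, and an EZConnect-style connect string is assumed):

```python
def ezconnect(host, port, service_name):
    # Build an EZConnect-style DSN string; swapping host/port between
    # the CMAN proxy and the database listener lets the two connection
    # paths be compared with the same test script.
    return f"{host}:{port}/{service_name}"

# Via Connection Manager (the slow path reported above):
cman_dsn = ezconnect("ocm.myserverproxy.here.uk", 1527, "some_name")
# Direct to the database listener (placeholder host/port):
direct_dsn = ezconnect("db.myserver.here.uk", 1521, "some_name")
```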

cjbj commented 2 years ago

@sme31 our CMAN developers are interested in following up. Would it be possible to get the cman.ora file and the socket stats on the cman node for the connection from client to cman and cman to the DB https://man7.org/linux/man-pages/man8/ss.8.html ? You can email me (see my profile).

stale[bot] commented 2 years ago

This issue has been automatically marked as inactive because it has not been updated recently. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] commented 2 years ago

This issue has been automatically closed because it has not been updated for a month.