oracle / python-oracledb

Python driver for Oracle Database conforming to the Python DB API 2.0 specification. This is the renamed, new major release of cx_Oracle
https://oracle.github.io/python-oracledb

problem with bind variables #187

Closed jaroslawjarek closed 1 year ago

jaroslawjarek commented 1 year ago

I am running on:

- python-oracledb: 1.3.1
- Oracle Client: instantclient-basic-linux.x64-21.9.0.0.0dbru.zip
- OS: Red Hat Enterprise Linux 8.7 (Ootpa)
- source: Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production, Version 19.10.0.0.0
- destination: Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production, Version 19.16.0.0.0

What is my case:

1) I connect to the source system, which is the source of my large data set.

2) I execute my extraction query, but only to receive metadata: it is the same query as for the extraction, wrapped in an outer query with the condition 1=0.

3) Based on this query I prepare the list of source column names and the datatypes of these columns, e.g.:

v_records_column_names = tuple([col[0] for col in v_source_meta_cursor.description])
v_records_column_types = tuple([col[1] for col in v_source_meta_cursor.description])
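A minimal sketch of this describe-only metadata probe (the helper names and sample query are mine, not from the report): wrap the extraction query so it returns zero rows but still populates cursor.description, then unpack names and types from the DB API 7-tuples.

```python
def metadata_query(extraction_sql):
    # Wrapping with 1=0 returns no rows, but the database still has to
    # describe the select list, so cursor.description gets populated.
    return f"SELECT * FROM ({extraction_sql}) WHERE 1 = 0"

def names_and_types(description):
    # Each cursor.description entry is a DB API 2.0 7-tuple; index 0 is
    # the column name and index 1 the column type object.
    names = tuple(col[0] for col in description)
    types = tuple(col[1] for col in description)
    return names, types

print(metadata_query("SELECT id, name FROM src"))
```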

4) Based on these column names I query the empty destination table to receive the destination column datatypes, the same way as in point 2.

5) Based on the source query column names I prepare an insert statement with bind variables, e.g.:

INSERT INTO SDH_STAGING.B16735660_MATERIAL (INVENTORY_ITEM_ID,SEGMENT1,CREATION_DATE,CREATED_BY,ITEM_CATALOG_GROUP_ID,ATTRIBUTE16,INVENTORY_ITEM_STATUS_CODE,DESCRIPTION,ITEM_TYPE,PRIMARY_UOM_CODE,DIMENSION_UOM_CODE,UNIT_HEIGHT,UNIT_LENGTH,UNIT_VOLUME,UNIT_WEIGHT,UNIT_WIDTH,VOLUME_UOM_CODE,WEIGHT_UOM_CODE,ATTRIBUTE3,LAST_UPDATE_DATE,CUSTOMER_ORDER_ENABLED_FLAG,FULL_LEAD_TIME,SAFETY_STOCK_BUCKET_DAYS,PREPROCESSING_LEAD_TIME,POSTPROCESSING_LEAD_TIME,MINIMUM_ORDER_QUANTITY,PRODUCT_FAMILY_ITEM_ID,BRAND,STARS_CODE,CATEGORY_SET_ID,LOCAL_HTTS_CODE,SEGMENT5,GDP,GLOBAL_ATTRIBUTE3) values (:INVENTORY_ITEM_ID,:SEGMENT1,:CREATION_DATE,:CREATED_BY,:ITEM_CATALOG_GROUP_ID,:ATTRIBUTE16,:INVENTORY_ITEM_STATUS_CODE,:DESCRIPTION,:ITEM_TYPE,:PRIMARY_UOM_CODE,:DIMENSION_UOM_CODE,:UNIT_HEIGHT,:UNIT_LENGTH,:UNIT_VOLUME,:UNIT_WEIGHT,:UNIT_WIDTH,:VOLUME_UOM_CODE,:WEIGHT_UOM_CODE,:ATTRIBUTE3,:LAST_UPDATE_DATE,:CUSTOMER_ORDER_ENABLED_FLAG,:FULL_LEAD_TIME,:SAFETY_STOCK_BUCKET_DAYS,:PREPROCESSING_LEAD_TIME,:POSTPROCESSING_LEAD_TIME,:MINIMUM_ORDER_QUANTITY,:PRODUCT_FAMILY_ITEM_ID,:BRAND,:STARS_CODE,:CATEGORY_SET_ID,:LOCAL_HTTS_CODE,:SEGMENT5,:GDP,:GLOBAL_ATTRIBUTE3)
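A statement like the one above can be generated mechanically from the column-name tuple of step 3. A sketch (the helper name is mine; the table name is from the report):

```python
def build_insert(table, column_names):
    # One named bind variable per column. Even with named placeholders,
    # executemany() still accepts plain row tuples, which python-oracledb
    # binds by position (the traceback below goes through
    # _bind_values_by_position).
    cols = ",".join(column_names)
    binds = ",".join(f":{c}" for c in column_names)
    return f"INSERT INTO {table} ({cols}) values ({binds})"

print(build_insert("SDH_STAGING.B16735660_MATERIAL",
                   ("INVENTORY_ITEM_ID", "SEGMENT1", "CREATION_DATE")))
```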

6) Using the connection and a cursor to the source system, I extract data in chunks in the following way:

v_source_cursor=self.source_db_connection.cursor('server_side_cursor')
v_source_cursor.itersize = 2*v_fetch_size
#v_tsql - extraction query
v_source_cursor.execute(v_tsql)

v_total_cnt:int=0
logger.info(f"Extraction from source {self.source_system_id} has started")
#fetch data and insert data in loops
v_sdh_cursor_open=True
while True:

    v_tsql_record_set = v_source_cursor.fetchmany(v_fetch_size)

    if  len(v_tsql_record_set)==0:
        break

    else:

        # we take a new cursor before we insert new chunk of data
        v_total_cnt+=len(v_tsql_record_set)
        logger.debug(f"v_total_cnt: {v_total_cnt}")
        #v_sdh_cursor=self.sdh_connection.cursor()
        v_sdh_cursor.executemany(v_sql_staging_table_insert_template,v_tsql_record_set)
        #v_sdh_cursor.close()
        #v_sdh_cursor_open=False
        self.sdh_connection.commit()

7) Randomly, for different records, I received this message:

v_sdh_cursor.executemany(v_sql_staging_table_insert_template,v_tsql_record_set)

File "/home/py_elt/.local/lib/python3.8/site-packages/oracledb/cursor.py", line 438, in executemany
    self._impl.bind_many(self, parameters)
File "src/oracledb/impl/base/cursor.pyx", line 339, in oracledb.base_impl.BaseCursorImpl.bind_many
File "src/oracledb/impl/base/cursor.pyx", line 61, in oracledb.base_impl.BaseCursorImpl._bind_values
File "src/oracledb/impl/base/cursor.pyx", line 122, in oracledb.base_impl.BaseCursorImpl._bind_values_by_position
File "src/oracledb/impl/base/bind_var.pyx", line 129, in oracledb.base_impl.BindVar._set_by_value
File "src/oracledb/impl/base/var.pyx", line 86, in oracledb.base_impl.BaseVarImpl._check_and_set_value
File "src/oracledb/impl/base/var.pyx", line 59, in oracledb.base_impl.BaseVarImpl._check_and_set_scalar_value
File "src/oracledb/impl/base/connection.pyx", line 147, in oracledb.base_impl.BaseConnImpl._check_value
File "/home/py_elt/.local/lib/python3.8/site-packages/oracledb/errors.py", line 118, in _raise_err
    raise exc_type(_Error(message)) from cause
oracledb.exceptions.NotSupportedError: DPY-3013: unsupported Python type int for database type DB_TYPE_VARCHAR

If I took just the records which caused the problem and processed them in chunks of two records, I was able to insert them into the db. I used this code:

v_insert_stm="""INSERT INTO SDH_STAGING.B16735660_MATERIAL (INVENTORY_ITEM_ID,SEGMENT1,CREATION_DATE,CREATED_BY,ITEM_CATALOG_GROUP_ID,ATTRIBUTE16,INVENTORY_ITEM_STATUS_CODE,DESCRIPTION,ITEM_TYPE,PRIMARY_UOM_CODE,DIMENSION_UOM_CODE,UNIT_HEIGHT,UNIT_LENGTH,UNIT_VOLUME,UNIT_WEIGHT,UNIT_WIDTH,VOLUME_UOM_CODE,WEIGHT_UOM_CODE,ATTRIBUTE3,LAST_UPDATE_DATE,CUSTOMER_ORDER_ENABLED_FLAG,FULL_LEAD_TIME,SAFETY_STOCK_BUCKET_DAYS,PREPROCESSING_LEAD_TIME,POSTPROCESSING_LEAD_TIME,MINIMUM_ORDER_QUANTITY,PRODUCT_FAMILY_ITEM_ID,BRAND,STARS_CODE,CATEGORY_SET_ID,LOCAL_HTTS_CODE,SEGMENT5,GDP,GLOBAL_ATTRIBUTE3) values (:INVENTORY_ITEM_ID,:SEGMENT1,:CREATION_DATE,:CREATED_BY,:ITEM_CATALOG_GROUP_ID,:ATTRIBUTE16,:INVENTORY_ITEM_STATUS_CODE,:DESCRIPTION,:ITEM_TYPE,:PRIMARY_UOM_CODE,:DIMENSION_UOM_CODE,:UNIT_HEIGHT,:UNIT_LENGTH,:UNIT_VOLUME,:UNIT_WEIGHT,:UNIT_WIDTH,:VOLUME_UOM_CODE,:WEIGHT_UOM_CODE,:ATTRIBUTE3,:LAST_UPDATE_DATE,:CUSTOMER_ORDER_ENABLED_FLAG,:FULL_LEAD_TIME,:SAFETY_STOCK_BUCKET_DAYS,:PREPROCESSING_LEAD_TIME,:POSTPROCESSING_LEAD_TIME,:MINIMUM_ORDER_QUANTITY,:PRODUCT_FAMILY_ITEM_ID,:BRAND,:STARS_CODE,:CATEGORY_SET_ID,:LOCAL_HTTS_CODE,:SEGMENT5,:GDP,:GLOBAL_ATTRIBUTE3)"""

v_test_data=[(77440536, 'QWPRJ-QMCA32919-00', datetime.datetime(2023, 5, 2, 14, 10, 13), '227173', None, 'QWPRJ', 'Relsd SKU', 'PRODUCT PROFESSIONAL SERVICES OTHER APC CONTINGENCY FUNDS', 'ETO', 'EA', None, None, None, None, None, None, None, None, None, datetime.datetime(2023, 5, 2, 14, 10, 13), 'Y', None, None, None, 0, None, None, 'THIRD PARTY PROCUREMENT', 'NA', 3, None, '501', None, None),
             (763774, 'ACECCP100', datetime.datetime(2010, 9, 15, 14, 29, 52), '32740', None, None, 'Inactive', 'EcoBreeze Remote Display', 'FG', 'EA', 'IN', 15, 9.0157, 1.0171, 13.002, 12.9921, 'FT3', 'LBS', '731304283973', datetime.datetime(2023, 5, 4, 17, 21, 31), 'N', None, None, None, 0, None, None, 'ECOBREEZE', 'NA', 3, None, '337', '16S4', 'PIM')]

v_sdh_cursor.executemany(v_insert_stm,v_test_data)

In the end I decided to close the cursor after processing each chunk of data; then the issue went away. However, as this is a production system, I would like some help with the issue: what could be the potential root cause? I am sure the order of columns between the select and insert statements is correct. The datatypes between the databases match, and the Python datatypes match as well.

I found a similar issue: https://github.com/oracle/python-cx_Oracle/issues/586

In my case, the columns sometimes have values and sometimes are NULL.

anthony-tuininga commented 1 year ago

I suspect your problem is that in the first chunk of data all of the data (in one column) is NULL and so python-oracledb simply assumes that you are inserting data of type DB_TYPE_VARCHAR with a length of 1. In your next chunk of data you have some numbers (or dates or strings longer than 1 byte in length) and then you get an error. The solution is to use cursor.setinputsizes() to tell the driver up front what type of data you intend to insert -- that way it won't make a bad guess based on input data.
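The trigger condition described above is easy to check for: a column that is None in every row of a batch gives the driver nothing to infer a type from. A hypothetical helper (not part of the report; the name is mine) that flags such columns in a fetched chunk:

```python
def all_null_columns(rows):
    """Return the indexes of columns that are None in every row of a batch.

    When executemany() sees such a column without a setinputsizes() hint,
    python-oracledb has no value to infer a type from, so a later batch
    with real values in that column can raise DPY-3013.
    """
    if not rows:
        return []
    n_cols = len(rows[0])
    return [i for i in range(n_cols)
            if all(row[i] is None for row in rows)]

batch = [(1, None, "a"),
         (2, None, None)]
print(all_null_columns(batch))  # only column 1 is None in every row
```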

jaroslawjarek commented 1 year ago

I did that, but the behavior was exactly the same. I did it in the following way:

db_types = (d[1] for d in db1_cursor.description)
db2_cursor.setinputsizes(*db_types)

I used the destination table as the source for this operation, as the destination has a margin in column lengths.

As I said, only changing the cursor after each chunk (4000 records) helped.

anthony-tuininga commented 1 year ago

That way doesn't cover all scenarios. The method shown in a tool I built a while back called CopyData is what you need to do if you want to be certain of success. Switching the cursor after each chunk will also "solve" the problem -- at the cost of having to determine the types and sizes of the data for each chunk.
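One way to read that advice: derive the setinputsizes() arguments from describe-only metadata, passing an explicit maximum size for string columns instead of just the type. A sketch under my own assumptions (the helper name is mine; which type objects count as "string types" is left as a parameter so the logic is checkable without a database):

```python
def setinputsizes_args(description, string_types):
    # description entries are DB API 2.0 7-tuples:
    # (name, type_code, display_size, internal_size, precision, scale, null_ok).
    # For string columns pass the maximum length as an int so the driver
    # allocates enough buffer space up front; for everything else pass the
    # type object itself, so no guess is ever made from the first batch.
    args = []
    for _name, type_code, _disp, internal_size, *_rest in description:
        if type_code in string_types:
            args.append(internal_size or 1)
        else:
            args.append(type_code)
    return args

# Intended usage against a live connection (not runnable here):
# sizes = setinputsizes_args(meta_cursor.description,
#                            {oracledb.DB_TYPE_VARCHAR, oracledb.DB_TYPE_CHAR})
# v_sdh_cursor.setinputsizes(*sizes)
```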

anthony-tuininga commented 1 year ago

Closing - no activity.