oracle / python-oracledb

Python driver for Oracle Database conforming to the Python DB API 2.0 specification. This is the renamed, new major release of cx_Oracle
https://oracle.github.io/python-oracledb
Other
307 stars 59 forks source link

Not able to to read '01-JAN-4712 00.00.00' date value #345

Open sam03489 opened 2 weeks ago

sam03489 commented 2 weeks ago
  1. What versions are you using?

oracledb.version: 2.2.1

platform.platform: Linux-4.18.0-513.24.1.el8_9.x86_64-x86_64-with-glibc2.35 sys.maxsize > 2**32: True platform.python_version: 3.10.12

Oracle DB version: Oracle Database 19c Enterprise Edition Release 19.0.0.0.0

  1. Is it an error or a hang or a crash?

it is an error.

Unable to fetch date column with value '01-JAN-4712 00.00.00'

Error I get is ValueError: year 60824 is out of range

SELECT EARLIEST_SHIP_DATE FROM apps.OE_ORDER_LINES_ALL where line_id=1096

If I use the below SQL, it works. SELECT TO_DATE(to_char(EARLIEST_SHIP_DATE, 'YYYY-MM-DD HH24:MI:SS'),'YYYY-MM-DD HH24:MI:SS') FROM apps.OE_ORDER_LINES_ALL where line_id=1096

  1. What error(s) or behavior you are seeing?

Unable to fetch date column with value '01-JAN-4712 00.00.00' Error I get is ValueError: year 60824 is out of range

SELECT EARLIEST_SHIP_DATE FROM apps.OE_ORDER_LINES_ALL where line_id=1096

image

  1. Does your application call init_oracle_client()?

No. I am using Thin mode

  1. Include a runnable Python script that shows the problem.

Code which does not work:-

import oracledb conn = oracledb.connect(**conn_dict) query = """ SELECT EARLIEST_SHIP_DATE FROM apps.OE_ORDER_LINES_ALL where line_id=1096 """ cur = conn.cursor() cur.execute(query) rs = cur.fetchone() print(rs) cur.close() conn.close()

image

Code which works:-

import oracledb conn = oracledb.connect(**conn_dict) query = """ SELECT TO_DATE(to_char(EARLIEST_SHIP_DATE, 'YYYY-MM-DD HH24:MI:SS'),'YYYY-MM-DD HH24:MI:SS') EARLIEST_SHIP_DATE FROM apps.OE_ORDER_LINES_ALL where line_id=1096 """ cur = conn.cursor() cur.execute(query) rs = cur.fetchone() print(rs) cur.close() conn.close()

image

anthony-tuininga commented 2 weeks ago

I strongly suspect this is because of corrupted data stored in your database. Try this:

import oracledb

conn = oracledb.connect("user/password@host:port/service_name")
cursor = conn.cursor()
cursor.execute("""
    select
        to_date('4712-01-01', 'YYYY-MM-DD'),
        dump(to_date('4712-01-01', 'YYYY-MM-DD'))
    from dual""")
for row in cursor:
    print(row)

That works just fine for me and I suspect it will also work just fine for you. This is my output:

(datetime.datetime(4712, 1, 1, 0, 0), 'Typ=13 Len=8: 104,18,1,1,0,0,0,0')

Run dump() on the date in your database. I suspect you will see that it is corrupted.

sam03489 commented 2 weeks ago

It is a production data so do not think data is corrupted. Why will to_date(to_char()) work and without it, it will not.

I added timestamp to your code.

image

sam03489 commented 2 weeks ago

It does look like a bug to me.

anthony-tuininga commented 2 weeks ago

The time portion defaults to 0 so adding that doesn't make any difference. :-) You can see that the output is identical.

Run dump() on the date in your database and see if it matches the information shown for the specific date. Note that some clients are able to insert dates without them being validated. The code that is performing the to_char() is not affected by the invalid data. Correcting the data in your database would be the right course of action -- once you have indeed verified that it is corrupt, of course!

anthony-tuininga commented 2 weeks ago

It does look like a bug to me.

Let's see what is actually stored in the database, first.

sam03489 commented 2 weeks ago

Here you go image

sam03489 commented 2 weeks ago

with cx_Oracle package, I am able to make it work using "outputtypehandler" feature but it does not work in Oracledb. It does not even go to DateTimeConverter function. Please note that we are able to pull the data using informatica power center, which uses odbc driver. I do think it is a BUG.

image

import os
import cx_Oracle
import datetime

def DateTimeConverter(value):
    if value == '01-JAN-12':
        return datetime.datetime.strptime('01-01-4712', '%d-%m-%Y')
    else:
        return value

def OutputHandler(cursor, name, defaulttype, length, precision, scale):
    if defaulttype == cx_Oracle.DATETIME:
        return cursor.var(cx_Oracle.STRING, arraysize=cursor.arraysize, outconverter=DateTimeConverter)

conn.outputtypehandler = OutputHandler
cur = conn.cursor()
sql = """
select
        EARLIEST_SHIP_DATE,
        dump(EARLIEST_SHIP_DATE)
    FROM apps.OE_ORDER_LINES_ALL where line_id=1096
"""
cur.execute(sql)
for row in cur:
    print(row)

print(cx_Oracle.version)
cur.close()
conn.close()
anthony-tuininga commented 2 weeks ago

Ok. I realized that the output didn't quite match, so I did this instead. Create a table and populate it with the hard-coded value:

create table issue_345 (
    date_val date
);

insert into issue_345 values (to_date('4712-01-01', 'YYYY-MM-DD'));

commit;

You can insert your date into that table by selecting it from your production table. Then run this Python code:

import oracledb

conn = oracledb.connect(CONNECT_STRING)
cursor = conn.cursor()
cursor.execute(
    """
    select date_val, dump(date_val)
    from issue_345
    """
)
for row in cursor:
    print(row)

This yields the following output:

(datetime.datetime(4712, 1, 1, 0, 0), 'Typ=12 Len=7: 147,112,1,1,1,1,1')

The first value should be 100 greater than the century (147 - 100 = 47) and the second value should be 100 greater than the year inside the century (112 - 100 - 12). In your date the values are 53 - 100 = -47 and 88 - 100 = -12, which is incorrect. This demonstrates that the value stored in the database has been corrupted. It should be updated to the correct value.

sam03489 commented 2 weeks ago

If the value stored in the database is corrupted then how are other clients working? (Informatica extracts data using it's native ODBC driver). The fact that TO_CHAR(TO_DATE()) is working makes me believe that python code which converts date to varchar itself is not working. When I tested the code with cx_Oracle package I could see the value coming across as "01-01-12" and I think oracledb package is not able to understand it and is throwing the error.

Between data is from production EBS R12 system. I do not think data is corrupted. This issue is specific to python client.

Please see the CX_ORACLE code snippet I shared.

sam03489 commented 2 weeks ago

Our organization has license for Oracle Support. If I create a ticket with Oracle support, will it be the right approach? Data definitely is not corrupted as there are millions of record with this value and the applications are working fine.

anthony-tuininga commented 2 weeks ago

The other clients may have worked around this data corruption and python-oracledb can do the same, if that is the case. I will check internally to see if this is expected or a common corruption which has workarounds enabled and get back to you.

Note that python-oracledb is a replacement for cx_Oracle and if you enable thick mode (by calling oracledb.init_oracle_client()) then you will use the same Oracle Client libraries that were used with cx_Oracle and you will get the same behavior as before.

Just to get some better understanding of the situation. Can you create the table I suggested earlier and populate it with the row I gave as well as the row from your table that isn't being fetched correctly? Then run dump() on both rows and also attempt to fetch from the table using this:

select date_val from issue_345
where date_val = to_date('4712-01-01', 'YYYY-MM-DD');

If you see both rows, please let me know! Thanks.

anthony-tuininga commented 2 weeks ago

Ok, I tried inserting the date as follows:

insert into issue_345 values (to_date('-4712-01-01', 'SYYYY-MM-DD'))

and then I get the error you mentioned. So I will have that corrected!

anthony-tuininga commented 2 weeks ago

I have pushed a patch that corrects this issue. If you are able to build from source you can verify that it works for you, too.

NOTE: you will now get back the correct error: year -4712 is out of range.

anthony-tuininga commented 2 weeks ago

with cx_Oracle package, I am able to make it work using "outputtypehandler" feature but it does not work in Oracledb. It does not even go to DateTimeConverter function.

What do you mean by that? Output type handlers are definitely supported in python-oracledb! Please supply code that demonstrates the issue using the table I supplied above. And please do not supply screenshots!

sam03489 commented 2 weeks ago

with cx_Oracle package, I am able to make it work using "outputtypehandler" feature but it does not work in Oracledb. It does not even go to DateTimeConverter function.

What do you mean by that? Output type handlers are definitely supported in python-oracledb! Please supply code that demonstrates the issue using the table I supplied above. And please do not supply screenshots!

image

I trusts that it works for you, I do not have write access to this Oracle DB to create the table. As this is PROD EBS R12 data, I do not think data is corrupted. There are following reasons:-

  1. DBAs gets alerted if data corruption happens.
  2. This is not the only record. This table is one of the most important tables in EBS and has millions of records with the same date.
  3. If other clients are working and TO_DATE(TO_CHAR()) is working then it suggests the python library needs some work or at least in bare minimum needs to work like CX_ORACLE.

I am not sure if NLS parameters are at play here, which might be causing this.

This most likely is a BUG.

sam03489 commented 2 weeks ago

It does not even go inside DateTimeConverter, I tried with oracledb.STRING too.

cjbj commented 2 weeks ago

@sam03489 Did you test the new patch from today?

I would have quickly tested the handler case, but because you posted a screenshot I couldn't cut and paste :(.

(This is a reminder to all our community not to post screen shots - they make life hard. And are impossible for vision impaired people to see)

anthony-tuininga commented 2 weeks ago

What you may be running into is the fact that thick mode performs the conversion from DATE to STRING internally in the Oracle Client libraries whereas thin mode always converts to a Python date (which will fail because of the inability to represent the date in Python) before running the conversion. We could create an OracleDate Python type which could represent the Oracle DATE fully and would have methods and members to transform to a regular Python date or to a string, etc. If that is of interest you can log an enhancement request and we can consider it further. The other option is, since you can't represent that date in Python anyway, convert it to whatever value you want on the server before fetching it -- possibly by using a view.

sam03489 commented 1 week ago

@sam03489 Did you test the new patch from today?

I would have quickly tested the handler case, but because you posted a screenshot I couldn't cut and paste :(.

(This is a reminder to all our community not to post screen shots - they make life hard. And are impossible for vision impaired people to see)

Here you go:-

import datetime import oracledb def DateTimeConverter(value): print("in DateTimeConverter") return datetime.datetime.strptime('01-01-4712 00:00:00', '%d-%m-%Y %H:%M:%S')

def OutputHandler(cursor, metadata): if metadata.type_code is oracledb.DB_TYPE_VARCHAR: return cursor.var(metadata.type_code, arraysize=cursor.arraysize, encoding_errors="replace") if metadata.type_code is oracledb.DB_TYPE_DATE: print("in output_type_handler") return cursor.var(oracledb.DB_TYPE_VARCHAR, arraysize=cursor.arraysize, outconverter=DateTimeConverter)

conn = oracledb.connect(**conn_dict) conn.outputtypehandler = OutputHandler cur = conn.cursor() sql = """select EARLIEST_SHIP_DATE, dump(EARLIEST_SHIP_DATE) FROM apps.OE_ORDER_LINES_ALL where line_id=1096""" cur.execute(sql)

for row in cur: print(row) cur.close() conn.close()

Similar code was posted few times in this story hence I chose to paste the screen print to show that it is a BUG in the python library.

sam03489 commented 1 week ago

What you may be running into is the fact that thick mode performs the conversion from DATE to STRING internally in the Oracle Client libraries whereas thin mode always converts to a Python date (which will fail because of the inability to represent the date in Python) before running the conversion. We could create an OracleDate Python type which could represent the Oracle DATE fully and would have methods and members to transform to a regular Python date or to a string, etc. If that is of interest you can log an enhancement request and we can consider it further. The other option is, since you can't represent that date in Python anyway, convert it to whatever value you want on the server before fetching it -- possibly by using a view.

I have built a meta-data driven Batch Processing Framework, which captures schema on read and handles schema drift if schema changes. Hence, it will be of a big inconvenience to my developers to convert from"SELECT *" to SELECT TO_DATE(TO_CHAR()). Regarding your suggestion to use a view, I will need to check with EBS team if they can create it. The service account I use has Read Only Permission. As this is happening with PROD data, I will have to make a compelling case for the view creation.

I like your approach of: "We could create an OracleDate Python type which could represent the Oracle DATE fully".

in my opinion, it should be tackled as a BUG fix rather than an enhancement. Either ways, I do not know how to proceed from here. If you can let me know the fastest way to solve this, it will be of a big help.

anthony-tuininga commented 1 week ago

There are a a few fast options available to you:

The use of date -4712-01-01 is unusual, to say the least. From the way you are using it, it appears to be used as some sort of sentinel value, not as a legitimate application value. Since Python doesn't handle this value you are going to have to find some way of eliminating it from your queries or transforming it before querying it. The options above are all reasonable solutions to this and do not require a great deal of effort. Adding OracleDate will take a considerable amount of effort so is unlikely to be usable by you for some time.

sam03489 commented 1 week ago

Data is not stored as -4712-01-01. Below is the output from "Oracle SQL Developer" client tool (Owned by Oracle and uses JAVA). -4712-01-01 is what "oracledb" package is translating it to.

It is disappointing that I am not able to convince that it is a BUG and should be fixed.

image

Yes, I know my options and the best option I think is to fix the library. Till then we will use TO_DATE(TO_CHAR()).

anthony-tuininga commented 1 week ago

According to the dump() command it IS stored as -4712-01-01! You can prove this to yourself by running this SQL in both SQL*Plus and in "Oracle SQL Developer":

select earliest_ship_date, to_char(earliest_ship_date, 'SYYYY-MM-DD'), dump(earliest_ship_date)
from apps.oe_order_line_all where line_id = 1096;

If you remove the 'S' format specifier from the to_char() command you will just see 4712 -- but that does NOT mean that the negative year isn't present! The default date format does not include the 'S' format specifier as it is very uncommon to use negative years and some prefer to use the "BC" date format specifier instead.

sam03489 commented 1 week ago

Thanks for your help. Checking with the source team to evaluate our options.