oracle / python-oracledb

Python driver for Oracle Database conforming to the Python DB API 2.0 specification. This is the renamed, new major release of cx_Oracle
https://oracle.github.io/python-oracledb
Other
308 stars 61 forks source link

Unable To SELECT Dates In The Year 0001 #215

Closed WayneLambert closed 10 months ago

WayneLambert commented 10 months ago

1. What versions are you using?

The database version is Oracle Database 19c Enterprise Edition Release 19.0.0.0.0

>>> import sys
>>> import platform
>>> print("platform.platform:", platform.platform())
platform.platform: Linux-5.15.0-78-generic-x86_64-with-glibc2.2.5
>>> print("sys.maxsize > 2**32:", sys.maxsize > 2**32)
sys.maxsize > 2**32: True
>>> print("platform.python_version:", platform.python_version())
platform.python_version: 3.8.10
>>> import oracledb
>>> print("oracledb.__version__:", oracledb.__version__)
oracledb.__version__: 1.3.2

I have checked the change log for python-dboracle package for the release on PyPI on 16-Aug-23 and I do not see anything within it to suggest that the new version resolves the issue.

2. Is it an error or a hang or a crash?

From what I can ascertain, an error parsing dates in the year 0001 (i.e. the first year of the Gregorian calendar). It might be worth checking on dates in the first 10 years of the Gregorian calendar though.

3. What error(s) or behavior you are seeing?

If I change the SELECT * to just select non-date fields (or at least date fields that don't have a problematic date), it will return a cursor object as expected.

>>> sql = "SELECT MY_PRIMARY_KEY_ID, MY_WORKING_TEXT_FIELD FROM FROM MY_TABLE WHERE MY_PRIMARY_KEY_ID = '1234567890'"
>>> with conn.cursor() as cursor:
...     cursor.execute(sql)
... 
<oracledb.Cursor on <oracledb.Connection to my_username@(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=my_host)(PORT=my_port))(CONNECT_DATA=(SERVICE_NAME=my_service)))>>

If I select date fields in the SELECT clause where the offending date is in the year 0001, it fails with the below traceback.

>>> sql = "SELECT MY_PRIMARY_KEY_ID, MY_WORKING_TEXT_FIELD, MY_PROBLEM_DATE_FIELD FROM MY_TABLE WHERE MY_PRIMARY_KEY_ID = '1234567890'"
>>> with conn.cursor() as cursor:
...     cursor.execute(sql)
Traceback (most recent call last):
  File "<console>", line 2, in <module>
  File "/usr/local/lib/python3.8/site-packages/oracledb/cursor.py", line 378, in execute
    impl.execute(self)
  File "src/oracledb/impl/thin/cursor.pyx", line 138, in oracledb.thin_impl.ThinCursorImpl.execute
  File "src/oracledb/impl/thin/protocol.pyx", line 385, in oracledb.thin_impl.Protocol._process_single_message
  File "src/oracledb/impl/thin/protocol.pyx", line 386, in oracledb.thin_impl.Protocol._process_single_message
  File "src/oracledb/impl/thin/protocol.pyx", line 346, in oracledb.thin_impl.Protocol._process_message
  File "src/oracledb/impl/thin/protocol.pyx", line 325, in oracledb.thin_impl.Protocol._process_message
  File "src/oracledb/impl/thin/messages.pyx", line 291, in oracledb.thin_impl.Message.process
  File "src/oracledb/impl/thin/messages.pyx", line 817, in oracledb.thin_impl.MessageWithData._process_message
  File "src/oracledb/impl/thin/messages.pyx", line 897, in oracledb.thin_impl.MessageWithData._process_row_data
  File "src/oracledb/impl/thin/messages.pyx", line 557, in oracledb.thin_impl.MessageWithData._process_column_data
  File "src/oracledb/impl/thin/buffer.pyx", line 587, in oracledb.thin_impl.Buffer.read_date
  File "src/oracledb/impl/thin/buffer.pyx", line 359, in oracledb.thin_impl.Buffer.parse_date
  File ".eggs/Cython-0.29.35-py3.11-linux-x86_64.egg/Cython/Includes/cpython/datetime.pxd", line 120, in cpython.datetime.datetime_new
ValueError: year 0 is out of range

4. Does your application call init_oracle_client()?

No. I am using Thin mode

5. Include a runnable Python script that shows the problem.

I have shown this behaviour within an interpreter session above. You will be able to simulate this by inserting those dates into some date field and trying to SELECT from it.

anthony-tuininga commented 10 months ago

Can you supply a simple create table and insert statement that creates the problematic dates?

WayneLambert commented 10 months ago

OK, bear in mind that I haven't tested these on an Oracle database. Without jumping through hoops, I only have read-only access to Oracle databases. The SQL should be something like...

CREATE TABLE problematic_dates_issue (
    id NUMBER,
    good_date DATE,
    bad_date DATE
);
INSERT INTO problematic_dates_issue (id, good_date, bad_date)
VALUES
    (1, TO_DATE('2023-08-24 10:30:00', 'YYYY-MM-DD HH24:MI:SS'), TO_DATE('0001-01-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS')),
    (2, TO_DATE('2023-08-23 00:00:00', 'YYYY-MM-DD HH24:MI:SS'), TO_DATE('0001-12-08 00:00:00', 'YYYY-MM-DD HH24:MI:SS'));

The two dates I have provided as bad dates I have specifically encountered issues with. I would expand upon that though to cover off more years near the start of the Gregorian calendar.

From then on, create a connection using oracledb and then try to execute a SELECT statement against those rows.

Normal Functioning Example

Assuming you have a connection object established, my_conn_obj:

sql = "SELECT number, good_date FROM problematic_dates_issue"
with my_conn_obj.cursor() as cursor:
    cursor.execute(sql)

should return with a cursor object which you can fetch from, etc.

Erroring Example

sql = "SELECT number, bad_date FROM problematic_dates_issue"
with my_conn_obj.cursor() as cursor:
    cursor.execute(sql)

should return the reported traceback.

anthony-tuininga commented 10 months ago

I tried your example and didn't run into any difficulty. The dates you created, however, have year 1, not year 0 (which is forbidden in Python as MINYEAR is set to 1). I ran this query:

select bad_date, dump(bad_date) from problematic_dates_issue;

and I got the following output:

01-JAN-01
Typ=12 Len=7: 100,101,1,1,1,1,1

08-DEC-01
Typ=12 Len=7: 100,101,12,8,1,1,1

Can you run the same query on your data? That will help determine if you managed to get corrupted data in your database. If you try for year zero in the database you get this:

ORA-01841: (full) year must be between -4713 and +9999, and not be 0
WayneLambert commented 10 months ago

Obviously I am protecting the actual data here (using aliases) with generic field and table names but using your SQL which you asked me to run, SELECT MY_DATE, DUMP(MY_DATE) FROM MY_TABLE WHERE MY_DATE <= TO_DATE('0001-12-31', 'YYYY-MM-DD');, I get the following in dBeaver:

image

I note in your output, you have got 101 at the second value in that comma separated list. In mine, it is 100. What does this mean? Does this indicate "corrupted data" as in my naivete, the dates appear to be valid dates as they both fall within the first ano domini year in the Gregorian calendar. They, at least on the surface, appear to be both valid Oracle and Python dates. Judging by the 100 you're getting and the 101 I'm getting, then perhaps not?

As previously described, when I do a SELECT statement against either of those two in python-dboracle using the REPL for just one row at a time and for just the one field, it gives me that ValueError: year 0 is out of range message with the traceback.

anthony-tuininga commented 10 months ago

Based on the fact that you have value 100 (year 0) but the Oracle Database doesn't technically allow such dates (see ORA-01841 noted above) I do not think this is a bug in the code but bad data in the database. You will need to correct this bad data (or manipulate it if you are unable to correct it).

WayneLambert commented 10 months ago

@anthony-tuininga Thanks for taking the time to reply. Yes, we are in the process of correcting this data following research from your previous responses. I have learned a lot about Oracle's internal representation of dates over the last few days.

Yes, I agree that this is not a bug from either oracledb or from within my own project code, so you can close the issue.

I have a development Oracle database and I would love to be able to insert a case of those malformed dates into the database. Of course, the database is doing its job and not allowing me to enter the bad dates. I don't suppose you happen to know a way how to purposefully INSERT, UPDATE or load a malformed data case into the DB? I'd like to be able to bypass the usual DB data integrity checks and enter it in there any way. I have looked around for a solution but it doesn't seem to be a simple thing to achieve.

It also makes me wonder how it got in there in the first place.

anthony-tuininga commented 10 months ago

Thanks for the followup. I will close this now. As for getting the data there in the first place: if you have a C program that uses the C APIs you should be able to put anything you like in there -- or at least you could a couple of decades ago as I figured that out when there was a bug in cx_Oracle at the time. :-) You might be able to hack python-oracledb to deliberately zero out the year?

WayneLambert commented 10 months ago

@anthony-tuininga Just to let you (and any future people with the same issue) know that I managed to figure out how to get the malformed dates in the database. In my case, I wanted a malformed '0001-01-01' as a sample case for development to simulate a real-world case we have.

Step 1: Create the function in your database where you wish to replicate the issue. It directly creates the binary DATE value.

CREATE FUNCTION createDate(
  year   int,
  month  int,
  day    int,
  hour   int,
  minute int,
  second int
) RETURN DATE DETERMINISTIC
IS
  hex CHAR(14);
  d DATE;
BEGIN
 hex := TO_CHAR( FLOOR( year / 100 ) + 100, 'fm0X' )
      || TO_CHAR( MOD( year, 100 ) + 100, 'fm0X' )
      || TO_CHAR( month, 'fm0X' )
      || TO_CHAR( day, 'fm0X' )
      || TO_CHAR( hour + 1, 'fm0X' )
      || TO_CHAR( minute + 1, 'fm0X' )
      || TO_CHAR( second + 1, 'fm0X' );
  DBMS_OUTPUT.PUT_LINE( hex );
  DBMS_STATS.CONVERT_RAW_VALUE( HEXTORAW( hex ), d );
  RETURN d;
END;
/

Step 2: Then INSERT the values into a new row (like below). The first value in the createDate function is the erroneous (non-existent) zeroth year that we're after. The function does not have a century component like the DUMP output.

INSERT INTO MYSCHEMA.MYTABLE (MY_ID, MY_DATE) VALUES ('123456789A', createDate(0,1,1,0,0,0));

Step 3: Check what has been INSERTed:

SELECT MY_ID, MY_DATE, DUMP(MY_DATE) FROM MYSCHEMA.MYTABLE
WHERE MY_ID = '123456789A';
image

Other Notes

I also checked using the function to UPDATE an existing record and it also works as you would expect.

I guess the same principle of using a similarly structured function can be extended to other use cases where you would want to insert malformed data purposefully into the database bypassing ordinary integrity checks.

Sources

Thanks to this Stack Overflow post: https://stackoverflow.com/questions/71455703/oracle-19c-zero-hour-default-date-format-with-time-without-using-any-formating-f

anthony-tuininga commented 10 months ago

Thanks, @WayneLambert! That was helpful!