Open romiof opened 1 year ago
It seems it's forcing the precision to `ns`; most databases only support `ms` or `us` precision though, so this is a bit overkill (and results in your overflow/wraparound) 😅

I was using `0.3.1` and didn't see this behavior, so it is something new with `0.3.2`. It's an issue with both `arrow2` and `polars` output.
This seems similar to the issue Pandas has using nanosecond precision everywhere, which causes us lots of issues with far future dates in the database. Particularly the end dates in type II dimensions.
Note that the new version does fix another bug, the one I originally came here to post about: the Oracle `DATE` datatype is actually a timestamp, but connectorx was converting it to `date32`. `0.3.1` truncated `DATE` columns with time components and returned them in `date32[day]` format, which is also wrong. `0.3.2` returns timestamps for Oracle `DATE` columns, which is right, but does so at the overly granular nanosecond precision, which now garbles the dates.
```python
>>> query = """
... select
...     date '9999-12-31' as extreme_date,
...     date '2023-01-01' as regular_date,
...     cast(timestamp '2023-01-01 12:34:56' as date) as date_w_time
... from dual
... """
>>>
>>> connectorx.__version__
'0.3.1'
>>> df = connectorx.read_sql(
...     conn=url,
...     query=query,
...     return_type="arrow2",
... )
>>>
>>> df
pyarrow.Table
EXTREME_DATE: date32[day]
REGULAR_DATE: date32[day]
DATE_W_TIME: date32[day]
----
EXTREME_DATE: [[9999-12-31]]
REGULAR_DATE: [[2023-01-01]]
DATE_W_TIME: [[2023-01-01]]
```
```python
>>> query = """
... select
...     date '9999-12-31' as extreme_date,
...     date '2023-01-01' as regular_date,
...     cast(timestamp '2023-01-01 12:34:56' as date) as date_w_time
... from dual
... """
>>>
>>> connectorx.__version__
'0.3.2'
>>> df = connectorx.read_sql(
...     conn=url,
...     query=query,
...     return_type='arrow2',
... )
>>>
>>> df
pyarrow.Table
EXTREME_DATE: timestamp[ns]
REGULAR_DATE: timestamp[ns]
DATE_W_TIME: timestamp[ns]
----
EXTREME_DATE: [[1816-03-29 05:56:08.066277376]]
REGULAR_DATE: [[2023-01-01 00:00:00.000000000]]
DATE_W_TIME: [[2023-01-01 12:34:56.000000000]]
```
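The `1816-03-29` value above is exactly what signed 64-bit nanosecond wraparound would produce. A stdlib-only sketch of that arithmetic (this assumes timestamps are stored as signed 64-bit nanoseconds since the Unix epoch, which is how Arrow's `timestamp[ns]` works):

```python
import datetime

EPOCH = datetime.datetime(1970, 1, 1)

# Nanoseconds since the epoch for 9999-12-31 00:00:00.
ns = int((datetime.datetime(9999, 12, 31) - EPOCH).total_seconds()) * 10**9

# 2**63 - 1 ns after the epoch is only ~2262-04-11, so this value overflows.
print(ns > 2**63 - 1)  # True

# Simulate two's-complement wraparound into a signed 64-bit integer.
wrapped = (ns + 2**63) % 2**64 - 2**63
garbled = EPOCH + datetime.timedelta(microseconds=wrapped // 1000)
print(garbled)  # 1816-03-29 05:56:08.066277 -- matching EXTREME_DATE above
```

So the "overflow/wraparound" is literal: the far-future date wraps past the int64 range and comes back out as an early-1800s timestamp.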
Looking into this a little further, it appears this was already an issue in earlier versions with timestamp fields: nanosecond precision fails for very large or very small dates. It was just masked for Oracle in versions prior to `0.3.2` due to the other bug, which converted Oracle `DATE`s (which, again, are really timestamps) into `date32[day]` format.
@klaerik this bug appears with any `date` or `datetime` >= 2262-04-12.

I'm using Polars, and I escaped it by using cx `return_type="arrow"` and then using `polars.from_arrow`.
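For reference, that workaround looks roughly like this (a sketch only; the connection string and query are placeholders, and it needs a live database to actually run):

```python
import connectorx as cx
import polars as pl

# Hypothetical connection string and query -- substitute your own.
url = "oracle://user:password@host:1521/service"
query = "select date '9999-12-31' as extreme_date from dual"

# Per the comment above, return_type="arrow" avoids the nanosecond
# wraparound that "arrow2"/"polars"/"polars2" exhibit.
table = cx.read_sql(conn=url, query=query, return_type="arrow")

# Convert the resulting pyarrow Table to a polars DataFrame.
df = pl.from_arrow(table)
```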
**What language are you using?**

Python

**What version are you using?**

connectorx 0.3.2a5

**What database are you using?**

Tested with Oracle and SQL Server.

**What dataframe are you using?**

polars 0.17.12
**Can you describe your bug?**

Selecting date columns with dates far in the future produces inconsistent results. For instance, any date >= 2262-04-12 reproduces this bug. I hit this through Polars when loading data from a database with columns containing dates like these.

**What are the steps to reproduce the behavior?**

The dates in the second and third columns blow up. This happens when `return_type` is `None`, `"arrow2"`, `"polars"`, or `"polars2"`, but with `return_type = "arrow"` it works correctly. Testing in SQL Server with `query = "SELECT CAST('2262-04-12' AS DATETIME2)"` causes the same bug.

**What is the error?**

A very weird date is returned instead of the original one. Maybe some kind of memory overflow?
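The 2262-04-12 cutoff is consistent with memory overflow, specifically signed 64-bit nanoseconds since the Unix epoch: the largest representable instant is 2262-04-11 23:47:16.854775807, so midnight on 2262-04-12 is the first whole date that no longer fits. A stdlib-only check, assuming that representation:

```python
import datetime

EPOCH = datetime.datetime(1970, 1, 1)

# Largest instant representable as signed 64-bit nanoseconds since the epoch
# (truncated to microseconds, the finest resolution datetime supports).
max_ns_instant = EPOCH + datetime.timedelta(microseconds=(2**63 - 1) // 1000)
print(max_ns_instant)  # 2262-04-11 23:47:16.854775

# Midnight 2262-04-12 is just past that limit, hence the garbled results.
ns = int((datetime.datetime(2262, 4, 12) - EPOCH).total_seconds()) * 10**9
print(ns > 2**63 - 1)  # True
```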