sfu-db / connector-x

Fastest library to load data from DB to DataFrames in Rust and Python
https://sfu-db.github.io/connector-x
MIT License
2.02k stars 163 forks source link

SQL Server Datetime read_sql leads to PanicException #634

Closed Jeffrey-Amst closed 5 months ago

Jeffrey-Amst commented 6 months ago

What language are you using?

Python

What version are you using?

Python 3.10.14

What database are you using?

Microsoft SQL Server (ver. 15.00.4360)

What dataframe are you using?

Arrow

Can you describe your bug?

I get an "panicexception: out of range Datetime. This happens since connectorx version 0.3.3. Reverting back to version 0.3.2 resolves the issue.

What are the steps to reproduce the behavior?

Getting a table from SQL Server with the following settings:

cx.read_sql( conn='mssql://etcetera', query='select * FROM [dbo].[transactions]', return_type="arrow2", )

Database setup if the error only happens on specific data or data type

Happens on datetime and datetime2 in SQL Server Table schema and example data

Example query / code

See above

What is the error?

pyo3_runtime.PanicException: out of range DateTime

AK2001 commented 6 months ago

I had the same issue. Downgrading to version 0.3.2 fixed it

My error: thread '<unnamed>' panicked at \connector-x\connector-x\connectorx\src\destinations\arrow2\arrow_assoc.rs:312:36: out of range DateTime

blthree commented 6 months ago

We're seeing the same error on Oracle for both pandas and arrow2 return types. The error doesn't occur for every table with a datetime field (for example "SELECT SYSDATE FROM dual" works fine). Can confirm that the errors don't occur after downgrading to 0.3.2

In [4]: cx.read_sql(dwhcosu_conn, "select * from sysadm.psxlatitem")
thread '<unnamed>' panicked at src\pandas\pandas_columns\datetime.rs:84:44:
out of range DateTime
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
---------------------------------------------------------------------------
PanicException                            Traceback (most recent call last)
Cell In[4], line 1
----> 1 cx.read_sql(dwhcosu_conn, "select * from sysadm.psxlatitem")

File ~\scoop\apps\miniconda3\current\envs\dbt_af39_env\lib\site-packages\connectorx\__init__.py:364, in read_sql(conn, query, return_type, protocol, partition_on, partition_range, partition_num, index_col)
    361 if return_type in {"modin", "dask", "pandas"}:
    362     try_import_module("pandas")
--> 364     result = _read_sql(
    365         conn,
    366         "pandas",
    367         queries=queries,
    368         protocol=protocol,
    369         partition_query=partition_query,
    370     )
    371     df = reconstruct_pandas(result)
    373     if index_col is not None:

PanicException: out of range DateTime

In [5]: cx.read_sql(dwhcosu_conn, "select sysdate from dual")
Out[5]:
              SYSDATE
0 2024-05-24 13:22:33
wangxiaoying commented 6 months ago

Seems like it is related to the update for the new datetime api. Can you provide a minimum reproducible example for this? (Mainly the data insertion for the table creation)

wangxiaoying commented 5 months ago

Can you try out the new alpha version 0.3.4a1 to see whether it fixes the issue?

wangxiaoying commented 5 months ago

I will close it for now but feel free to open it if you find any issue with 0.3.4a1.

rudyryk commented 2 months ago

@wangxiaoying Hello,

this is easily reproducuble on MS SQL, e.g.

CREATE TABLE [tempdb].[testing].[test_date_x] (d DATE);

INSERT INTO [tempdb].[testing].[test_date_x] (d) VALUES ('1601-01-01')

And then:

import connectorx as cx

urlschema = "mssql"
username = "SA"
password = "secret"
host = "127.0.0.1"
port = 1433
database = "tempdb"
dsn = f"{urlschema}://{username}:{password}@{host}:{port}/{database}"
query = "SELECT * FROM [tempdb].[testing].[test_date_x]"
table = cx.read_sql(dsn, query)
print(table)

With dates like 1980-01-01, 2024-01-01 works just fine.

rudyryk commented 2 months ago

@wangxiaoying Using alpha releases 0.3.4a1 .. 0.3.4a2 also does not help.