Open Dekermanjian opened 5 months ago
I am still having a hard time with this issue, and others seem to be running into it as well. There is an open issue on delta-rs: https://github.com/delta-io/delta-rs/issues/2593. Would this be resolved once the delta-rs issue is fixed?
When I scan a Delta Lake table, I pass an option like this:
import polars as pl
df = pl.scan_delta(
    file_path,
    pyarrow_options={"parquet_read_options": {"coerce_int96_timestamp_unit": "ms"}}
)
The pyarrow_options parameter is passed through to delta-rs's to_pyarrow_dataset, whose documentation shows how to coerce the timestamp unit. I hope this helps.
Checks
Reproducible example
I believe this occurs when a timestamp column is saved with nanosecond (ns) precision. PyArrow tries to convert it to microseconds (us) and throws an exception. I also believe there are arguments you can pass to PyArrow to coerce the timestamp; see https://github.com/apache/arrow/issues/1920
Log output
Issue description
Is there currently a way to get around this issue?
Expected behavior
To be able to pass some argument to pyarrow to coerce the timestamp field.
Installed versions