pacman82 / arrow-odbc-py

Read Apache Arrow batches from ODBC data sources in Python
MIT License

Oracle: Timestamp not extracted correctly #107

Closed jayctran closed 1 month ago

jayctran commented 1 month ago

Thanks for all your efforts on maintaining this library.

I've pulled out the relevant portions of the logs:

[screenshot: relevant log output]

[screenshot: relevant log output]

First record, arrow-odbc vs. pyodbc:

[screenshot: first-record comparison]

[screenshot: first-record comparison]

pyodbc did truncate to 6 digits, but microsecond precision is acceptable for our use case.

pacman82 commented 1 month ago

Hello @jayctran ,

thanks for providing the output. This is enough information to figure out what is happening. arrow-odbc uses the precision reported by the driver to decide whether the timestamp should be treated as having nanosecond, microsecond, or millisecond precision, or no fractional seconds at all.

You can find the logic here: https://github.com/pacman82/arrow-odbc/blob/2b4e2a33a8c7615feb75045e6831d549b18fbce2/src/schema.rs#L86

I do not know what pyodbc does, but I assume it just hard-codes microsecond precision.

As you can see from the printed debug output, the Oracle ODBC driver reports a precision of 0. Given the relational type you specified, it should not be 0. I would argue this is a bug in the Oracle driver. In case you would like to report this bug upstream, the fault is likely within the driver's implementation of SQLDescribeCol.

Luckily for you, arrow-odbc offers a workaround. You can either manually specify an Arrow schema (using the aptly named schema parameter), or, in case you need this to work with an unknown schema, supply a callable via the map_schema parameter. In this callable you could map every field type to itself, except for timestamps of seconds precision, which you map to microseconds. You could even get better precision than pyodbc by mapping to nanoseconds. Likely only the first seven digits will differ from zero though, but that is a tale for another day ...

Best, Markus

pacman82 commented 1 month ago

Here is a link to a test demonstrating one possible use of map_schema for a different workaround:

https://github.com/pacman82/arrow-odbc-py/blob/372ba4bbfb3a619d4f0e199484e5387d9ed3e06c/tests/test_arrow_odbc.py#L651

jayctran commented 1 month ago

Hi Markus,

Thanks for the detailed response! That is brilliant and it's exactly what I was looking for!

I'm using polars, so I will test whether the options pass through and work with map_schema.

Thanks again! James