Closed: jayctran closed this issue 1 month ago
Hello @jayctran ,
thanks for providing the output. This is enough information to figure out what is happening. arrow-odbc uses the precision reported by the driver to decide whether the timestamp should be treated as nano-, micro-, or millisecond precision, or as having no fractional seconds at all.
You can find the logic here: https://github.com/pacman82/arrow-odbc/blob/2b4e2a33a8c7615feb75045e6831d549b18fbce2/src/schema.rs#L86
I do not know what pyodbc does, but I assume it just hard codes microseconds precision.
As you can see from the printed debug output, the Oracle ODBC driver reports a precision of 0. Given the relational type you specified, it should not be 0. I would argue this is a bug in the Oracle driver. If you would like to report it upstream, the fault is likely within the driver's implementation of SQLDescribeCol.
Luckily for you, arrow-odbc offers a workaround. You can either manually specify an Arrow schema (using the aptly named schema parameter), or, if you need this to work with an unknown schema, supply a callable via the map_schema parameter. In that callable you can map every field type to itself, except timestamps of seconds precision, which you map to microseconds. You can even get better precision than pyodbc by mapping to nanoseconds, though likely only the first seven digits will differ from zero. But that is a tale for another day ...
Best, Markus
Here is a link to a test demonstrating one possible use of map_schema for a different workaround:
Hi Markus,
Thanks for the detailed response! That is brilliant and it's exactly what I was looking for!
I'm using polars so will test to see if the options passthrough will work with map_schema.
Thanks again! James
Thanks for all your efforts on maintaining this library.
I've pulled out the relevant portions of the logs:
First record arrow-odbc vs pyodbc:
pyodbc did truncate to 6 digits but microseconds is acceptable for our use case.