jonashaag closed this issue 7 months ago
Btw, in https://github.com/pacman82/arrow-odbc-py/issues/47#issuecomment-1661655693 you (@pacman82) suggested using odbc2parquet to go from ODBC to Parquet without the Arrow intermediary. To me this implies that it's also faster than going through arrow-odbc-py. In my benchmarks with comparable settings, however, odbc2parquet is ~20% slower than going through arrow-odbc-py with `fetch_concurrently()` + `pyarrow.parquet`.
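For anyone who wants to reproduce a comparison like this, here is a minimal timing sketch using only the standard library. The thread does not say how the ~20% was computed, so this assumes it means wall-clock time relative to a baseline. `best_of` and `percent_slower` are hypothetical helper names, not part of arrow-odbc-py or odbc2parquet.

```python
import time
from typing import Callable


def best_of(fn: Callable[[], object], repeats: int = 3) -> float:
    """Return the fastest wall-clock time in seconds over `repeats` runs of fn."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


def percent_slower(candidate_s: float, baseline_s: float) -> float:
    """Extra time the candidate needs, as a percentage of the baseline time."""
    return (candidate_s / baseline_s - 1.0) * 100.0


# Usage (both callables are placeholders you would have to write yourself):
#   baseline = best_of(run_arrow_odbc_export)     # hypothetical
#   candidate = best_of(run_odbc2parquet_export)  # hypothetical
#   print(f"{percent_slower(candidate, baseline):.0f}% slower")
```

Taking the minimum over a few runs reduces noise from caching and other load on the database server. Note that "20% slower" by elapsed time is not the same number as "20% lower throughput", so it is worth stating which one is reported.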
Update: Deleted invalid profiling results.
On another query arrow-odbc is much faster. Interesting.
Faster is always better, but I am closing this issue. Not sure what the definition of done here would be.
In my benchmarks, it seems like Turbodbc with `use_async_io=True` is 20–30% faster than arrow-odbc-py with `fetch_concurrently()`. I haven't done any profiling on this yet.