pydiverse / pydiverse.pipedag

A data pipeline orchestration library for rapid iterative development with automatic cache invalidation, allowing users to focus on writing their tasks in pandas, polars, sqlalchemy, ibis, and the like.
https://pydiversepipedag.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Implementation of snowflake backend. #186

Closed windiana42 closed 5 months ago

windiana42 commented 5 months ago

Adding Snowflake support doesn't seem to be that hard for pipedag. However, for small, fast tests, Snowflake seems to be awfully slow: [image]

windiana42 commented 5 months ago

This PR also fixes compatibility issues with pandas < 2 and sqlalchemy < 2. The pyarrow handling was especially tricky. I opted to prefer StringDtype("pyarrow") over ArrowDtype(pa.string()). Time will tell how good this choice was. Revising it later would be no problem, but the choice must be applied consistently in a few places.