snowflakedb / snowpark-python

Snowflake Snowpark Python API
Apache License 2.0
267 stars 110 forks

SNOW-753219: Add support for "to_arrow" and "from_arrow" methods #704

Open tmespe opened 1 year ago

tmespe commented 1 year ago

What is the current behavior?

Currently the only way to "export" a Snowpark dataframe is to convert it to a pandas dataframe. While this is great, newer alternatives are starting to gain traction, so a way to export to libraries other than pandas would be good.

You can already convert to a pandas dataframe and then into other types, but a more direct route would be better.

snowflake-connector, which I believe Snowpark already uses, already supports fetch_arrow_all(), and I've confirmed that its output works with polars.from_arrow().
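As a rough illustration of the workaround described above, here is a minimal sketch. The helper names `fetch_arrow_table` and `fetch_polars` are hypothetical and not part of any library; the only library calls assumed are the connector cursor's `execute()` and `fetch_arrow_all()`, and `polars.from_arrow()`. Depending on the connector version, `fetch_arrow_all()` may return `None` for an empty result set, so the sketch checks for that.

```python
def fetch_arrow_table(cursor, sql):
    """Run `sql` on a snowflake-connector cursor and return the Arrow result.

    `cursor` is assumed to be a cursor obtained from
    snowflake.connector.connect(...).cursor().
    """
    cursor.execute(sql)
    # fetch_arrow_all() is the connector method mentioned in this issue.
    return cursor.fetch_arrow_all()


def fetch_polars(cursor, sql):
    """Hypothetical helper: go from a query straight to polars, skipping pandas."""
    import polars as pl  # imported lazily so fetch_arrow_table works without polars

    table = fetch_arrow_table(cursor, sql)
    if table is None:
        # Some connector versions return None when the query yields no rows.
        return pl.DataFrame()
    return pl.from_arrow(table)
```

With a real connection this would be called as `fetch_polars(conn.cursor(), "SELECT ...")`. A `to_arrow` method on the Snowpark dataframe would make the separate cursor unnecessary.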

What is the desired behavior?

Expose an Arrow representation of the dataframe so that it can be picked up by, for example, polars.from_arrow().

Conversely, it would be great to support the reverse operation, creating a Snowpark dataframe from Arrow data such as the output of a polars dataframe's to_arrow().
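For context, the reverse direction currently has to detour through pandas as well. The sketch below is only an illustration of that detour: `arrow_to_snowpark` is a hypothetical helper name, and it assumes `session` is a Snowpark `Session` (whose `create_dataframe()` accepts a pandas dataframe) and `table` is a `pyarrow.Table`.

```python
def arrow_to_snowpark(session, table):
    """Hypothetical helper: upload Arrow data via the existing pandas path.

    A native `from_arrow` would remove the intermediate pandas copy.
    """
    # pyarrow.Table.to_pandas() materialises the data as a pandas dataframe.
    pandas_df = table.to_pandas()
    # Session.create_dataframe() accepts a pandas dataframe as input.
    return session.create_dataframe(pandas_df)
```

Starting from polars, the call would look like `arrow_to_snowpark(session, polars_df.to_arrow())`.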

How would this improve snowflake-snowpark-python?

This would make Snowpark more flexible, as Arrow is becoming widely supported in the data space. It would also provide basic support for other libraries without having to add direct support for each of them.

References, Other Background

mjclarke94 commented 1 year ago

This one is pretty important for us. Loading data into pandas just to turn it into polars, and back, is a pretty long-winded way of saying "grab this data".

Plus, with pandas 2.0 supporting Arrow data types, there are potentially some easy wins here with regard to weird conversion issues to/from datetime types.