snowflakedb / snowpark-python

Snowflake Snowpark Python API
Apache License 2.0

SNOW-944048: Support Polars with a to_polars method #1092

Open FlorianWilhelm opened 11 months ago

FlorianWilhelm commented 11 months ago

What is the current behavior?

Currently, working with data is either really fast inside Snowflake with the help of Snowpark DataFrames, or very slow and limited to a single core once the data is converted to pandas with to_pandas(). This can be especially painful for UDFs.

What is the desired behavior?

Also support Polars DataFrames. Polars is blazingly fast, multi-threaded, and makes use of all cores on a node. It is also gaining a lot of traction, with 20.9k GitHub stars compared to 40k for pandas. It is already used in real-world projects, and we see dev teams migrating from pandas to Polars. In the end, it would be nice to have a to_polars() DataFrame method.

How would this improve snowflake-snowpark-python?

It would make UDFs much faster for custom code in cases where one would normally have to resort to to_pandas().

References, Other Background

culpgrant commented 11 months ago

Just FYI this is the same as: https://github.com/snowflakedb/snowpark-python/issues/704

FlorianWilhelm commented 11 months ago

Thanks @culpgrant, but it is the same only in the sense that to_arrow and from_arrow could be used to easily implement to_polars and from_polars. It would still be quite convenient to have them.
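
To illustrate how thin such a wrapper could be, here is a minimal sketch. It assumes the DataFrame exposes a `to_arrow()` method returning a `pyarrow.Table`, which is hypothetical here (it is what #704 asks for). The helper name `to_polars` and the `from_arrow` parameter are also only illustrative and not part of the Snowpark API. The one real library call is `polars.from_arrow`, which is imported lazily so that Polars stays an optional dependency.

```python
from typing import Any, Callable, Optional


def to_polars(df: Any, from_arrow: Optional[Callable[[Any], Any]] = None) -> Any:
    """Convert a DataFrame-like object to a Polars DataFrame via Arrow.

    Assumption: `df` has a `to_arrow()` method returning a pyarrow.Table.
    That method is hypothetical; it is the feature requested in #704.
    """
    if not hasattr(df, "to_arrow"):
        raise TypeError("df must provide a to_arrow() method (hypothetical, see #704)")
    if from_arrow is None:
        # Import lazily so that Polars is only required when actually used.
        import polars as pl

        from_arrow = pl.from_arrow
    # Hand the Arrow table from the source DataFrame to the converter.
    return from_arrow(df.to_arrow())
```

With such a helper, `to_polars(snowpark_df)` would be a one-liner for users. The optional `from_arrow` argument is only there so the conversion step can be swapped out or tested without Polars installed.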

culpgrant commented 11 months ago

True, sorry, that is my bad :)

tmespe commented 11 months ago

They are indeed overlapping but not exactly the same. My main goal when I created the issue was to be able to use Polars with Snowpark more easily. I hope to see something soon, since Polars is amazing.