Open FlorianWilhelm opened 11 months ago
Just FYI this is the same as: https://github.com/snowflakedb/snowpark-python/issues/704
Thanks @culpgrant, but only in the way that `to_arrow` and `from_arrow` could be used to easily implement `to_polars` and `from_polars`. It would still be quite convenient to have them.
True, sorry, that is my bad :)
They are indeed overlapping but not exactly the same. My main goal was to be able to more easily use polars with snowpark when I created the issue. Hope to see something soon, since polars is amazing.
What is the current behavior?
Currently, working with data is either really fast in Snowflake with the help of Snowpark DataFrames, or super slow and limited to a single core once the data is converted to Pandas with `to_pandas()`. This can be especially painful for UDFs.
What is the desired behavior?
Also support Polars dataframes. Polars is blazingly fast, multi-threaded, and makes use of all cores on a node. It's also gaining a lot of traction: 20.9k GitHub stars compared to 40k for Pandas. It's already used in real-world projects, and we see dev teams migrating from Pandas to Polars. In the end, it would be nice to have a `to_polars()` dataframe method.
How would this improve `snowflake-snowpark-python`?
It would allow UDFs to be much faster for custom code in cases where one would normally need to resort to `to_pandas`.
References, Other Background