Open antonymilne opened 10 months ago
Maybe Ibis is a good fit here?
I only reacted with 🚀 to this, but to make my position clearer:

> as of 5.15 it supports not just polars but actually any dataframe with a `to_pandas` method, and as of 5.16 it supports dataframes that follow the dataframe interchange protocol (https://github.com/data-apis/dataframe-api/issues/73)

This is awesome ⭐

> but we should work out exactly how much performance improvement polars users would actually get in practice to see what the value of this would be over a simple `to_pandas` call.

I think it's more about DX than a performance improvement. If folks are using Polars for whatever reason and then have to call `.to_pandas()` to use Vizro, it feels a bit meh. If Vizro supports Polars natively, it's more pleasant.
Just seen on their LinkedIn:
Check migrated all 100+ of their Airflow DAGs from pandas to Polars and saved 25% in cloud expenses.
Any update about Polars integration? More and more people are using it.
Supporting Narwhals may be a sensible 1st step since we get many birds with one stone... https://github.com/narwhals-dev/narwhals
Would also like to support Ibis for the same reason.
Narwhals looks very interesting, thanks for pointing it out @datajoely.
@reouvenzana no updates on this - the current situation outlined in the first post still applies here. It's something I'd still like to do but it just hasn't been prioritised yet. Your comment here though does help to bump the priority up!
When we do implement this, whatever we do is likely to closely follow plotly's pattern to begin with, so there won't be any performance improvements, just a DX improvement as @astrojuanlu suggested above.
@reouvenzana are you interested in using polars in vizro for performance improvements or just for ease of use to avoid doing a `to_pandas` call?
@antonymilne

> Your comment here though does help to bump the priority up!

Nice!

> @reouvenzana are you interested in using polars in vizro for performance improvements or just for ease of use to avoid doing a `to_pandas` call?

Honestly, it's more about the API and functionality of Polars (easy method chaining, list columns, which are really useful) than performance issues. It's bothersome to have `pl.DataFrame`, `df.to_pandas()` everywhere in my code, and also to handle the mismatch between data types. I'm aware that Dash / Plotly "supports" polars, though there is no performance gain, as you've pointed out. Still, it would be great, given the increasing traction behind Polars.
Got it, thank you @reouvenzana, that makes a lot of sense 👍
For future reference, the logic plotly uses to do the conversion to pandas is here:
https://github.com/plotly/plotly.py/blob/51eb5ea9fefda27bccfdb21e660b8d4035cef3b0/packages/python/plotly/plotly/express/_core.py#L1323-L1353. So any `pandas>=2.0.2` will use `__dataframe__` rather than `to_pandas`. We would probably do something similar to this to begin with anyway.
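A rough sketch of that kind of conversion order, for illustration only. This is not plotly's actual code: `to_pandas_frame` and the `ToPandasOnly` stand-in class are hypothetical, and the only behaviour taken from the comment above is "prefer `__dataframe__` when pandas is at least 2.0.2, otherwise fall back to `to_pandas`".

```python
import re

import pandas as pd
from pandas.api.interchange import from_dataframe


def _pandas_version() -> tuple:
    """Parse the leading 'major.minor.patch' of pd.__version__ into ints."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", pd.__version__)
    return tuple(int(part) for part in match.groups()) if match else (0, 0, 0)


def to_pandas_frame(data):
    """Hypothetical helper: coerce a dataframe-like object to pandas."""
    if isinstance(data, pd.DataFrame):
        return data  # already pandas, nothing to convert
    # Prefer the interchange protocol when pandas is new enough to consume it.
    if hasattr(data, "__dataframe__") and _pandas_version() >= (2, 0, 2):
        return from_dataframe(data)
    # Otherwise fall back to the object's own conversion method.
    if hasattr(data, "to_pandas"):
        return data.to_pandas()
    raise TypeError(f"Cannot convert {type(data).__name__} to a pandas DataFrame")


class ToPandasOnly:
    """Stand-in for a third-party frame that only exposes to_pandas()."""

    def __init__(self, columns):
        self._columns = columns

    def to_pandas(self):
        return pd.DataFrame(self._columns)


converted = to_pandas_frame(ToPandasOnly({"a": [1, 2, 3]}))
```

Either way the result is a pandas DataFrame, which is why this approach gives a DX improvement rather than a performance one.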
Originally posted by @vmisusu in https://github.com/mckinsey/vizro/issues/191#issuecomment-1845368168
I'm opening this issue to see whether other people have the same question so we can figure out what priority it should be. Just hit 👍 if it's something you'd like to see in vizro and feel free to leave any comments.
The current situation (25 January 2024) is:
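- [...] as of 5.15 it [plotly] supports not just polars but actually any dataframe with a `to_pandas` method, and as of 5.16 it supports dataframes that follow the dataframe interchange protocol (which is now `pip install`able)
- [...] we should work out exactly how much performance improvement polars users would actually get in practice to see what the value of this would be over a simple `to_pandas` call. The biggest changes we'd need to make would be to actions code like filtering functionality (FYI @petar-qb). I don't think it would be too hard, but it's certainly not a small task either.

See also How Polars Can Help You Build Fast Dash Apps for Large Datasets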
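To illustrate why filtering is the part that would need the most work, here is a minimal sketch of pandas-style filters. These functions are hypothetical and are not Vizro's actual actions code; they only show the pandas-specific idioms (boolean-mask indexing, `Series.between`, `Series.isin`) that have no direct equivalent on other dataframe types.

```python
import pandas as pd


def filter_between(df: pd.DataFrame, column: str, low, high) -> pd.DataFrame:
    """Hypothetical range filter using a pandas boolean mask."""
    # Boolean-mask indexing is a pandas idiom; Polars would instead use
    # df.filter(pl.col(column).is_between(low, high)).
    return df[df[column].between(low, high)]


def filter_isin(df: pd.DataFrame, column: str, values) -> pd.DataFrame:
    """Hypothetical categorical filter using Series.isin."""
    # The Polars counterpart would be df.filter(pl.col(column).is_in(values)).
    return df[df[column].isin(values)]


example = pd.DataFrame({"x": [1, 5, 10], "kind": ["a", "b", "a"]})
in_range = filter_between(example, "x", 2, 10)
only_a = filter_isin(example, "kind", ["a"])
```

Each such call site would need either a per-library branch or an abstraction layer before non-pandas frames could flow through unchanged.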
From @Coding-with-Adam:
FYI @astrojuanlu
[^1]: https://github.com/plotly/plotly.py/pull/4244 https://github.com/plotly/plotly.py/pull/4272/files https://github.com/plotly/plotly.py/pull/3901 https://github.com/plotly/plotly.py/issues/3637