ydataai / ydata-profiling

1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.
https://docs.profiling.ydata.ai
MIT License
12.47k stars 1.68k forks

Feature Request: support for Polars #1129

Open PierreSnell opened 1 year ago

PierreSnell commented 1 year ago

Missing functionality

Polars integration? https://www.pola.rs/

Proposed feature

Use a Polars DataFrame as a compute backend, or let the user pass a Polars DataFrame to ProfileReport.

Alternatives considered

Spark integration.

Additional context

Polars helps optimize queries and reduce memory footprint. It could be used to run the analysis on large DataFrames and speed up computation.

fabclmnt commented 1 year ago

Hi @PierreSnell, we already have the Spark integration ongoing, so I'll be linking this issue to that one instead: #543. Thank you for your suggestion.

leandro-ferreira-farm commented 1 year ago

Support for Polars would be wonderful. Polars has grown a lot in the last few years, and many data scientists and data analysts use it daily. Is Polars support on the roadmap?

fabclmnt commented 1 year ago

Hi @leandro-ferreira-farm, no, not yet.

We haven't yet seen enough interest from the community to add it to the roadmap. But if you are interested in contributing Polars support, let us know; we would be happy to guide you through the initiative.

leandro-ferreira-farm commented 1 year ago

Hello @fabclmnt, I am interested in contributing to the project and adding Polars support. Please tell me how I can do this.

naisofly commented 1 year ago

PLEASE add polars support 🙌🙌🙌

bvolodarskiy commented 1 year ago

agree

kyle-gilde commented 1 month ago

Yes, it would be great to see Polars support. As of 2024, I think that Polars is gaining a lot of ground in the non-Spark, big-data DataFrame competition.

Filimoa commented 1 day ago

Polars support would be huge. It's currently sitting at 400k weekly downloads versus 10M for pandas. Personally, I suspect that share is significantly higher on new projects, which is where ydata-profiling is probably most used.

You can take the naive approach of just converting a Polars DataFrame to pandas, but then you miss out on all the performance gains, which are massive.

from ydata_profiling import ProfileReport
profile = ProfileReport(df.to_pandas(), title="Profiling Report")

e10v commented 1 day ago

There is an option to use the Polars API with all dataframes supported by Narwhals (including pandas). Altair switched to Narwhals and now supports both pandas and Polars using the same code.