Open PierreSnell opened 2 years ago
Hi @PierreSnell, the Spark integration is already in progress, so I'll link this issue to that one instead: #543. Thank you for your suggestion.
Support for Polars would be wonderful. Polars has grown a lot in the last few years, and many data scientists and data analysts use it daily. Is Polars support on the roadmap?
Hi @leandro-ferreira-farm, no, not yet.
We haven't yet seen enough interest from the community to add it to the roadmap. But if you are interested in contributing Polars support, let us know and we would be happy to guide you through the initiative.
Hello @fabclmnt, I am interested in contributing to the project and adding Polars support. Please tell me how I can do this.
PLEASE add polars support 🙌🙌🙌
agree
yes, it would be great to see Polars support. As of 2024, I think that Polars is gaining a lot of ground in the non-Spark, big-data DataFrame competition.
Polars support would be huge. It's currently sitting at 400k weekly downloads vs. 10M for pandas. Personally, I suspect its share is significantly higher on new projects, which is where ydata-profiling is probably most used.
You can take the naive approach of just converting the Polars DataFrame to pandas, but then you miss out on all the performance gains, which are massive:
profile = ProfileReport(df.to_pandas(), title="Profiling Report")
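For anyone who wants the conversion workaround in reusable form, here is a minimal sketch. The helper names `to_pandas_frame` and `profile_any` are hypothetical and not part of ydata-profiling. The sketch assumes the input is a pandas DataFrame, a `polars.DataFrame` (which has `to_pandas()`), or a `polars.LazyFrame` (which has `collect()`). It still runs the profiling itself on pandas, so it does not recover any of the Polars performance gains.

```python
def to_pandas_frame(df):
    """Return a pandas DataFrame for pandas or Polars input (hypothetical helper).

    Assumption: eager Polars frames expose .to_pandas() and lazy Polars
    frames expose .collect(); anything else is passed through unchanged.
    """
    if callable(getattr(df, "to_pandas", None)):
        # Eager Polars DataFrame: convert directly.
        return df.to_pandas()
    if callable(getattr(df, "collect", None)):
        # Lazy Polars frame: materialize first, then convert.
        return df.collect().to_pandas()
    # Assume it is already a pandas DataFrame.
    return df


def profile_any(df, title="Profiling Report"):
    """Build a ProfileReport from a pandas or Polars frame (hypothetical helper)."""
    # Imported lazily so the conversion helper works without ydata-profiling installed.
    from ydata_profiling import ProfileReport

    return ProfileReport(to_pandas_frame(df), title=title)
```

Usage would look like `profile_any(pl.read_csv("data.csv"))`, where the file name is only a placeholder. Note that `to_pandas()` copies the data, so peak memory roughly doubles for large frames.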
Another option is Narwhals, which lets you write against a Polars-style API for every dataframe library it supports (including pandas). Altair switched to Narwhals and now supports both pandas and Polars using the same code.
I would also love support for Polars DataFrames, with Polars used as the backend :)
Missing functionality
Polars integration? https://www.pola.rs/
Proposed feature
Use Polars DataFrames as a compute backend, or at least let the user pass a Polars DataFrame to ProfileReport.
Alternatives considered
Spark integration.
Additional context
Polars helps optimize queries and reduce the memory footprint. Could it be used to analyze big DataFrames and speed up computation?