ydataai / ydata-profiling

1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.
https://docs.profiling.ydata.ai
MIT License

Feature Request: Support for modin framework to make the EDA on larger datasets much faster #1290

Open danishbansal808 opened 1 year ago

danishbansal808 commented 1 year ago

Missing functionality

Support for modin framework to make the EDA on larger datasets much faster

I would like to request the addition of support for the Modin framework to our EDA tools. As datasets become larger and more complex, performing EDA using traditional tools like pandas can become challenging due to limitations in memory and processing power. Modin is designed to provide efficient and scalable data processing capabilities for large datasets by using distributed computing techniques to perform operations in parallel. This results in a significant reduction in computation time, enabling data analysts to analyze and visualize large datasets more quickly and efficiently, leading to faster insights and decision-making.

Additionally, Modin offers a seamless interface built on top of pandas, which allows users to leverage the full power of distributed computing without needing to learn new syntax or concepts. This makes it an accessible and user-friendly solution for data analysts and scientists who want to work with large datasets without needing to learn new tools or techniques. With Modin, users can simply install the framework and begin using it immediately with their existing pandas code.

By incorporating Modin into our EDA tools, we can significantly improve the speed and accuracy of data analysis, leading to better insights and decision-making. Therefore, we request the addition of support for the Modin framework in our EDA tools to help us handle large datasets more efficiently.

Proposed feature

Basically, the user should not need to know how the internals work. They simply use pandas-profiling as usual, and in the backend, if the data size is large, the library uses modin.pandas instead of pandas.
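A minimal sketch of the backend switch described above, assuming a simple size cutoff. The names `choose_backend`, `load_backend`, and `SIZE_THRESHOLD_BYTES` are hypothetical and are not part of ydata-profiling or Modin. The sketch only shows how a backend module could be picked; it does not make the profiler itself work with Modin.

```python
import importlib
import importlib.util

# Hypothetical cutoff: at or above this size, prefer Modin.
# This is an illustrative value, not a ydata-profiling setting.
SIZE_THRESHOLD_BYTES = 1024**3  # 1 GiB


def choose_backend(size_bytes, threshold=SIZE_THRESHOLD_BYTES, modin_available=None):
    """Return the module name of the DataFrame backend to use."""
    if modin_available is None:
        # Detect Modin without importing it.
        modin_available = importlib.util.find_spec("modin") is not None
    if size_bytes >= threshold and modin_available:
        return "modin.pandas"
    # Small data, or Modin not installed: stay on plain pandas.
    return "pandas"


def load_backend(size_bytes):
    """Import and return the chosen backend module (it must be installed)."""
    return importlib.import_module(choose_backend(size_bytes))
```

With something like this, a call such as `pd = load_backend(size_bytes)` would hand back either `pandas` or `modin.pandas` under the same alias, so the calling code would not need to change.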

Alternatives considered

No response

Additional context

Modin

fabclmnt commented 1 year ago

Hi @danishbansal808 ,

thank you for the detailed request! At the moment, Modin is not on our roadmap. There have not yet been many requests from the community for support of the framework.

If more users are interested in this and the feature is upvoted, we will consider it for the roadmap.

HimanshuS01 commented 9 months ago

Hey @fabclmnt, are there any plans in the future roadmap to make the ydata-profiling library compatible with the Modin framework, so we can leverage the full power of distributed computing for profiling huge datasets?