py-why / dowhy

DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions. DoWhy is based on a unified language for causal inference, combining causal graphical models and potential outcomes frameworks.
https://www.pywhy.org/dowhy
MIT License
7.04k stars, 924 forks

Support polars data frames #1151

Open krz opened 6 months ago

krz commented 6 months ago

Polars is a high-performance data frame library for Python, known for fast data processing and concise syntax. Its multi-threaded query engine and integration with the Python ecosystem make it an excellent choice for handling large datasets.

While many popular libraries such as scikit-learn and seaborn support polars data frames, dowhy currently does not. The current way to use a polars data frame with dowhy is to convert it to pandas first (e.g. polars_df.to_pandas()).

Please support polars natively, as its popularity is increasing.
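For reference, the conversion workaround described above can be sketched roughly as follows. This is only an illustration: `ensure_pandas` and `demo` are hypothetical names, not part of dowhy, and the column names are made up. It assumes that `polars.DataFrame.to_pandas()` is available (which in turn needs pandas and pyarrow installed) and that dowhy's `CausalModel` is given the resulting pandas frame.

```python
def ensure_pandas(data):
    """Hypothetical helper (not part of dowhy): convert at the library boundary.

    Objects exposing a ``to_pandas()`` method, such as ``polars.DataFrame``,
    are converted; anything else (e.g. a pandas frame) is returned unchanged.
    """
    to_pandas = getattr(data, "to_pandas", None)
    if callable(to_pandas):
        return to_pandas()
    return data


def demo():
    """Not run on import; needs polars, pyarrow, pandas and dowhy installed."""
    import polars as pl
    from dowhy import CausalModel

    # Illustrative toy data; the column names are assumptions.
    polars_df = pl.DataFrame(
        {"w": [0.1, 0.4, 0.3, 0.9], "t": [0, 1, 0, 1], "y": [1.0, 2.5, 1.2, 3.1]}
    )
    # Convert once, then hand the pandas frame to dowhy.
    return CausalModel(
        data=ensure_pandas(polars_df),
        treatment="t",
        outcome="y",
        common_causes=["w"],
    )
```

The conversion copies the data into pandas, which is the overhead native polars support would avoid.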

amit-sharma commented 6 months ago

Thanks for raising this @krz. Can you give more details on how scikit-learn supports polars DFs? Do they have a common API that can support both pandas and polars (if installed)?

Also, we'd love to have contributions. Would you like to start a PR to support polars?

krz commented 6 months ago

Thanks for your reply. scikit-learn made sure that all their code supports the Python dataframe interchange protocol. See pull requests https://github.com/scikit-learn/scikit-learn/pull/26464 and https://github.com/scikit-learn/scikit-learn/pull/27315 and the discussion in https://github.com/scikit-learn/scikit-learn/issues/25896.

I think an important first step for dowhy would be to remove functionality that relies solely on pandas, such as https://github.com/py-why/dowhy/pull/1135
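To make the interchange-protocol idea mentioned above a bit more concrete, here is a minimal sketch of how a library could accept any frame that implements `__dataframe__` without importing polars itself. The helper names are hypothetical and this is not how dowhy or scikit-learn actually implement it; it only assumes `pandas.api.interchange.from_dataframe`, which is available in pandas 1.5 and later.

```python
def supports_interchange(data):
    """Return True if `data` exposes the dataframe interchange protocol."""
    return callable(getattr(data, "__dataframe__", None))


def to_pandas_via_interchange(data):
    """Hypothetical helper: build a pandas DataFrame from any protocol-compliant frame."""
    if not supports_interchange(data):
        raise TypeError(
            f"{type(data).__name__} does not implement __dataframe__"
        )
    # Imported lazily so the check above works even without pandas installed.
    from pandas.api.interchange import from_dataframe  # pandas >= 1.5

    return from_dataframe(data)
```

With polars installed, `to_pandas_via_interchange(pl.DataFrame({"a": [1, 2]}))` should yield a pandas frame, since polars data frames implement `__dataframe__`. The internals would still run on pandas, so this only removes the manual conversion step for users rather than the pandas dependency.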