vincentarelbundock / pymarginaleffects

GNU General Public License v3.0

`narwhals` to support pandas as input #108

Open s3alfisc opened 3 days ago

s3alfisc commented 3 days ago

Hi @vincentarelbundock - @juanitorduz pointed me towards @MarcoGorelli's narwhals project, which solves the "two DataFrame APIs" problem for developers by letting them define APIs that are agnostic to the input data frame type.

That is, one can write things like:

import narwhals as nw
import pandas as pd
import polars as pl

def func(df_any):
    df = nw.from_native(df_any)
    df = df.select(
        a_sum=nw.col('a').sum(),
        a_mean=nw.col('a').mean(),
        a_std=nw.col('a').std(),
    )
    return nw.to_native(df)

df = pd.DataFrame({'a': [1, 2, 3, 4, 5]})
func(df) # returns pandas DataFrame

df = pl.DataFrame({'a': [1, 2, 3, 4, 5]})
func(df) # returns polars DataFrame

In other words: code written against narwhals' polars-style API will work on pandas inputs, without requiring pandas as a dependency! In fact, narwhals even allows you to drop the polars dependency if you want to.

Happy to try my hand at a PR for this once I find the time =)

vincentarelbundock commented 3 days ago

Ooooh that sounds like exactly what we need.

I won't have much (if any) time to write code for this in the next couple months, but I'd be more than happy to review a PR, especially if the changes are pretty minimal.

Thanks for raising this issue!