unionai-oss / pandera

A light-weight, flexible, and expressive statistical data testing library
https://www.union.ai/pandera
MIT License

Allow use of generic pa.DataFrameSchema/Model for different supported libraries #1632

Open DavidSlayback opened 4 months ago

DavidSlayback commented 4 months ago

Is your feature request related to a problem? Please describe.

It's a small issue, but in a repo that is gradually transitioning from Pandas to Polars, Pandas and Polars dataframes with the same basic schema coexist. Currently, it seems I need to define two schemas for each: one for Pandas using pa.DataFrameModel and one for Polars using pa.polars.DataFrameModel.

Describe the solution you'd like

Ideally, the top-level pa.DataFrameModel and pa.DataFrameSchema classes would use something like @singledispatch to delegate to the appropriate backend implementation based on the type of the input dataframe. This is similar to an Ibis Table, where you rarely need to reach into a specific backend to call a particular function.
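To make the idea concrete, here is a minimal sketch of type-based dispatch with functools.singledispatch. It is not pandera's API: PandasFrame, PolarsFrame, REQUIRED_COLUMNS, and validate are hypothetical stand-ins so the snippet runs without pandas or polars installed. In a real setup, the registered types would presumably be pandas.DataFrame and polars.DataFrame.

```python
from functools import singledispatch


# Hypothetical stand-ins for pandas.DataFrame and polars.DataFrame, so the
# sketch runs without either library installed.
class PandasFrame:
    def __init__(self, columns):
        self.columns = list(columns)


class PolarsFrame:
    def __init__(self, columns):
        self.columns = list(columns)


# One backend-neutral schema definition (assumed here to be just a list of
# required column names).
REQUIRED_COLUMNS = ("id", "price")


def _check_columns(columns, backend):
    """Shared check; returns the name of the backend that handled the call."""
    missing = [c for c in REQUIRED_COLUMNS if c not in columns]
    if missing:
        raise ValueError(f"[{backend}] missing columns: {missing}")
    return backend


@singledispatch
def validate(df):
    # Fallback for dataframe types with no registered backend.
    raise TypeError(f"no backend registered for {type(df).__name__}")


@validate.register
def _(df: PandasFrame):
    # Pandas-specific validation logic would live here.
    return _check_columns(df.columns, "pandas")


@validate.register
def _(df: PolarsFrame):
    # Polars-specific validation logic would live here.
    return _check_columns(df.columns, "polars")
```

With this shape, calling validate(...) on either kind of frame picks the matching implementation from the argument's type, so user code defines the schema once and never names a backend.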

Describe alternatives you've considered

What I'm currently doing is just being more verbose and defining multiple schemas. It works fine! It just seems a bit strange as a workflow. Obviously, if we were always in Polars it wouldn't be an issue, but that'll take a while.

cosmicBboy commented 4 months ago

I've thought about this a lot, and I think we're getting closer to this world. However, my main concern is that a generic dataframe schema would have to include a superset of the options for every supported dataframe library. I think eventually we'll nail down a "common dataframe schema API to rule them all", in which case this concern becomes less of an issue.

We recently introduced a generic dataframe API: https://github.com/unionai-oss/pandera/tree/main/pandera/api/dataframe which is where this dispatching might happen. Currently the pandas and polars schemas inherit from these classes (PySpark still needs to be done).
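One way dispatching could sit inside a shared base class is a registry keyed by dataframe type. The sketch below is only an illustration under that assumption, not the code in pandera/api/dataframe: GenericSchema, register_backend, and the two frame classes are hypothetical names chosen for the example.

```python
# Hypothetical stand-ins for the real dataframe types; each exposes its
# column names differently to show why per-backend logic is needed.
class PandasLikeFrame:
    def __init__(self, columns):
        self.columns = list(columns)


class PolarsLikeFrame:
    def __init__(self, schema):
        self.schema = dict(schema)  # column name -> dtype name


class GenericSchema:
    """Hypothetical backend-neutral schema; not pandera's actual class."""

    _backends = {}  # maps dataframe type -> validation callable

    def __init__(self, required_columns):
        self.required_columns = tuple(required_columns)

    @classmethod
    def register_backend(cls, frame_type, backend):
        cls._backends[frame_type] = backend

    def validate(self, df):
        # Walk the MRO so subclasses of a registered frame type also match.
        for klass in type(df).__mro__:
            backend = self._backends.get(klass)
            if backend is not None:
                return backend(self, df)
        raise TypeError(f"no backend registered for {type(df).__name__}")


def _check(schema, names, df):
    missing = [c for c in schema.required_columns if c not in names]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return df  # return the validated frame unchanged


def pandas_backend(schema, df):
    return _check(schema, df.columns, df)


def polars_backend(schema, df):
    return _check(schema, df.schema.keys(), df)


GenericSchema.register_backend(PandasLikeFrame, pandas_backend)
GenericSchema.register_backend(PolarsLikeFrame, polars_backend)
```

A single GenericSchema(["id", "price"]) instance can then validate either frame type. The superset concern shows up here as well: any option stored on the generic schema has to be understood, or explicitly rejected, by every registered backend.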

If folks engage with this issue (👍 or comment/discuss) we can prioritize this effort. In the meantime, @DavidSlayback, if you can write down a spec for how this would all work, perhaps with a code snippet sketching how dispatching would work, that would get the ball rolling.

DavidSlayback commented 4 months ago

Sure, I'll try to sketch something up later this week when I'm free!