unionai-oss / pandera

A light-weight, flexible, and expressive statistical data testing library
https://www.union.ai/pandera
MIT License

Add config option to disallow duplicate column names #707

Closed benlindsay closed 2 years ago

benlindsay commented 2 years ago

I'm in the camp of wishing pandas would disallow duplicate column names. A fallback wish would be pandera disallowing duplicate column names by default. Since that would be a breaking change that I'm guessing won't fly, my fallback fallback wish is a config option to do so. My current solution is to define a BaseSchema that all my schemas inherit from that looks like this:

import pandas as pd
import pandera as pa
from pandera.typing import Series

class BaseSchema(pa.SchemaModel):
    """Pandera schema that disallows duplicate columns"""

    @pa.dataframe_check(ignore_na=False)
    def no_duplicate_columns(cls, df: pd.DataFrame) -> bool:
        # Count how many times each column label appears.
        col_counts = df.columns.value_counts().to_dict()
        duplicated_cols = [col for col, count in col_counts.items() if count > 1]
        if duplicated_cols:
            # Report the offending labels, then fail the check.
            print(f"Duplicated columns found: {duplicated_cols}")
            return False
        return True

class MainSchema(BaseSchema):
    column_1: Series[str]
    column_2: Series[str]
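
For reference, the counting logic inside no_duplicate_columns above can be sketched on its own, without pandas or pandera. This is only an illustration: the helper name find_duplicate_labels is hypothetical and not part of either library.

```python
from collections import Counter
from typing import Hashable, Iterable, List


def find_duplicate_labels(labels: Iterable[Hashable]) -> List[Hashable]:
    """Return each label that occurs more than once, in first-seen order.

    Hypothetical helper for illustration; not a pandas or pandera API.
    """
    counts = Counter(labels)
    # Counter keeps insertion order, so duplicates come back in the
    # order in which they first appeared.
    return [label for label, count in counts.items() if count > 1]


print(find_duplicate_labels(["a", "b", "a", "c", "b"]))  # ['a', 'b']
print(find_duplicate_labels(["a", "b"]))  # []
```

Because a pandas Index iterates over its labels, passing df.columns to this helper should give the same list that the check prints.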

I'd propose something like this as syntax to accomplish this:

import pandera as pa
from pandera.typing import Series

class MainSchema(pa.SchemaModel):
    column_1: Series[str]
    column_2: Series[str]

    class Config:
        allow_duplicate_column_names = False

Would something like this be possible?

cosmicBboy commented 2 years ago

I support this feature @benlindsay! As I mentioned in your other issue, I'm not sure when I'll get to implementing this, but contributions are welcome!

benlindsay commented 2 years ago

Thanks! This is another one I wish I had time to make a PR for, but I don't foresee having the time.

m-richards commented 2 years ago

Hey, can I have a go at implementing this?

benlindsay commented 2 years ago

I haven't touched this, not sure about @cosmicBboy, but it's a yes from me!

cosmicBboy commented 2 years ago

Please go ahead @m-richards! If you haven't already, please take a look at the contributing guide, and let me know if you have any questions!

cosmicBboy commented 2 years ago

fixed by #758