Closed: adityaguru149 closed this issue 2 years ago
hi @adityaguru149 good question!
How do I show the column_options (none of which is present) in Error Message?
You can set _column_options as a private attribute, which SchemaModel ignores, so you can store arbitrary metadata there.
```python
from typing import Optional

import pandas as pd
import pandera as pa


class DFSchema(pa.SchemaModel):
    id1: pa.typing.Series[str]
    id2: Optional[pa.typing.Series[str]]
    id3: Optional[pa.typing.Series[str]]
    data: pa.typing.Series[float]

    # private attributes can contain arbitrary metadata
    _column_options = {"id1", "id2"}

    @pa.dataframe_check(
        # error keyword arg gives you custom error messages
        error=f"does not contain at least one of {_column_options}"
    )
    def atleast_one_from_column_options_present_check(cls, df: pd.DataFrame) -> bool:
        # passes when at least one of the column options is in the dataframe
        columns_found = cls._column_options.intersection(df.columns)
        return len(columns_found) > 0
```
The error summary looks like this:
```
Error Counts
------------
- column_not_in_dataframe: 1
- dataframe_check: 1

Schema Error Summary
--------------------
                                                                        failure_cases  n_failure_cases
schema_context  column check
DataFrameSchema <NA>   column_in_dataframe                                      [id1]                1
                       does not contain at least one of {'id2', 'id1'}        [False]                1
```
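The summary above prints the options as {'id2', 'id1'} because Python sets have no guaranteed display order. If a stable message matters (for example, when asserting on it in tests), one option is to sort the options before formatting them. Below is a minimal plain-Python sketch of that idea. `check_column_options` is a hypothetical helper for illustration and is not part of pandera.

```python
from typing import Iterable, Tuple


def check_column_options(
    columns: Iterable[str], options: Iterable[str]
) -> Tuple[bool, str]:
    """Hypothetical helper: report whether any option is among the columns.

    Returns (passed, message). The message lists the options in sorted
    order so that it reads the same on every run.
    """
    option_set = set(options)
    found = option_set.intersection(columns)
    message = f"does not contain at least one of {sorted(option_set)}"
    return len(found) > 0, message


# Neither id1 nor id2 is present, so the check fails.
passed, message = check_column_options(["id3", "data"], {"id1", "id2"})
print(passed)   # False
print(message)  # does not contain at least one of ['id1', 'id2']
```

The same `sorted(...)` call could presumably be used inside the f-string passed to `error=` in the schema, since it is an ordinary expression evaluated in the class body.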
Question about pandera
Use case: the user provides a group_by column, and the code needs to group by that column (at least one column is user-supplied; the rest can be considered fixed) and then aggregate on another column.
Example: group by id1 and either id2 or id3, then aggregate on data.
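To make the use case concrete, here is a rough pandas sketch of the group-and-aggregate step. The helper name `aggregate_by_user_column`, its parameters, and the choice of `sum` as the aggregation are all assumptions made for illustration. The issue only says "aggregate" and does not name a function.

```python
import pandas as pd


# Hypothetical helper: the name, parameters, and the use of sum()
# are assumptions, not something specified in the issue.
def aggregate_by_user_column(
    df: pd.DataFrame,
    user_column: str,
    fixed_column: str = "id1",
    value_column: str = "data",
) -> pd.DataFrame:
    """Group by the fixed column plus one user-chosen column and sum the values."""
    allowed = {"id2", "id3"}  # columns the user may choose from
    if user_column not in allowed:
        raise ValueError(
            f"user_column must be one of {sorted(allowed)}, got {user_column!r}"
        )
    if user_column not in df.columns:
        raise KeyError(f"column {user_column!r} is not in the dataframe")
    return df.groupby([fixed_column, user_column], as_index=False)[value_column].sum()


df = pd.DataFrame(
    {
        "id1": ["a", "a", "b"],
        "id2": ["x", "x", "y"],
        "data": [1.0, 2.0, 3.0],
    }
)
result = aggregate_by_user_column(df, "id2")
print(result)
```

Here id3 is absent from the dataframe, so asking to group by it raises a KeyError. That is the situation a schema-level "at least one of these columns" check is meant to catch earlier.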
Issue very similar to this issue in pydantic
At present, I have coded it as follows (def atleast_one_from_column_options_present_check)
Is there a better method? pandera checks? decorators?
How do I show the column_options (none of which is present) in Error Message?
Can this be taken up as a feature request to add it as a generic decorator function that can be used on schemas or schema models?