unionai-oss / pandera

A light-weight, flexible, and expressive statistical data testing library
https://www.union.ai/pandera
MIT License

Multi-Level Column Headers #1025

Closed: aboomer07 closed this issue 1 year ago

aboomer07 commented 1 year ago

I'm not sure whether this is a feature request or a documentation improvement; I haven't been able to find it in the documentation, if it exists.

It seems that from_format, for loading schemas from a file, can only be used in the SchemaModel configuration, while a multi-level column index can only be specified in a DataFrameSchema. There's no way to convert a DataFrameSchema to a SchemaModel, or to load a YAML file into a SchemaModel.

Apologies if I missed anything, love this library!

cosmicBboy commented 1 year ago

love this library

❤️

You can use aliases in this case: https://pandera.readthedocs.io/en/stable/schema_models.html#aliases

import pandera as pa

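class Schema(pa.SchemaModel):
    # alias maps the attribute to a MultiIndex column label given as a tuple
    col1: pa.typing.Series[int] = pa.Field(alias=("level1", "col1"))
    # check_name=True is already the default for columns, so it is optional here
    col2: pa.typing.Series[int] = pa.Field(alias=("level1", "col2"), check_name=True)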

print(Schema.to_schema())

output:

<Schema DataFrameSchema(
    columns={
        '('level1', 'col1')': <Schema Column(name=('level1', 'col1'), type=DataType(int64))>
        '('level1', 'col2')': <Schema Column(name=('level1', 'col2'), type=DataType(int64))>
    },
    checks=[],
    coerce=False,
    dtype=None,
    index=None,
    strict=False
    name=Schema,
    ordered=False,
    unique_column_names=False
)>
aboomer07 commented 1 year ago

Thanks!