unionai-oss / pandera

A light-weight, flexible, and expressive statistical data testing library
https://www.union.ai/pandera
MIT License

SchemaModel inheritance field alias propagation #445

Closed m-richards closed 3 years ago

m-richards commented 3 years ago

Describe the bug

When a child SchemaModel defines no fields of its own (e.g. the class body consists only of pass or a Config class), field alias names are not inherited from the parent.

Code Sample, a copy-pastable example

import pandas as pd
import pandera as pa
from pandera.typing import Series

class Schema(pa.SchemaModel):
    b: Series[str] = pa.Field(alias="test")

    class Config:
        name = "BaseSchema"
        strict = True

class Schema2(Schema):
    c: Series[str]

class Schema3(Schema):
    pass

df = pd.DataFrame({"test": ["2001"]})
df2 = pd.DataFrame({"test": ["2001"],
                    "c": ["2001"]})

Schema.validate(df)    # passes
Schema2.validate(df2)  # passes: the alias "test" is inherited
Schema3.validate(df)   # raises SchemaError on pandera 0.6.3

Expected behavior

I expect Schema3.validate(df) to be legal; that is, Schema3 should inherit the field alias "test" from Schema, in the same way that Schema2 does. The actual behaviour is the following traceback:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\MyPrograms\Miniconda3\envs\panderaBug\lib\site-packages\pandera\model.py", line 193, in validate
    return cls.to_schema().validate(
  File "C:\MyPrograms\Miniconda3\envs\panderaBug\lib\site-packages\pandera\schemas.py", line 497, in validate
    error_handler.collect_error(
  File "C:\MyPrograms\Miniconda3\envs\panderaBug\lib\site-packages\pandera\error_handlers.py", line 32, in collect_error
    raise schema_error from original_exc
pandera.errors.SchemaError: column 'test' not in DataFrameSchema {'b': <Schema Column(name=b, type=<class 'str'>)>}

Additional context

I originally stumbled across this while using my Schema3 equivalent as a temporary definition, before running code to work out which columns were missing. Perhaps this is obscure enough that it isn't really an issue, but the same thing also happens if one defines a Config subclass without adjusting any fields, which is more likely to come up in practice.

I'd usually try to work out why a bug is happening, but I'm not familiar with how inheritance works for non-instantiated classes. [Edit: the traceback is from pandera 0.6.3.]
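
On the question of how inheritance works for classes that are never instantiated, here is a minimal pure-Python sketch. It is not pandera's implementation: Field, Model, own_columns and all_columns are hypothetical names made up for illustration. It only shows a general mechanism by which per-class bookkeeping can miss inherited definitions: attribute lookup on a class follows the MRO, but vars(cls) contains only what that class body itself defines.

```python
class Field:
    """Hypothetical stand-in for a schema field that may carry an alias."""

    def __init__(self, alias=None):
        self.alias = alias
        self.name = None

    def __set_name__(self, owner, name):
        # Called once, at class creation, for the class whose body
        # defines the attribute. Subclasses do not trigger it again.
        self.name = name

    @property
    def column(self):
        # The alias, if given, replaces the attribute name.
        return self.alias or self.name


class Model:
    """Hypothetical base class; no instances are ever created."""

    @classmethod
    def own_columns(cls):
        # Looks only at attributes defined in this class body.
        return [v.column for v in vars(cls).values() if isinstance(v, Field)]

    @classmethod
    def all_columns(cls):
        # Walks the MRO from base to derived so inherited fields are seen.
        seen = {}
        for klass in reversed(cls.__mro__):
            for key, value in vars(klass).items():
                if isinstance(value, Field):
                    seen[key] = value.column
        return list(seen.values())


class Base(Model):
    b = Field(alias="test")


class WithField(Base):
    c = Field()


class Empty(Base):
    pass


print(Empty.b is Base.b)        # attribute lookup finds the parent's Field
print(Empty.own_columns())      # nothing is defined in Empty's own body
print(Empty.all_columns())      # the MRO walk recovers the aliased column
print(WithField.all_columns())
```

Code that collects fields per class therefore has to decide explicitly whether to look at the class body alone or at the whole MRO. Whether this is what goes wrong inside pandera 0.6.3 is not established here.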

cosmicBboy commented 3 years ago

Thanks for submitting this bug, @m-richards! #446 should do it. @jeffzi, added you as reviewer; feel free to edit if there's a better solution. Unit tests should catch this case now.

cosmicBboy commented 3 years ago

fixed by #446