Closed a-recknagel closed 1 year ago
Hey @a-recknagel, unfortunately using `SchemaModel`s in this way isn't possible because inheritance is only additive.
It's perhaps a little less intuitive, but if you want to rely on Python's inheritance semantics, I typically use this pattern:
```python
import pandera as pa
from pandera.typing import DataFrame, Series


class BaseSchema(pa.SchemaModel):
    # put all the common columns here
    a: Series[int]
    b: Series[int]


class SchemaA(BaseSchema):
    c: Series[int]
    d: Series[int]


class SchemaB(BaseSchema):
    # suppose you drop but also add a bunch of columns. Relative to SchemaA,
    # this is equivalent to dropping "c" and "d" and adding "e" and "f"
    e: Series[int]
    f: Series[int]
```
This can get overly verbose, so an alternative would be to override the `to_schema` method in `SchemaB`, calling the parent class's `to_schema()` and then applying schema transformations to the result.
```python
import pandera as pa
from pandera.typing import DataFrame, Series


class SchemaA(pa.SchemaModel):
    a: Series[int]
    b: Series[int]
    c: Series[int]
    d: Series[int]


class SchemaB(SchemaA):
    ...

    @classmethod
    def to_schema(cls) -> pa.DataFrameSchema:
        schema = super().to_schema()
        return schema.remove_columns(["a", "b"])


print(SchemaB.to_schema())
# <Schema DataFrameSchema(
#     columns={
#         'c': <Schema Column(name=c, type=DataType(int64))>
#         'd': <Schema Column(name=d, type=DataType(int64))>
#     },
#     checks=[],
#     coerce=False,
#     dtype=None,
#     index=None,
#     strict=False
#     name=SchemaB,
#     ordered=False,
#     unique_column_names=False
# )>
```
Note, though, that with this method the `SchemaB` class will still have `a` and `b` as class attributes, but pandera always uses the result of `to_schema` to actually perform the validation.
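For anyone wondering why the dropped columns linger on the class, here is a minimal plain-Python sketch that needs no pandera at all. `ToyModel` and its dict-returning `to_schema` are hypothetical stand-ins, not pandera's implementation; the point is only that annotations accumulate down the inheritance chain, while an overridden `to_schema` can still filter the derived result.

```python
from typing import get_type_hints


class ToyModel:
    """Hypothetical stand-in for a schema model; not pandera's implementation."""

    @classmethod
    def to_schema(cls) -> dict:
        # Collect the annotations declared anywhere in the inheritance chain.
        return dict(get_type_hints(cls))


class SchemaA(ToyModel):
    a: int
    b: int
    c: int
    d: int


class SchemaB(SchemaA):
    @classmethod
    def to_schema(cls) -> dict:
        schema = super().to_schema()
        # Drop columns from the derived schema only; the class itself is unchanged.
        return {name: tp for name, tp in schema.items() if name not in {"a", "b"}}


print(sorted(get_type_hints(SchemaB)))  # ['a', 'b', 'c', 'd']
print(sorted(SchemaB.to_schema()))      # ['c', 'd']
```

So code that inspects the annotations of `SchemaB` directly would still see `a` and `b`; only consumers that go through `to_schema()` get the reduced set.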
Lemme know if this works for you!
Also, this question has come up before a few times, so probably worth adding a section in the docs about this... would you be interested in contributing a section on this page?
The second option hopefully works for me, thanks! I have some multiple inheritances, and always forget how to make overriding methods cooperative. I'll try to add the docs, too.
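On the cooperative-override point, a small plain-Python sketch of how the overrides can chain under multiple inheritance. All class names here (`ToyModel`, `DropAB`, `DropD`) are made up for illustration and the list-returning `to_schema` is a stand-in, not pandera's API. The idea is that every override calls `super().to_schema()` and transforms what it gets back, so each class in the MRO gets its turn.

```python
class ToyModel:
    """Hypothetical stand-in base; not pandera's implementation."""

    columns = ("a", "b", "c", "d")

    @classmethod
    def to_schema(cls) -> list:
        return list(cls.columns)


class DropAB(ToyModel):
    @classmethod
    def to_schema(cls) -> list:
        # super() defers to the next class in cls.__mro__,
        # which is not necessarily ToyModel.
        return [col for col in super().to_schema() if col not in ("a", "b")]


class DropD(ToyModel):
    @classmethod
    def to_schema(cls) -> list:
        return [col for col in super().to_schema() if col != "d"]


class SchemaB(DropAB, DropD):
    pass


print([klass.__name__ for klass in SchemaB.__mro__])
# ['SchemaB', 'DropAB', 'DropD', 'ToyModel', 'object']
print(SchemaB.to_schema())  # ['c']
```

If one of the overrides called `ToyModel.to_schema()` directly instead of `super().to_schema()`, the sibling's transformation would be skipped.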
Cool, I'm gonna convert this issue into a discussion... would you mind marking my response as the answer?
Problem

I use `SchemaModel`s extensively to type-hint my function calls and have written some code which uses the annotations at run-time. At one point, I wanted to define a `SchemaB` which had dropped a bunch of columns from its parent `SchemaA`. Unless I'm a bit blind, there is no way to define `SchemaB` in a straightforward manner.

Possible Solutions

- `SchemaModel` from an existing one with the option to remove columns
- `SchemaModel` from a `DataFrameSchema` (see alternative)

Alternatives

I've been trying to use `DataFrameSchema`, which supports removing columns, for `SchemaB` instead. But then typing doesn't work any more, because `SchemaB` isn't a type.

Additional Context

I can't invert the parent/child relationship of `SchemaA` and `SchemaB`, because the actual parents of `SchemaA` are a bunch of other `SchemaModel`s, and the columns-to-keep are split among them. I'd like to avoid not having `SchemaB` inherit from `SchemaA` because of code duplication; many of the columns have rather complex definitions.