Open alejandro-yousef opened 1 year ago
Hi everyone and thanks for this library!
I'm also interested in this question. Is there an answer?
hey @alejandro-yousef @tkaraouzene this is probably a common use case... here's the solution: gotta use __init_subclass__ instead of __new__:
import pandas as pd
import pandera as pa
from pandera.typing import Series
from pydantic import BaseModel, PositiveFloat


class ConfigParams(BaseModel):
    min_value: PositiveFloat
    max_value: PositiveFloat


class MyBaseSchema(pa.SchemaModel):
    _custom_config: ConfigParams

    def __init_subclass__(cls, custom_config: ConfigParams, **kwargs):
        # Receive the class keyword argument and store it on the subclass.
        super().__init_subclass__(**kwargs)
        cls._custom_config = custom_config


class MySchema(MyBaseSchema, custom_config=ConfigParams(min_value=1, max_value=10)):
    col1: Series[float]

    @pa.check("col1")
    def custom_check(cls, col1: Series[float]) -> Series[bool]:
        # Bounds check using the config attached to this schema class.
        return col1.between(cls._custom_config.min_value, cls._custom_config.max_value)


print(MySchema.to_schema())
output:
min_value=1.0 max_value=10.0
<Schema DataFrameSchema(
    columns={
        'col1': <Schema Column(name=col1, type=DataType(float64))>
    },
    checks=[],
    coerce=False,
    dtype=None,
    index=None,
    strict=False
    name=MySchema,
    ordered=False,
    unique_column_names=False
)>
error:
pandera.errors.SchemaError: <Schema Column(name=col1, type=DataType(float64))> failed element-wise validator 0:
<Check custom_check>
failure cases:
   index  failure_case
0      0          -1.0
This would actually be a great addition to the tutorials: would one of you be able to add it somewhere on this page? https://pandera.readthedocs.io/en/stable/schema_models.html
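For anyone unfamiliar with the hook, here is a minimal stdlib-only sketch of what `__init_subclass__` does with a class keyword argument. `Base`, `Narrow`, and `Wide` are hypothetical names, and a plain dict stands in for ConfigParams; nothing in it is pandera-specific. It also shows that the value is stored on each subclass separately rather than shared through the base class.

```python
class Base:
    """Hypothetical stand-in for MyBaseSchema; not a pandera class."""

    def __init_subclass__(cls, custom_config=None, **kwargs):
        # Keyword arguments from the class statement arrive here.
        super().__init_subclass__(**kwargs)
        if custom_config is not None:
            # Stored on the subclass being created, not on Base.
            cls._custom_config = custom_config


class Narrow(Base, custom_config={"min_value": 1, "max_value": 10}):
    pass


class Wide(Base, custom_config={"min_value": 0, "max_value": 1000}):
    pass


print(Narrow._custom_config)           # {'min_value': 1, 'max_value': 10}
print(Wide._custom_config)             # {'min_value': 0, 'max_value': 1000}
print("_custom_config" in vars(Base))  # False
```

Because the assignment targets `cls`, defining `Wide` leaves `Narrow._custom_config` untouched.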
Hi @cosmicBboy !
Thanks for this answer!
thanks @cosmicBboy
@cosmicBboy looking more closely, I realized that it will not always be possible to instantiate ConfigParams when defining MySchema, because the values of the parameters min_value and max_value are only known at run time.
In other words, is there a way to avoid the ConfigParams instantiation in the line below?
class MySchema(MyBaseSchema, custom_config=ConfigParams(min_value=1, max_value=10)):
Thank you for your answers.
hi @alejandro-yousef, seems like we've already discussed a solution for this here: https://github.com/unionai-oss/pandera/discussions/1067 :)
is there something missing in that solution?
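Independently of the linked discussion (whose contents are not reproduced here), one general Python pattern for values that are only known at run time is to build the subclass inside a function, so the class statement runs only once the config exists. The sketch below is stdlib-only: a frozen dataclass stands in for the pydantic ConfigParams, a plain class stands in for pa.SchemaModel, and `make_schema` and `in_range` are hypothetical names. Whether the same pattern behaves well with pandera 0.13.4 is an assumption that has not been verified here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfigParams:
    # Stand-in for the pydantic model; no validation is performed here.
    min_value: float
    max_value: float


class MyBaseSchema:
    # Stand-in for the pa.SchemaModel subclass from the thread.
    _custom_config: ConfigParams

    def __init_subclass__(cls, custom_config: ConfigParams, **kwargs):
        super().__init_subclass__(**kwargs)
        cls._custom_config = custom_config


def make_schema(config: ConfigParams) -> type:
    """Create a schema class once the runtime config is available."""

    class RuntimeSchema(MyBaseSchema, custom_config=config):
        @classmethod
        def in_range(cls, value: float) -> bool:
            # Hypothetical check mirroring custom_check's bounds test.
            cfg = cls._custom_config
            return cfg.min_value <= value <= cfg.max_value

    return RuntimeSchema


# The bounds could come from a file, CLI arguments, etc.
schema = make_schema(ConfigParams(min_value=1.0, max_value=10.0))
print(schema.in_range(5.0))   # True
print(schema.in_range(-1.0))  # False
```

Each call to `make_schema` returns a new class with its own `_custom_config`, so two configurations do not interfere with each other.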
Question about pandera
There are cases in which it is convenient to use a pydantic model to validate a pd.DataFrame. One way to do this is by creating a class inheriting from pa.SchemaModel which includes an excluded attribute for the external object, as follows:
I would like to know if there are better ways to achieve this. It seems to me a rather common use case...
A major disadvantage of this implementation is that calling MySchema.to_schema() results in the following error:
This is important in order to leverage the DataFrameSchema Transformations.
Also, all the classes inheriting from MyBaseSchema share the same attribute _config, which seems risky even if it is overridden for each instance.
Relevant versions
python: 3.9
pandera: 0.13.4