Closed markkvdb closed 3 years ago
Hi @markkvdb,
You can use the regex argument of Field. You'll also have to pass the regex in alias if it is not a valid name for a class attribute (which is the case in your example):
import pandas as pd
import pandera as pa
from pandera.typing import Series

class Schema(pa.SchemaModel):
    # The attribute name "a" is arbitrary; alias holds the regex matched against column names.
    a: Series[int] = pa.Field(alias="^[0-9]+$", regex=True, ge=0)

df = pd.DataFrame({"1": [-1]})
Schema.validate(df)
#> Traceback (most recent call last):
#> /tmp/ipykernel_259488/1760133918.py in <module>
#> ----> 1 Schema.validate(df)
...
#> SchemaError: <Schema Column(name=1, type=DataType(int64))> failed element-wise validator 0:
#> <Check greater_than_or_equal_to: greater_than_or_equal_to(0)>
#> failure cases:
#>    index  failure_case
#> 0      0            -1
If you define a DataFrameSchema instead, Column has a similar regex argument.
Thanks for your clear answer. It seems to work now! I already had it working with DataFrameSchema, but all other schemas were defined using the SchemaModel, so using a single approach is a bit nicer.
Using regex for column names in SchemaModel
Is it possible to write a SchemaModel class in which the column names follow a regex pattern, e.g., ^[0-9]+$ for 0, 1, 2, 3, 4, etc.? If not, can I use the DataFrameSchema class in the same way as the SchemaModel class?