dash-samuel closed this issue 3 years ago
@dash-samuel Thanks for your feedback.
Actually, you need to use typing.Optional to express that a column is not required. By default all columns are required, as in the regular pandera API. See https://pandera.readthedocs.io/en/stable/dataframe_schemas.html#required
I realize that the documentation of SchemaModel does not mention how to make columns optional. @cosmicBboy I'll submit a fix for it.
from typing import Optional

import pandas as pd
import pandera as pa
from pandera.typing import Index, DataFrame, Series


class InputSchema(pa.SchemaModel):
    a: Series[int]
    b: Optional[Series[int]]

    class Config:
        name = "BaseSchema"


df = pd.DataFrame(
    {
        "a": ["2001", "2002", "2003"],
    }
)

InputSchema.validate(df)
Notes:
- a contains strings, not integers.
- pa.Field() can be omitted if you don't need extra options or checks.
- strict is False by default.
@jeffzi thank you very much for clarifying this. It wasn't immediately clear to me when reading the documentation, and adding it would definitely make things easier for users!
I am posting some example code again for the same use case with the fixes applied, for the case where:
- a and b are part of the schema.
- b is optional.
import pandas as pd
import pandera as pa
from typing import Optional
from pandera.typing import Index, DataFrame, Series, String


class InputSchema(pa.SchemaModel):
    a: Series[String]
    b: Optional[Series[int]]

    class Config:
        name = "BaseSchema"


df = pd.DataFrame({
    "a": ["2001", "2002", "2003"],
})

InputSchema.validate(df)
Describe the bug
Hi everyone, firstly thanks a lot for working on this library, it is indeed very useful!
I have discovered that validating a data frame that lacks a column specified in a SchemaModel, with strict set to False in the Config, still fails with the error: pandera.errors.SchemaError: column 'x' not in dataframe.
Code Sample, a copy-pastable example
Expected behavior
According to the documentation, I would expect that setting strict=False within a SchemaModel would mean that columns not specified in the schema are not checked. Or is this something that is only available in the object-based API with required=False? If so, then apologies in advance.