Closed: the-matt-morris closed this issue 1 year ago
go for it @the-matt-morris !
After trying this out locally, I'm thinking it's not going to be worth it:

- `pydantic.validate_arguments` actually needs access to `Column` here, and it can't be imported in `pandera.schemas` without circular import errors.
- an invalid `strict` value yields a less useful error message than the one pandera already provides:

```python
import pandera as pa


class MySchema(pa.SchemaModel):
    class Config:
        strict = "yep"

    str_col: pa.typing.Series[str]
    int_col: pa.typing.Series[int]


dataframe_schema = MySchema.to_schema()
```

```
Traceback (most recent call last):
...
strict
  value could not be parsed to a boolean (type=type_error.bool)
strict
  unexpected value; permitted: 'filter' (type=value_error.const; given=yep; permitted=('filter',))
```

That message is more confusing than the existing `SchemaInitError`:

```
Traceback (most recent call last):
...
pandera.errors.SchemaInitError: strict parameter must equal either `True`, `False`, or `'filter'`.
```
okay, let's close this issue in that case. it was worth a try!
**Describe the solution you'd like**

Use `pydantic.validate_arguments` for the `DataFrameSchema.__init__` function signature. This will validate `strict`, removing the necessity for this statement: https://github.com/unionai-oss/pandera/blob/6b6c9d5eea12b1b3640b4ba69178ae392132fcac/pandera/schemas.py#L190-L198
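For reference, the linked statement amounts to an explicit membership check roughly like the following (paraphrased from the `SchemaInitError` message shown above, not copied from the linked source):

```python
from pandera.errors import SchemaInitError


def _validate_strict(strict) -> None:
    # Hypothetical helper mirroring the kind of hand-rolled check at the linked lines.
    if strict not in (True, False, "filter"):
        raise SchemaInitError(
            "strict parameter must equal either `True`, `False`, or `'filter'`."
        )
```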
It would also add validation for `report_duplicates` to ensure the value is one of `["exclude_first", "exclude_last", "all"]`.
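A rough sketch of how the decorated signature might look is below; the `Literal` annotations, the defaults, and the simplified `columns` parameter are illustrative assumptions, not pandera's actual signature:

```python
from typing import Any, Dict, Literal, Optional, Union

from pydantic import validate_arguments  # pydantic v1 decorator


class DataFrameSchema:
    """Simplified stand-in for pandera.schemas.DataFrameSchema, for illustration only."""

    @validate_arguments
    def __init__(
        self,
        # In the real schema the dict values are Column objects, which is why the
        # decorator would need access to Column (the circular-import problem noted above).
        columns: Optional[Dict[str, Any]] = None,
        strict: Union[bool, Literal["filter"]] = False,
        report_duplicates: Literal["exclude_first", "exclude_last", "all"] = "all",
    ) -> None:
        self.columns = columns or {}
        self.strict = strict
        self.report_duplicates = report_duplicates


DataFrameSchema(strict="filter")  # accepted
DataFrameSchema(strict="yep")     # raises pydantic.ValidationError, as in the traceback above
```

One behavioral difference to keep in mind: pydantic's default bool coercion would also accept strings like `"true"` for `strict`, which the current explicit check rejects.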
Would have to play around and make sure it doesn't do anything unintended to any of the other arguments.
**Additional context**

This is not an earth-shattering proposal, but it does remove the need to manage the validation separately from the data type, which is mostly beneficial should the definition of the data type change in the future.