When calling validate for a SchemaModel on a pandas DataFrame with only 1 row, an AssertionError is raised. This appears to be triggered when coerce=True is included in the field definition, but it seems like a simple fix, as detailed below.
[x] I have checked that this issue has not already been reported.
[x] I have confirmed this bug exists on the latest version of pandera.
[x] (optional) I have confirmed this bug exists on the master branch of pandera.
Code Sample, a copy-pastable example
from pandera.typing import Series
import pandas as pd
import pandera as pa
class Model(pa.SchemaModel):
    x: Series[int] = pa.Field(coerce=True)
Model.validate(pd.DataFrame({"x": [None]}))
# Results in the following error
# ...
# File "/cluster/home/willems/MethodsDev/vcgt/garbnewenv/pandera-fix/lib/python3.9/site-packages/pandera/engines/numpy_engine.py", line 57, in coerce
# failure_cases=utils.numpy_pandas_coerce_failure_cases(
# File "/cluster/home/willems/MethodsDev/vcgt/garbnewenv/pandera-fix/lib/python3.9/site-packages/pandera/engines/utils.py", line 88, in numpy_pandas_coerce_failure_cases
# check_output = numpy_pandas_coercible(data_container, type_)
# File "/cluster/home/willems/MethodsDev/vcgt/garbnewenv/pandera-fix/lib/python3.9/site-packages/pandera/engines/utils.py", line 33, in numpy_pandas_coercible
# search_list = _bisect(series)
# File "/cluster/home/willems/MethodsDev/vcgt/garbnewenv/pandera-fix/lib/python3.9/site-packages/pandera/engines/utils.py", line 20, in _bisect
# assert (
# AssertionError: cannot bisect a pandas Series of length < 2
Expected behavior
The above validation should fail with a SchemaError instead of an AssertionError
Desktop (please complete the following information):
python 3.9.1
pandas 1.3.1
pandera 0.7.1
Additional context
Seems like the numpy_pandas_coercible() fn is missing a base case for a series that contains only 1 element on line 33:
# Current call, which triggers the AssertionError because _bisect immediately checks that series.size >= 2
search_list = _bisect(series)
# Proposed fix
search_list = [series] if series.size == 1 else _bisect(series)
# I'm not sure whether a check before line 33 is also needed, in case this function is (mis)called on an empty series
if series.empty:
    return pd.Series(dtype="bool")
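To make the two base cases concrete, here is a minimal, library-free sketch of a bisecting search for non-coercible values. It is not pandera's implementation: `coercible_mask`, `_coercible`, and the `(position, value)` pairs are hypothetical stand-ins for the Series-based code, and the per-value conversion in `_coercible` stands in for what is presumably a single `astype` call on each chunk.

```python
from typing import Callable, List, Sequence, Tuple

Item = Tuple[int, object]  # (position, value) pair standing in for a Series row


def _bisect(items: List[Item]) -> List[List[Item]]:
    """Split a list of at least two items into two halves."""
    assert len(items) >= 2, "cannot bisect a sequence of length < 2"
    mid = len(items) // 2
    return [items[:mid], items[mid:]]


def _coercible(items: List[Item], type_: Callable) -> bool:
    """Return True if every value in the chunk can be converted by type_."""
    try:
        for _, value in items:
            type_(value)
        return True
    except (TypeError, ValueError):
        return False


def coercible_mask(values: Sequence[object], type_: Callable) -> List[bool]:
    """Return one bool per value: True if coercible, False otherwise."""
    items = list(enumerate(values))
    if not items:
        return []  # base case: nothing to search in an empty input
    # Base case: a single element cannot be bisected, so search it directly.
    search_list = [items] if len(items) == 1 else _bisect(items)
    failures = set()
    while search_list:
        candidates = []
        for chunk in search_list:
            if _coercible(chunk, type_):
                continue  # whole chunk converts cleanly, nothing to record
            if len(chunk) == 1:
                failures.add(chunk[0][0])  # isolated a failing position
            else:
                candidates.append(chunk)  # narrow down by bisecting again
        search_list = [half for chunk in candidates for half in _bisect(chunk)]
    return [pos not in failures for pos, _ in items]
```

With both guards in place, a one-element input such as `[None]` is reported as a failure case instead of tripping the assertion in `_bisect`, which is what would let the caller raise a SchemaError.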