multimeric / PandasSchema

A validation library for Pandas data frames using user-friendly schemas
https://multimeric.github.io/PandasSchema/
GNU General Public License v3.0

get_errors allow_empty=True fails with datetime types #43

Open sleipodin opened 3 years ago

sleipodin commented 3 years ago

Similar to https://github.com/TMiguelT/PandasSchema/issues/22...

Calling get_errors() with allow_empty=True on a datetime Series raises the following AttributeError:

Traceback (most recent call last):
  File "xxx", line 12, in <module>
    errors = validator.get_errors(series, Column('', allow_empty=True))
  File "D:\Git\PandasSchema\pandas_schema\validation.py", line 92, in get_errors
    validated = (series.str.len() > 0) & simple_validation
  File "xxx\pandas\core\generic.py", line 5135, in __getattr__
    return object.__getattribute__(self, name)
  File "xxx\pandas\core\accessor.py", line 187, in __get__
    accessor_obj = self._accessor(obj)
  File "xxx\pandas\core\strings.py", line 2100, in __init__
    self._inferred_dtype = self._validate(data)
  File "xxx\pandas\core\strings.py", line 2157, in _validate
    raise AttributeError("Can only use .str accessor with string values!")
AttributeError: Can only use .str accessor with string values!

To reproduce:

import datetime
import pandas as pd
from pandas_schema import Column
from pandas_schema.validation import CustomSeriesValidation
match_val = datetime.datetime(2020, 11, 1)
validator = CustomSeriesValidation(lambda s: s == match_val, 'did not match target date')
series = pd.Series(['2020-11-01'], dtype='datetime64[ns]')
errors = validator.get_errors(series, Column('', allow_empty=True))

This looks like a quick fix at https://github.com/TMiguelT/PandasSchema/blob/master/pandas_schema/validation.py#L89. I have code and tests ready if you are interested.
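
For illustration, here is a rough sketch of the kind of guard that might avoid the error. It is not the library's code and not necessarily the patch mentioned above. The helper name non_empty_mask is hypothetical; the idea is to touch the .str accessor only for string-like data and to fall back to a null check for every other dtype, including datetimes.

```python
import datetime

import pandas as pd
from pandas.api.types import is_object_dtype, is_string_dtype


def non_empty_mask(series: pd.Series) -> pd.Series:
    """Hypothetical helper: True where a value counts as 'not empty'."""
    if is_object_dtype(series) or is_string_dtype(series):
        # String-like data: a value is empty if it is null or has length 0.
        # astype(str) keeps the .str accessor usable for mixed object columns.
        return series.notna() & (series.astype(str).str.len() > 0)
    # Datetime, numeric, categorical, etc.: only nulls (NaN/NaT) are empty.
    return series.notna()


match_val = datetime.datetime(2020, 11, 1)
series = pd.Series(pd.to_datetime(['2020-11-01', None]))

# Rows that are non-empty AND fail the check; no .str access on datetimes.
failing = non_empty_mask(series) & ~(series == match_val)
```

With this approach the NaT row is skipped as empty and the matching date passes, so no row is flagged as failing.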