multimeric / PandasSchema

A validation library for Pandas data frames using user-friendly schemas
https://multimeric.github.io/PandasSchema/
GNU General Public License v3.0

get_errors() fails when Series.dtype=category #22

Closed: caddac closed this issue 5 years ago

caddac commented 5 years ago

The check `if np.issubdtype(series.dtype, np.number):` at https://github.com/TMiguelT/PandasSchema/blob/c157e1423188d9bd82d7f268128eda67acd2d4f4/pandas_schema/validation.py#L87-L90 fails when the series has the pandas `category` dtype, raising `TypeError: data type not understood`. I'll open a PR shortly to handle this dtype check better. I'm thinking of something simple like:

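        try:
            # np.issubdtype can raise TypeError for pandas-only dtypes such as category
            if np.issubdtype(series.dtype, np.number):
                validated = ~series.isna() & simple_validation
            else:
                validated = (series.str.len() > 0) & simple_validation
        except TypeError:
            # Fall back to the string-length check when NumPy does not understand the dtype
            validated = (series.str.len() > 0) & simple_validation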
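
Another option that might avoid the try/except altogether, as a minimal sketch rather than the library's actual code: `pandas.api.types.is_numeric_dtype` accepts pandas extension dtypes such as `category` and returns `False` for them instead of raising. The helper name `present_mask` is hypothetical, and the `simple_validation` mask from the snippet above is left out so the example stays self-contained.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def present_mask(series: pd.Series) -> pd.Series:
    """Hypothetical helper: True where a value counts as present."""
    # is_numeric_dtype understands pandas dtypes (e.g. category),
    # so no TypeError handling is needed here.
    if is_numeric_dtype(series.dtype):
        return ~series.isna()
    # Non-numeric data (including category with string categories):
    # treat empty strings as missing, mirroring the original check.
    return series.str.len() > 0


cat = pd.Series(["a", "", "b"], dtype="category")
num = pd.Series([1.0, float("nan"), 3.0])
print(present_mask(cat).tolist())
print(present_mask(num).tolist())
```

One difference to keep in mind: `is_numeric_dtype` also reports boolean columns as numeric, whereas `np.issubdtype(..., np.number)` does not, so this is not an exact drop-in replacement.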