unionai-oss / pandera

A light-weight, flexible, and expressive statistical data testing library
https://www.union.ai/pandera
MIT License

Failure to validate empty list #922

Closed sstadick closed 2 years ago

sstadick commented 2 years ago

Describe the bug

When validating data that is empty, pandera assumes the inner type is float64.

Code Sample, a copy-pastable example

import pandera as pa
from pandera.typing import DataFrame, Series

class Example(pa.SchemaModel):
    counts: Series[int]
    values: Series[int]

def main():
    ex = DataFrame[Example]({"counts": [], "values": []})
    print(ex)

if __name__ == '__main__':
    main()

> poetry run python main.py
...
pandera.errors.SchemaError: expected series 'counts' to have type int64, got float64

See minimal example (small poetry project) here.

Expected behavior

I expected pandera to validate an empty list and not try to check the inner type since there is no inner type.

cosmicBboy commented 2 years ago

This is an unfortunate side effect of pandas defaulting to a float dtype when you initialize a dataframe with empty lists.

In [2]: pd.DataFrame({"a": [], "b": []}).dtypes
Out[2]:
a    float64
b    float64
dtype: object
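To make the inference behavior concrete, here is a minimal sketch contrasting a populated frame with an empty one. It uses only the pandas constructor and `pd.api.types.is_integer_dtype`; the variable names are illustrative, and the exact default dtype for the empty case can differ across pandas versions, so the sketch only checks whether the result is an integer dtype.

```python
import pandas as pd

# Non-empty integer lists give pandas elements to infer a dtype from.
filled = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Empty lists carry no element type, so pandas falls back to a default
# dtype (float64 in the output above; this may vary by pandas version).
empty = pd.DataFrame({"a": [], "b": []})

filled_is_int = pd.api.types.is_integer_dtype(filled["a"])
empty_is_int = pd.api.types.is_integer_dtype(empty["a"])

print(filled.dtypes)
print(empty.dtypes)
print("filled is integer:", filled_is_int)
print("empty is integer:", empty_is_int)
```

Because the fallback dtype is not an integer type, a strict `Series[int]` check fails on the empty frame even though no value in it is wrong.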

If you use the coerce=True config:

class Example(pa.SchemaModel):
    counts: Series[int]
    values: Series[int]

    class Config:
        coerce = True

Then pandera will do the type coercion for you. Otherwise, this behavior is expected.

DataFrame[Example]({"counts": [], "values": []})

It's unclear what else to do in this case, other than explicitly using empty series with dtypes specified. But if you want to use empty lists, coerce=True is the way to go.
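The "empty series with dtypes specified" option could look roughly like the sketch below. It relies only on pandas; `empty_int_frame` is a hypothetical helper name introduced for illustration, and `int64` is assumed because that is the type named in the error message.

```python
import pandas as pd


def empty_int_frame(columns):
    """Build a zero-row DataFrame whose columns are all int64.

    Hypothetical helper: each column is an empty Series with an explicit
    dtype, so pandas has nothing to infer and no fallback dtype is used.
    """
    return pd.DataFrame(
        {name: pd.Series([], dtype="int64") for name in columns}
    )


frame = empty_int_frame(["counts", "values"])
print(frame.dtypes)
```

A frame built this way already has the integer dtype the schema expects, so it should not need coerce=True to get past the dtype check.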

sstadick commented 2 years ago

Got it. coerce=True works for me and solves my immediate problem of getting errors when I sometimes have no data.

Thank you for the speedy reply!