unionai-oss / pandera

A light-weight, flexible, and expressive statistical data testing library
https://www.union.ai/pandera
MIT License

SchemaModel with nullable Config property fails to validate correctly #427

Closed jstammers closed 3 years ago

jstammers commented 3 years ago

Describe the bug

I'm trying to validate a data frame that contains null values in an integer column. However, I get a SchemaError when validating the data, even after specifying nullable = True in the Config class.

Code Sample, a copy-pastable example

import numpy as np
import pandas as pd
import pandera as pa

from pandera import Check, Column, DataFrameSchema

df = pd.DataFrame({"column1": [5, 1, np.nan]})

null_schema = DataFrameSchema({
    "column1": Column(pa.Int, Check(lambda x: x > 0), nullable=True)
})

print(null_schema.validate(df))  # succeeds

class InputDataModel(pa.SchemaModel):
    column1: pa.typing.Series[pa.typing.Int]

    class Config:
        nullable = True

InputDataModel.validate(df)  # raises SchemaError

Expected behavior

InputDataModel should correctly validate the given data frame

cosmicBboy commented 3 years ago

Hi @jstammers, currently there isn't a nullable option in the schema model Config (its options map onto the DataFrameSchema keyword arguments, and nullable is a per-column setting), but specifying the constraint on Field should work:

class InputDataModel(pa.SchemaModel):
    column1: pa.typing.Series[pa.typing.Int] = pa.Field(nullable=True)

jstammers commented 3 years ago

My mistake! I was following the docs and assumed that, because coerce could be specified in Config, other properties there would also map onto keyword arguments for Field.

cosmicBboy commented 3 years ago

no probs! closing this issue