unionai-oss / pandera

A light-weight, flexible, and expressive statistical data testing library
https://www.union.ai/pandera
MIT License

Schema with MultiIndex's subindex dtype declared as something other than 'object' fails to validate an empty dataframe #937

Closed davidandreoletti closed 1 year ago

davidandreoletti commented 2 years ago

Describe the bug

On an empty dataframe with an empty MultiIndex (i.e. empty subindexes), validating the dataframe against a schema that declares each subindex with a non-'object' dtype silently converts the subindex dtypes to 'object'. Schema validation therefore fails when the schema declares, for example, 'Int64' for each subindex.
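To illustrate the starting point described above, here is a minimal sketch that only inspects the dtypes pandas assigns to the levels of an empty MultiIndex built from plain empty lists. The variable names (`empty_index`, `level_dtypes`) are illustrative and not part of the original report.

```python
import pandas as pd

# An empty MultiIndex built from plain empty lists, as in the report.
empty_index = pd.MultiIndex.from_arrays([[]] * 2)

# With no values to infer a dtype from, each level falls back to 'object'.
level_dtypes = [str(level.dtype) for level in empty_index.levels]
print(level_dtypes)
```

If the levels already report 'object' before pandera is involved, any dtype check against 'Int64' can only pass if coercion is actually applied to each empty level.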


Code Sample, a copy-pastable example

import pandas as pd
import pandera
import pandera.typing  # makes pandera.typing.Index available

class TestSchema(pandera.SchemaModel):
    level0: pandera.typing.Index[pd.Int64Dtype] = pandera.Field(coerce=True)
    level1: pandera.typing.Index[pd.Int64Dtype] = pandera.Field(coerce=True)

# Empty dataframe whose MultiIndex has two empty subindexes
data = pd.DataFrame(index=pd.MultiIndex.from_arrays([[]] * 2))
schema = TestSchema.to_schema()
schema.validate(data, lazy=False, inplace=True)
# Raises SchemaError: expected series '0' to have type Int64, got object

Expected behavior

When the schema declares a specific dtype (e.g. Int64) for the subindexes, an empty dataframe's MultiIndex must be coerced to that dtype and pass schema validation.
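As a rough sketch of what "converted to said dtype" could mean for an empty MultiIndex, the snippet below casts every level with plain pandas calls. The helper `cast_empty_levels` is hypothetical and is not pandera's implementation or the fix from the PR. It also assumes NumPy `int64` rather than the nullable `Int64` from the report, to keep the example independent of how a given pandas version handles nullable dtypes in an index.

```python
import pandas as pd


def cast_empty_levels(index: pd.MultiIndex, dtype: str) -> pd.MultiIndex:
    """Hypothetical helper: return a copy of `index` whose levels all use `dtype`."""
    # set_levels with no `level` argument replaces every level at once.
    return index.set_levels([level.astype(dtype) for level in index.levels])


# Level names mirror the schema fields from the report.
empty_index = pd.MultiIndex.from_arrays([[]] * 2, names=["level0", "level1"])

typed_index = cast_empty_levels(empty_index, "int64")  # assumption: plain int64
typed_dtypes = [str(level.dtype) for level in typed_index.levels]
print(typed_dtypes)
```

Casting the levels rather than the level values keeps the index empty and preserves its names, which is the state a dtype check on each subindex would need to see.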

Additional context

None

davidandreoletti commented 2 years ago

@cosmicBboy PR provided. Let me know when you want to discuss this.