unionai-oss / pandera

A light-weight, flexible, and expressive statistical data testing library
https://www.union.ai/pandera
MIT License

Index has no name in SchemaModel when using SchemaModel.strategy #1401

Open davidkleiven opened 10 months ago

davidkleiven commented 10 months ago

Describe the bug

The index does not have a name when created via SchemaModel.strategy.

Code Sample, a copy-pastable example

from pandera import SchemaModel, Field
from pandera.typing import Index, Series
from hypothesis import given

class MySchema(SchemaModel):
    unique_id: Index[str] = Field()
    value: Series[int] = Field()

@given(MySchema.strategy(size=5))
def test_name(df):
    assert df.index.name == "unique_id"

test_name()

Expected behavior

Every generated data frame should have an index named "unique_id".

rob-sil commented 9 months ago

To get a named index, you can use check_name=True (docs):

class MySchema(SchemaModel):
    unique_id: Index[int] = Field(check_name=True)
    value: Series[int] = Field()

The default, check_name=None, is interpreted as check_name=False for single-index data frames. With this default, the original MySchema doesn't check the index name during validation, so the generated data frames pass MySchema even though their indices are unnamed.

See https://github.com/unionai-oss/pandera/issues/867 for why the default is check_name=False.