unionai-oss / pandera

A light-weight, flexible, and expressive statistical data testing library
https://www.union.ai/pandera
MIT License

SchemaModel's index doesn't store the name #697

Closed: roshcagra closed 2 years ago

roshcagra commented 2 years ago

Describe the bug

When creating a SchemaModel, if you specify an Index and try to retrieve its name, the name is set to None instead of the field name.

Code Sample, a copy-pastable example

import pandera as pa
from pandera.typing import Index

class MySchemaModel(pa.SchemaModel):
    my_index: Index[pa.String] = pa.Field(coerce=True)

MySchemaModel.to_schema().index
#> <Schema Index(name=None, type=DataType(str))>

Expected behavior

I would expect the Index to set its name as the field name from the SchemaModel:

MySchemaModel.to_schema().index
#> <Schema Index(name='my_index', type=DataType(str))>

jeffzi commented 2 years ago

You need to use check_name:

import pandera as pa
from pandera.typing import Index

class MySchemaModel(pa.SchemaModel):
    my_index: Index[pa.String] = pa.Field(coerce=True, check_name=True)

print(MySchemaModel.to_schema().index)
#> <Schema Index(name=my_index, type=DataType(str))>

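The reason is that naming a pandera.Index automatically activates name validation. In a SchemaModel, you have to name the class attribute to satisfy Python's syntax, so the attribute name alone cannot tell pandera whether you want the index name checked. check_name exists so that users are not forced to name the index of the DataFrames they want to validate: by default the index name is left unchecked, and check_name=True opts in.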