unionai-oss / pandera

A light-weight, flexible, and expressive statistical data testing library
https://www.union.ai/pandera
MIT License

Set Name of Index #632

Closed: markkvdb closed this issue 2 years ago

markkvdb commented 2 years ago

Question about pandera

Defining a schema as:

import pandas as pd
import pandera
from pandera.typing import DataFrame, Index, Series

class ForecastSchema(pandera.SchemaModel):
    """Schema for a forecasting file."""

    time: Index[pd.DatetimeTZDtype] = pandera.Field(
        coerce=True, dtype_kwargs={"unit": "ns", "tz": "UTC"}
    )
    reference: Series[pd.StringDtype] = pandera.Field()
    value: Series[pandera.Float64] = pandera.Field(coerce=True)

does not set the name of the index field as can be seen from

print(ForecastSchema.to_schema())

which outputs

<Schema DataFrameSchema(
    columns={
        'reference': <Schema Column(name=reference, type=DataType(string[python]))>
        'value': <Schema Column(name=value, type=DataType(float64))>
    },
    checks=[],
    coerce=False,
    dtype=None,
    index=<Schema Index(name=None, type=DataType(datetime64[ns, UTC]))>,
    strict=False
    name=None,
    ordered=False
)>

This sets the name of the index to None. How can I change this so that name=time?

cosmicBboy commented 2 years ago

Hi @markkvdb, you can try the Field(..., check_name=True) kwarg. For single-index SchemaModels, check_name defaults to False, so the index name is left as None; for multi-index SchemaModels it defaults to True.

markkvdb commented 2 years ago

thanks @cosmicBboy. Works as expected.