unionai-oss / pandera

A light-weight, flexible, and expressive statistical data testing library
https://www.union.ai/pandera
MIT License
3.22k stars, 300 forks

Pyright error for custom data type: "Expected type expression but received "_DataTypeClass[Unknown]..." #1736

Open alex-wenzel opened 1 month ago

alex-wenzel commented 1 month ago

Describe the bug

I am trying to construct a custom data type along the lines of this Boolean example. My goal is to take a raw data column containing strings in the format "HH:MM:SS" and represent them as integers by overriding the coerce() method.

import pandera as pa
import pandas as pd
from pandera import dtypes
from pandera.engines import pandas_engine
from pandera.typing import DataFrame, Series

@pandas_engine.Engine.register_dtype(
    equivalents=["int", pd.Int64Dtype, pd.Int64Dtype()]
)
@dtypes.immutable
class Clocktime(pandas_engine.INT64):
    def coerce(
        self,
        series: pd.Series
    ) -> pd.Series:
        raise NotImplementedError

I would expect this code to pass all type checks, but I get the following error from Pyright, which highlights pandas_engine.INT64 in the class definition:

Expected type expression but received "_DataTypeClass[Unknown] | ((_DataTypeClass[Unknown]) -> _DataTypeClass[Unknown])"
  "(_DataTypeClass[Unknown]) -> _DataTypeClass[Unknown]" is not a class

I'm not sure whether this is a Pyright bug or a Pandera bug. I'm happy to submit it to Pyright instead if you think it belongs there.
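
One possible reading of the message, offered as an assumption rather than a confirmed diagnosis: a union of "class" and "function from class to class" is the shape a type checker sees when a class has been passed through a decorator that works both bare and with arguments, and whose return annotation is a single Union. The sketch below uses hypothetical names (tag_union, tag, label), not pandera's code, to show that pattern next to a typing.overload version that lets a checker pick the precise return type.

```python
from typing import Any, Callable, Optional, Type, TypeVar, Union, overload

T = TypeVar("T")


# Hypothetical decorator (not pandera's): one signature, Union return type.
def tag_union(
    cls: Optional[Type[T]] = None, *, label: str = "default"
) -> Union[Type[T], Callable[[Type[T]], Type[T]]]:
    def wrap(c: Type[T]) -> Type[T]:
        setattr(c, "label", label)
        return c

    return wrap if cls is None else wrap(cls)


# Same runtime behavior, but the overloads state which form each call returns.
@overload
def tag(cls: Type[T]) -> Type[T]: ...
@overload
def tag(cls: None = None, *, label: str = ...) -> Callable[[Type[T]], Type[T]]: ...
def tag(cls: Any = None, *, label: str = "default") -> Any:
    def wrap(c: Any) -> Any:
        setattr(c, "label", label)
        return c

    return wrap if cls is None else wrap(cls)


@tag_union
class LooseBase:  # declared type: the class OR a decorator (a union)
    pass


@tag(label="strict")
class StrictBase:  # declared type: the class itself
    pass


class Child(StrictBase):  # subclassing a precisely typed base
    pass
```

At runtime both decorators behave identically; the difference is only in what a static checker can infer. Whether this is what happens inside pandera is not established in this thread.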

Relevant versions

alex-wenzel commented 1 month ago

As an update, using pd.Int64Dtype rather than pandas_engine.INT64 (see below) passes the type checker, but I haven't run anything with it, so I don't know whether it works.

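from typing import cast  # other imports as in the first snippet

# clocktime_to_int is my own helper; its definition is not shown here.
@pandas_engine.Engine.register_dtype(equivalents=["int", pd.Int64Dtype, pd.Int64Dtype()])
@dtypes.immutable
class Clocktime(pd.Int64Dtype):  # No Pyright error here
    def coerce(
        self,
        series: Series[str]
    ) -> Series[int]:
        # Map each "HH:MM:SS" string to its integer representation.
        return cast(Series[int], series.map(clocktime_to_int))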
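
The snippet above calls clocktime_to_int, which is not defined anywhere in the thread. As a minimal sketch of what such a helper might look like, the version below assumes the integer representation is seconds since midnight; that encoding, the validation rules, and the function body are all assumptions, not the poster's code.

```python
def clocktime_to_int(clocktime: str) -> int:
    """Convert an "HH:MM:SS" string to seconds since midnight.

    Hypothetical helper: the thread does not say which integer
    encoding is intended, so seconds since midnight is assumed.
    """
    parts = clocktime.strip().split(":")
    if len(parts) != 3:
        raise ValueError(f"expected HH:MM:SS, got {clocktime!r}")

    # int() raises ValueError on non-numeric fields.
    hours, minutes, seconds = (int(part) for part in parts)
    if hours < 0 or not 0 <= minutes < 60 or not 0 <= seconds < 60:
        raise ValueError(f"field out of range in {clocktime!r}")

    return hours * 3600 + minutes * 60 + seconds
```

For example, clocktime_to_int("01:02:03") gives 3723 under this encoding. A plain function like this can be passed directly to series.map without wrapping it in a lambda.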