pandas-dev / pandas-stubs

Public type stubs for pandas
BSD 3-Clause "New" or "Revised" License

Implement ExtensionArray _accumulate and _reduce #850

Closed MichaelTiemannOSC closed 6 months ago

MichaelTiemannOSC commented 10 months ago

Describe the bug

The stubs for ExtensionArray (in pandas-stubs/core/arrays/base.pyi) do not provide type signatures for _accumulate and _reduce. These need to be defined in order to add proper typing information to the Pint-Pandas project.

To Reproduce

1.  Minimal runnable example:

        import numpy as np
        import pandas as pd
        from typing import reveal_type
        from pandas.arrays import IntegerArray
        from pandas.api.extensions import ExtensionArray

        _data: ExtensionArray = IntegerArray(values=np.array([1, 2, 3], dtype=int), mask=np.array([True, True, True], dtype=bool))
        if isinstance(_data, ExtensionArray):
            reveal_type(_data)
            reveal_type(_data._accumulate)
            reveal_type(_data._reduce)

2.  Using `mypy`
3.  Show the error message received from that type checker while checking your example.

        (pint-dev) % pre-commit run mypy --files foo.py
        mypy.....................................................................Failed

        foo.py:9: note: Revealed type is "pandas.core.arrays.base.ExtensionArray"
        foo.py:10: error: "ExtensionArray" has no attribute "_accumulate" [attr-defined]
        foo.py:10: note: Revealed type is "Any"
        foo.py:11: error: "ExtensionArray" has no attribute "_reduce" [attr-defined]
        foo.py:11: note: Revealed type is "Any"
        Found 2 errors in 1 file (checked 1 source file)


Note that running the script in Python works, because it uses the actual pandas code, not pandas-stubs:

    (pint-dev) % python foo.py
    Runtime type is 'IntegerArray'
    Runtime type is 'method'
    Runtime type is 'method'



**Please complete the following information:**
 - OS: Mac OS
 - OS Version: 14.1.2
 - Python: 3.11.4
 - mypy: 1.8.0
 - version of installed `pandas-stubs`: 2.1.4.231227

twoertwein commented 10 months ago

While they look very much private, they are documented: https://pandas.pydata.org/docs/reference/api/pandas.api.extensions.ExtensionArray._accumulate.html https://pandas.pydata.org/docs/reference/api/pandas.api.extensions.ExtensionArray._reduce.html and could therefore probably be added to pandas-stubs? @Dr-Irv

Dr-Irv commented 10 months ago

> While they look very much private, they are documented: https://pandas.pydata.org/docs/reference/api/pandas.api.extensions.ExtensionArray._accumulate.html https://pandas.pydata.org/docs/reference/api/pandas.api.extensions.ExtensionArray._reduce.html and could therefore probably be added to pandas-stubs? @Dr-Irv

Agreed. PR with tests welcome

MichaelTiemannOSC commented 10 months ago

I'm glad to see it's a simple case, but alas, it's just beyond my level of Python and mypy type algebra.

mutricyl commented 6 months ago

I cannot see any ExtensionArray-specific tests. @Dr-Irv, can you advise on where they should be located?

Dr-Irv commented 6 months ago

> I cannot see any ExtensionArray-specific tests. @Dr-Irv, can you advise on where they should be located?

I would add something to test_extension.py, but you can just add a test that asserts the types of _reduce() and _accumulate() to be Callable with appropriate arguments and return types.

mutricyl commented 6 months ago

I have added the following to core/arrays/base.pyi:

    def _reduce(self, name: str, *, skipna: bool = ..., keepdims: bool = ..., **kwargs) -> Scalar: ...
    def _accumulate(self, name: str, *, skipna: bool = ..., **kwargs) -> Self: ...
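
To make the calling convention of these two declarations concrete, here is a minimal pure-Python sketch. `ToyIntArray` is a hypothetical stand-in, not a pandas class, and the supported operation names (`"sum"`, `"cumsum"`) are assumptions chosen for illustration. It only mirrors the shape of the signatures above: `name` is positional, everything after `*` is keyword-only, and `_accumulate` returns an array of the same type.

```python
from __future__ import annotations

from itertools import accumulate
from typing import Any


class ToyIntArray:
    """Hypothetical stand-in for an ExtensionArray subclass (not pandas code)."""

    def __init__(self, values: list[int]) -> None:
        self.values = values

    def _reduce(
        self, name: str, *, skipna: bool = True, keepdims: bool = False, **kwargs: Any
    ) -> int:
        # Only "sum" is handled; skipna and keepdims are accepted but ignored here.
        if name != "sum":
            raise TypeError(f"unsupported reduction: {name!r}")
        return sum(self.values)

    def _accumulate(
        self, name: str, *, skipna: bool = True, **kwargs: Any
    ) -> ToyIntArray:
        # Only "cumsum" is handled; the result has the same type as self.
        if name != "cumsum":
            raise NotImplementedError(f"unsupported accumulation: {name!r}")
        return ToyIntArray(list(accumulate(self.values)))


arr = ToyIntArray([1, 2, 3])
print(arr._reduce("sum", skipna=True))   # 6
print(arr._accumulate("cumsum").values)  # [1, 3, 6]
```

Because of the bare `*`, a call such as `arr._reduce("sum", True)` raises `TypeError`; `skipna` has to be passed by keyword.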

But now I am facing issues with tests:

Not sure about the "good first issue" tag 😃

Dr-Irv commented 6 months ago

I recently dealt with another case involving Callable with odd arguments, and based on what I learned, I think it will be hard to do the assert_type().

I'm fine if we don't include a test for this, and just add the declarations for the 2 functions.
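
The thread does not say which difficulty was meant. One well-known limitation, offered here only as a possible illustration, is that `typing.Callable[...]` cannot spell keyword-only parameters or `**kwargs`; a callback protocol with `__call__` can. In the sketch below, `ReduceLike` and `toy_reduce` are hypothetical names that are not part of pandas or pandas-stubs.

```python
import inspect
from typing import Any, Protocol


class ReduceLike(Protocol):
    """Hypothetical callback protocol describing the shape of a bound _reduce."""

    def __call__(
        self, name: str, *, skipna: bool = ..., keepdims: bool = ..., **kwargs: Any
    ) -> object: ...


def toy_reduce(
    name: str, *, skipna: bool = True, keepdims: bool = False, **kwargs: Any
) -> object:
    """Hypothetical function with the same parameter shape as _reduce."""
    return None


# A type checker compares toy_reduce against ReduceLike.__call__ structurally.
reducer: ReduceLike = toy_reduce

# At runtime, inspect shows the parameter kinds that Callable[...] cannot express.
kinds = {p.name: p.kind.name for p in inspect.signature(toy_reduce).parameters.values()}
print(kinds)
```

The printed mapping marks `skipna` and `keepdims` as `KEYWORD_ONLY` and `kwargs` as `VAR_KEYWORD`, which is the part a plain `Callable[[str], object]` would lose.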

As for the _reduce() issue with pyright, for extension arrays, the _reduce() operation could return an object of the dtype of the extension array, which could be anything, so use this instead:

    def _reduce(self, name: str, *, skipna: bool = ..., keepdims: bool = ..., **kwargs) -> object: ...

You may have to change tests/extension/decimal/array.py to return decimal.Decimal for _reduce() in there.
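
As a rough sketch of why `object` on the base class leaves room for this, the example below uses two hypothetical classes, `ToyBaseArray` and `ToyDecimalArray`. They are not the classes in pandas-stubs or its test suite, and only a `"sum"` reduction is assumed. The base declares `-> object`, and the subclass narrows the return type to `decimal.Decimal`, which type checkers accept because return types may be narrowed in overrides.

```python
from __future__ import annotations

from decimal import Decimal
from typing import Any


class ToyBaseArray:
    """Hypothetical base class mirroring the `-> object` declaration."""

    def _reduce(
        self, name: str, *, skipna: bool = True, keepdims: bool = False, **kwargs: Any
    ) -> object:
        raise NotImplementedError


class ToyDecimalArray(ToyBaseArray):
    """Hypothetical decimal-backed array (not the pandas-stubs test class)."""

    def __init__(self, values: list[Decimal]) -> None:
        self.values = values

    # Narrowing the return type from object to Decimal is a compatible override.
    def _reduce(
        self, name: str, *, skipna: bool = True, keepdims: bool = False, **kwargs: Any
    ) -> Decimal:
        if name != "sum":
            raise TypeError(f"unsupported reduction: {name!r}")
        # Start from Decimal(0) so the result is always a Decimal.
        return sum(self.values, Decimal(0))


total = ToyDecimalArray([Decimal("1.10"), Decimal("2.20")])._reduce("sum")
print(total)  # 3.30
```

Code that only knows the base type gets `object` back and has to narrow it, for example with `isinstance(result, Decimal)`, before using it.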

Agree this is not a good first issue any more, but I think you can do it!