Status: Open. tswast opened this issue 1 year ago.
There are a few things going on here, and I don't think we can do much about them.
First, from a static typing perspective, we can't track the type of `DataFrame.index`. It could be a regular `Index` or a `MultiIndex`. So if you know your `DataFrame` is backed by a `MultiIndex`, you'd have to cast `df.index` to the `MultiIndex` type.
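A minimal sketch of that cast, assuming you already know the frame was built with a `MultiIndex`. `typing.cast` only informs the type checker and performs no runtime conversion, so the `isinstance` assertion is an optional safeguard added here for illustration; the column and level names are made up.

```python
from typing import cast

import pandas as pd

# Build a frame that we know is backed by a MultiIndex.
df = pd.DataFrame(
    {"value": [1, 2, 3]},
    index=pd.MultiIndex.from_tuples(
        [("a", 1), ("a", 2), ("b", 1)], names=["letter", "number"]
    ),
)

# Statically, df.index is just "Index"; cast() tells the type checker
# it is a MultiIndex. cast() does nothing at runtime.
mi = cast(pd.MultiIndex, df.index)
assert isinstance(mi, pd.MultiIndex)  # optional runtime guard

# MultiIndex-specific attributes are now accepted by the type checker.
n_levels = mi.nlevels
first_level = list(mi.levels[0])
```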
Second, whether the underlying index is single-dimensional or a `MultiIndex`, `df.index.names` will return a `list` of strings. It is also possible for it to return a list of `None` if you clobber the names, but if we use static typing to declare that `Index.names` returns a `list[str | None]`, that will force more people to cast the result of `Index.names`.
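To illustrate that trade-off, here is a sketch with three hypothetical helpers (the function names are made up). Against `list[str]`, string methods can be called directly; against `list[str | None]`, a type checker requires each element to be narrowed, or the whole list to be `cast`, before calling `str` methods.

```python
from __future__ import annotations

from typing import cast


def upper_names_strict(names: list[str]) -> list[str]:
    # Accepted as-is by a type checker: every element is a str.
    return [n.upper() for n in names]


def upper_names_optional(names: list[str | None]) -> list[str | None]:
    # A bare n.upper() would be rejected ("None" has no attribute "upper"),
    # so each element is narrowed with an explicit None check.
    return [n.upper() if n is not None else None for n in names]


def upper_names_cast(names: list[str | None]) -> list[str]:
    # Alternative: tell the checker no None is present. cast() is not
    # checked at runtime, so this is only safe if that really holds.
    return [n.upper() for n in cast(list[str], names)]
```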
Finally, you reported a `mypy` error on your own code, but we'd prefer an example that is self-contained and can be run directly through the type checker. `ipython` code can't be used that way.
I'm going to close this, but am willing to reopen it if you can convince me otherwise.
`names` is available on both `Index` and `MultiIndex`. I think that's a moot point with regard to this issue.
Isn't `list[str]` incorrect, though? Unnamed indexes are incredibly common. Here's a standalone example:
```console
(dev-3.10-pip) ➜ pandas-stubs-804 cat sample.py
import pandas as pd

df = pd.DataFrame({"a": ["a", "b", "c"], "i": [10, 11, 12]})
print(df.index.names)

# OK
df.index.names = ["idx"]
print(df.index.names)

# Not OK, but works
df.index.names = ("idx2",)
print(df.index.names)
df.index.names = [None]
print(df.index.names)
df.index.names = (None,)
print(df.index.names)
(dev-3.10-pip) ➜ pandas-stubs-804 mypy sample.py
sample.py:11: error: Incompatible types in assignment (expression has type "tuple[str]", variable has type "list[str]")  [assignment]
sample.py:13: error: List item 0 has incompatible type "None"; expected "str"  [list-item]
sample.py:15: error: Incompatible types in assignment (expression has type "tuple[None]", variable has type "list[str]")  [assignment]
Found 3 errors in 1 file (checked 1 source file)
```
@Dr-Irv, I've added a standalone sample demonstrating the issue.
Output of `sample.py`, showing that the default name is `None`:
```console
(dev-3.10-pip) ➜ pandas-stubs-804 python sample.py
[None]
['idx']
['idx2']
[None]
[None]
```
Thanks @tswast for the example. I will reopen.
The property for `names` should be updated here: https://github.com/pandas-dev/pandas-stubs/blob/9aac8e31ba69eb4c0583e55dd2198755fb031620/pandas-stubs/core/indexes/base.pyi#L291 in two ways:
- The "getter" should return `list[str | None]`
- The "setter" should allow any `SequenceNotStr[str]` (but not `Sequence[str]`)

PR with tests welcome.
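A rough sketch of what the proposed getter/setter split could look like, using a hypothetical `ToyIndex` class rather than the real stub file. The `SequenceNotStr` below is a simplified stand-in for the pandas-stubs protocol of the same name (an assumption: the real protocol defines more methods). It rejects a bare `str` because `str.__contains__` only accepts `str`, while the protocol requires accepting any `object`; lists and tuples still match.

```python
from __future__ import annotations

from collections.abc import Iterator
from typing import Protocol, TypeVar

_T_co = TypeVar("_T_co", covariant=True)


class SequenceNotStr(Protocol[_T_co]):
    """Simplified stand-in for pandas-stubs' SequenceNotStr (assumption)."""

    def __getitem__(self, index: int, /) -> _T_co: ...

    # str.__contains__ only accepts str, so a plain str does not
    # satisfy this protocol, while list and tuple do.
    def __contains__(self, value: object, /) -> bool: ...

    def __len__(self) -> int: ...

    def __iter__(self) -> Iterator[_T_co]: ...


class ToyIndex:
    """Hypothetical class mirroring the proposed `names` annotations."""

    def __init__(self) -> None:
        self._names: list[str | None] = [None]

    @property
    def names(self) -> list[str | None]:
        # Getter: what callers read back.
        return list(self._names)

    @names.setter
    def names(self, values: SequenceNotStr[str]) -> None:
        # Setter: any non-str sequence of str, e.g. a list or a tuple.
        self._names = list(values)


idx = ToyIndex()
idx.names = ("idx2",)  # a tuple is accepted by the setter annotation
```

Note that the setter parameter type differs from the getter return type here; whether a type checker accepts that is exactly the `mypy` limitation raised later in this thread.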
Using `str | None` sounds reasonable. Technically, any type seems to be accepted at runtime:
```pycon
>>> df.index.names = [None]
>>> df.index.names
FrozenList([None])
>>> df.index.names = [1]
>>> df.index.names
FrozenList([1])
```
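The runtime behaviour can be probed a little further. This sketch, assuming a recent pandas 2.x, sets a few different hashable names on a plain `Index` and then tries an unhashable one; the names themselves are arbitrary. In the pandas versions I am aware of, the setter validates that each name is hashable and raises `TypeError` otherwise, so "any type" is really "any hashable object".

```python
import pandas as pd

idx = pd.Index([10, 11, 12])

# Several hashable objects are accepted as names at runtime.
observed = []
for candidate in ([None], ["idx"], [1], [("a", "b")]):
    idx.names = candidate
    observed.append(list(idx.names))

# An unhashable name (here a list) is rejected.
try:
    idx.names = [["not", "hashable"]]
    rejected = False
except TypeError:
    rejected = True
```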
The pandas-internal annotations declare `name` as any hashable object: https://github.com/pandas-dev/pandas/blob/e86ed377639948c64c429059127bcf5b359ab6be/pandas/core/indexes/base.py#L1657C5-L1657C32
I couldn't find an annotation for `names`, but the docstring here says that the elements have to be hashable: https://github.com/pandas-dev/pandas/blob/e86ed377639948c64c429059127bcf5b359ab6be/pandas/core/indexes/base.py#L1753
Actually, I think the getter should also return `SequenceNotStr`, because if you set `names` as a tuple, it will never be able to return a `list[str | None]` (as far as I can tell, nothing converts the names into anything other than what was originally passed in when they are set). Please correct me if I am missing something.
Can you provide an example?
This may be one of the cases where, if we declare the getter to return anything that could possibly be returned at runtime, the most typical usage (returning a list of strings) would force people to `cast` the result. So I lean towards supporting the typical usage, which lets people avoid using `cast` in their pandas code.
My bad: I did not try it at runtime and assumed the type would carry over to the getter, but it is indeed a `FrozenList` that gets returned. Let me write a quick test to make sure, but I believe you are correct!
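A quick runtime check along those lines, assuming pandas 2.x and reusing the frame from the earlier sample: even when the names are assigned as a tuple, the getter hands back a `FrozenList`, which subclasses `list` and refuses in-place mutation.

```python
import pandas as pd

df = pd.DataFrame({"a": ["a", "b", "c"], "i": [10, 11, 12]})

# Assign the names as a tuple rather than a list.
df.index.names = ("idx2",)
names = df.index.names

# The getter does not return the tuple that was assigned; it returns
# a FrozenList, which is a list subclass.
is_tuple = isinstance(names, tuple)
is_list = isinstance(names, list)
container = type(names).__name__

# FrozenList rejects in-place mutation with a TypeError.
try:
    names.append("extra")
    mutable = True
except TypeError:
    mutable = False
```

This is why a `list[...]` return type on the getter matches runtime behaviour regardless of which sequence type was passed to the setter.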
I added a PR to see it in action. There still seems to be an issue with the setter type not being recognized: `mypy` complains that it expects a `list`, while we allow the setter to receive a `SequenceNotStr`.
Open to ideas on this one.
Still waiting on `mypy` support for getters and setters, per @Dr-Irv's comment: https://github.com/pandas-dev/pandas-stubs/pull/1031#issuecomment-2476627879
Describe the bug
The type annotation for `Index.names`/`MultiIndex.names` is incorrect.
To Reproduce
Type checker: `mypy`
Error reported by `mypy`:
bigframes/core/blocks.py:436: error: Incompatible types in assignment (expression has type "tuple[None]", variable has type "list[str]")  [assignment]
Please complete the following information:
- OS: Debian GNU/Linux rodete (Distributor ID: Debian, Release: n/a, Codename: rodete)
- Installed `pandas-stubs`: pandas-stubs==2.0.3.230814