Open clo-vis opened 3 months ago
There are 2 different issues here:
cols1
, when you declare that cols1
is Sequence[str]
, that means that cols1
could be a list of strings OR a single string. We explicitly disallow a single string as the usecols
argument. So you're better off making cols1
a list[str]
.cols2
, we allow a Callable[[HashableT], bool]
and sets are hashable. So that is why your code is accepted.It may be that the solution is to change this line: https://github.com/pandas-dev/pandas-stubs/blob/48ca4b0d70f0695da6d452f7e203f05ad983913a/pandas-stubs/_typing.pyi#L529
to use SequenceNotStr[str | int]
and Callable[[str | int], bool]
, but I'd like to hear the opinion of @twoertwein on this.
>>> print(hash({1.0}))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'set'
sets are not hashable.
typing.py
class Hashable(Protocol, metaclass=ABCMeta):
# TODO: This is special, in that a subclass of a hashable class may not be hashable
# (for example, list vs. object). It's not obvious how to represent this. This class
# is currently mostly useless for static checking.
@abstractmethod
def __hash__(self) -> int: ...
from collections.abc import Sequence
from pandas import read_csv
cols1: Sequence[int] = [ 1 ]
read_csv("file_csv", usecols=cols1)
mypy: _error: No overload variant of "readcsv" matches argument types "str", "Sequence[int]" documentation: If list-like, all elements must either be positional (i.e. integer indices into the document columns) or ...
from collections.abc import Sequence from pandas import read_csv cols1: Sequence[int] = [ 1 ] read_csv("file_csv", usecols=cols1)
mypy: _error: No overload variant of "readcsv" matches argument types "str", "Sequence[int]" documentation: If list-like, all elements must either be positional (i.e. integer indices into the document columns) or ...
That seems to be a mypy
error. pyright
accepts the above code.
Having said that, pyright
accepts the example you had with a Callable
, so we'll have to see what (if anything) we can do about that.
In strict mode, pyright says _Type of "readcsv" is partially unknown. So if pyright doesn't know everything about read_csv, perhaps it cannot issue an error message either.
In strict mode, pyright says _Type of "readcsv" is partially unknown. So if pyright doesn't know everything about read_csv, perhaps it cannot issue an error message either.
We only test and support pyright
in basic mode. Strict is really hard to make work.
If I change the Callable[[HashableT], bool]
to Callable[[Hashable], bool]
, then pyright
in basic mode picks up the error. So a PR with that would solve that problem, at least for pyright
. But mypy
still doesn't flag the error.
Not sure what to do when mypy
has a bug.
If I change the Callable[[HashableT], bool] to Callable[[Hashable], bool], then pyright in basic mode picks up the error. So a PR with that would solve that problem,
It would probably create another problem, as
def cols(x: str) -> bool: return x in ["A", "B", "C"]
read_csv("file_csv", usecols=cols)
should be acceptable (and cols
is not a Callable[[Hashable], bool]
)
should be acceptable (and
cols
is not aCallable[[Hashable], bool]
)
Good point. Seems like we need to change it to: Callable[[int], bool] | Callable[[str], bool]
to use
SequenceNotStr[str | int]
andCallable[[str | int], bool]
, but I'd like to hear the opinion of @twoertwein on this.
I think that could be too strict (but might cover typical usage).
It seems to work as expected on pyright playground when inlining the TypeAlias (I think the issue is that we don't set the generic type of the TypeAlias when using it in the read_csv definition)
It seems to work as expected on pyright playground when inlining the TypeAlias (I think the issue is that we don't set the generic type of the TypeAlias when using it in the read_csv definition)
So something is going on that is different with what is in pandas-stubs
versus the simple test that you wrote.
I think it might be sufficient to replace with UsecolArgs
with UsecolArgs[HashableT]
(pyright in strict mode points this out and says that it therefore assumes "unknown" as the input argument for the Callable part, which explains no error for set
).
I think it might be sufficient to replace with
UsecolArgs
withUsecolArgs[HashableT]
(pyright in strict mode points this out and says that it therefore assumes "unknown" as the input argument for the Callable part, which explains no error forset
).
I tried that out locally and it works. So a PR that changes UseColArgsType
in pandas-stubs\io\parsers\readers.pyi
to UseColArgsType[HashableT]
would fix the second issue reported here.
I have the similar issue with usecols as the type of list of a string.
import pandas as pd
df = pd.read_csv("test.csv", usecols=["a", "b"]) # this line causes type conflict
Pyright gives the following diagnostic message, but according to the pandas docs usecols can be a list of strings. I guess the type annotation of usecols is not accurate.
Diagnostics:
1. No overloads for "read_csv" match the provided arguments [reportCallIssue]
2. Argument of type "list[str]" cannot be assigned to parameter "usecols" of type "UsecolsArgType[Unknown]" in function "read_csv"
Type "list[str]" is incompatible with type "UsecolsArgType[Unknown]"
"list[str]" is incompatible with protocol "SequenceNotStr[Hashable]"
"index" is an incompatible type
Type "(value: str, start: SupportsIndex = 0, stop: SupportsIndex = sys.maxsize, /) -> int" is incompatible with type "(value: Any, /, start: int = 0, stop: int = ...) -> int"
Missing keyword parameter "start"
Missing keyword parameter "stop"
"list[str]" is incompatible with "range"
"list[str]" is incompatible with "ExtensionArray"
... [reportArgumentType]
I have the similar issue with usecols as the type of list of a string.
import pandas as pd df = pd.read_csv("test.csv", usecols=["a", "b"]) # this line causes type conflict
Pyright gives the following diagnostic message, but according to the pandas docs usecols can be a list of strings. I guess the type annotation of usecols is not accurate.
I cannot reproduce this. We explicitly test for lists of strings with usecols
, so maybe you are using strict
pyright, which might be the cause of your issue.
@Dr-Irv Thank you for your reply.
When I looked at the Pyright configuration docs, it is stated that for reportArgumentType
, all three modes (basic, standard, and strict) have the same reporting severity, which is error
. Therefore, I don't think Pyright's type-checking mode is the problem.
On the other hand, when I checked the SequenceNotStr
definition, I noticed that the version I have is slightly different from the latest version. I also found this change done 8 months ago:
- def index(self, value: Any, /, start: int = 0, stop: int = ...) -> int: ...
+ def index(self, value: Any, start: int = ..., stop: int = ..., /) -> int: ...
So, the issue seems to be the version I have. I have still the old version. However, what I don't understand is that I am using the latest version (2.2.2) of Pandas, and I still don't have the change mentioned above. I downloaded the wheel file and inspected it, and it indeed contains the old version of the stub file. This is quite surprising.
I have probed some of the latest pandas packages from both pypi and conda and seen that all of them still contain old version of SequenceNotStr
which causes the problem with latest version of pyright.
I have probed some of the latest pandas packages from both pypi and conda and seen that all of them still contain old version of
SequenceNotStr
which causes the problem with latest version of pyright.
You need to install pandas-stubs
. That will properly type-check your code. It is a separate package.
To Reproduce
Expected behavior: No error for
usecols=cols1
, but an error forusecols=cols2
Actual behavior:
cols1
is not accepted, even thoughSequence[str]
is "list like" (at least I think so; the term is nowhere defined) and its elements are "strings that correspond to column names"cols2
is accepted, even though it is not a callable that can be "evaluated against the column names, returning names where the callable function evaluates to True"Please complete the following information:
pandas-stubs
: 2.2.2.240603