Open steve-mavens opened 1 month ago
Python's type system does not provide a way to discriminate between a regular float and a NaN float. I would argue that the current implementation does the most intuitive thing in most cases at the expense of a relatively rare use case. Anecdotally, I have found None
, np.nan
, and pd.NA
to be more common.
We could update to include the Literal[float('NaN')]
in the overloads, but (1) I'm not sure an instantiation like that is actually supported and (2) it may not even solve the issue, if type checkers infer it as a regular float
.
Yes, I think the basic problem here is that TypeIs
requires a clean division of types. It doesn't allow for a function that takes input type A | B | C
and returns True
for all A
and some values of C
, and False
for all B
and the remaining values of C
.
Thanks for the report. I'm not sure what we can do here, because we can't specify float("nan")
or np.nan
as a Literal
.
One workaround for users would be math.isnan(x) if isinstance(x, float) else pd.isna(x)
, but obviously there's some performance implications there. It's probably better to suppress the unreachable warning than take the hit.
We could update to include the
Literal[float('NaN')]
in the overloads, but (1) I'm not sure an instantiation like that is actually supported and (2) it may not even solve the issue, if type checkers infer it as a regularfloat
.
I thought that might work, but the typing system doesn't support it. Found this discussion: https://github.com/python/typing/issues/1160
Yes, I think the basic problem here is that TypeIs requires a clean division of types. It doesn't allow for a function that takes input type A | B | C and returns True for all A and some values of C, and False for all B and the remaining values of C.
Well, it might work if we could do Literal[float("nan")]
, since that would "divide" the types, although I'm not sure if the type checkers are handling that.
Also, there are allowed to be float NaN values distinct from float('nan')
, so it could still go wrong in even rarer cases.
Is there a way to import pandas.api.typing.NAType
(or pandas.libs._missing.NAType
) that the stubs approve of? The reason I found this quirk of the type hint is actually because of a bug in my real code. I have a variable taken from an unpacked row of a DataFrame, and I typed it as float
, when it could also be pd.NA
.
So, another workaround is to make sure variables/expressions that might contain a float NaN are typed as float | NAType
, or float | None
, so that mypy doesn't think the return value of pd.isna
is guaranteed False by the input type.
Is there a way to import
pandas.api.typing.NAType
(orpandas.libs._missing.NAType
) that the stubs approve of?
Right now, if you do from pandas._libs.missing import NAType
, that should work.
We're behind on updating the stubs for all 2.x support, which means that pandas.api.typing
doesn't exist yet in the stubs.
Great, thanks for your help. I'll probably use None
until NAType
is public in the stubs, since either way I just need to tell mypy that pd.isna
might return True.
So, another workaround is to make sure variables/expressions that might contain a float NaN are typed as float | NAType, or float | None, so that mypy doesn't think the return value of pd.isna is guaranteed False by the input type.
I ran into the same issue today, and resolved it by typing my variable like value: float | None
Describe the bug
Recently https://github.com/pandas-dev/pandas-stubs/pull/972 introduced a
TypeIs
annotation forpd.isna
. The annotation states that the function returns True if and only if the input value is one of the typesNaTType | NAType | None
. However, this is not correct, since the function also returns True for the valuefloat('nan')
, which is of typefloat
.To Reproduce
pandas
example that is not properly checked by the stubs.The output is
not a number
, but because of theTypeIs
annotation, mypy thinks that line is unreachablemypy
orpyright
).mypy 1.10.1
type_test.py:6: error: Right operand of "and" is never evaluated [unreachable] if isna(foo) and 1 == 1: ^
~type_test.py:7: error: Statement is unreachable [unreachable] print('not a number') ^~~~~Found 2 errors in 1 file (checked 1 source file)Please complete the following information:
pandas-stubs
: 2.2.2.240909Additional context Add any other context about the problem here.