Open jorisvandenbossche opened 1 week ago
I'm not aware of a dedicated issue for this either. I think at one point I made a PR trying to make more of the EA subclasses use is_valid_na_for, but that got tabled pending the nan-vs-na topic.
For the datetimelike cases I think/hope that mismatched NaTs will return False (i.e. `np.timedelta64("NaT") in my_datetimeindex` should always be False). Also `Decimal("NaN")` should be handled correctly.
> For the datetimelike cases i think/hope that mismatched NaTs will return False (i.e. `np.timedelta64("NaT") in my_datetimeindex` should always be False).
Indeed, `np.timedelta64("NaT")` and `np.datetime64("NaT")` only give True for a timedelta and a datetime index, respectively. All other index dtypes return False for those, with one exception: categorical.
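
The matched and mismatched NaT cases can be checked directly. This is a minimal sketch, assuming two small hand-built indexes (`dti`, `tdi` and `results` are illustrative names, not from the thread). The printed values depend on the installed pandas version, so none are asserted here.

```python
import numpy as np
import pandas as pd

# Hypothetical indexes that each contain one missing value.
dti = pd.DatetimeIndex(["2020-01-01", None])
tdi = pd.TimedeltaIndex(["1 day", None])

# The two typed NaT sentinels from NumPy.
nats = {
    "datetime64": np.datetime64("NaT"),
    "timedelta64": np.timedelta64("NaT"),
}

# Record `sentinel in index` for every (index, sentinel) combination.
results = {}
for index_name, index in [("datetime", dti), ("timedelta", tdi)]:
    for nat_name, nat in nats.items():
        results[(index_name, nat_name)] = nat in index

for key, value in results.items():
    print(key, value)
```

The mismatched pairs are `("datetime", "timedelta64")` and `("timedelta", "datetime64")`, which the table below reports as False.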
> Also `Decimal("NaN")` should be handled correctly.
In the sense that it is not matched in general (again, except for categorical). But it also seems not to be matched for an object-dtype index that contains such a decimal: `Decimal("NaN") in pd.Index([Decimal("2.0"), Decimal("NaN")], dtype=object)` gives False.
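
For context, plain Python shows why an equality-based lookup can miss `Decimal("NaN")`. This is a sketch using only the standard library. Whether the object-dtype lookup in pandas follows the same path is an assumption here, not something established in this thread.

```python
from decimal import Decimal

# Two distinct Decimal NaN objects.
a = Decimal("NaN")
b = Decimal("NaN")

equal = a == b          # a quiet NaN never compares equal, even to another NaN
found_other = b in [a]  # list membership tries identity first, then ==
found_same = a in [a]   # same object, so the identity check succeeds
is_nan = b.is_nan()     # an explicit NaN test does not rely on equality

print(equal, found_other, found_same, is_nan)
```

A lookup that wants to treat `Decimal("NaN")` as a missing value would therefore need an explicit check such as `is_nan()` rather than relying on `==`.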
Expanded table:
dtype | None | nan | \<NA> | NaT | np.datetime64('NaT') | np.timedelta64('NaT') | Decimal('NaN') |
---|---|---|---|---|---|---|---|
object-none | True | False | False | False | False | False | False |
object-nan | False | True | False | False | False | False | False |
object-NA | False | False | True | False | False | False | False |
object-decimal-NaN | False | False | False | False | False | False | False |
datetime | True | True | True | True | True | False | False |
period | True | True | True | True | False | False | False |
timedelta | True | True | True | True | False | True | False |
float64 | False | True | False | False | False | False | False |
categorical | True | True | True | True | True | True | True |
interval | True | True | True | False | False | False | False |
nullable_int | False | False | True | False | False | False | False |
nullable_float | False | False | True | False | False | False | False |
string-python | False | False | False | False | False | False | False |
string-pyarrow | False | False | False | False | False | False | False |
str-python | False | False | False | False | False | False | False |
The below table gives an overview of the result value for:
i.e. how `Index.__contains__` handles various missing value sentinels as input for the different data types.
The last three rows, with not a single True, are specifically problematic; this seems to be a bug with the StringDtype.
But more in general, this is quite inconsistent:
The code to generate the table above:
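
A minimal sketch of such a script follows. The exact index contents used for the table are not shown in this thread, so the `indexes` below are hypothetical and cover only a subset of the rows. The `contains` helper is also an illustrative addition that records an exception instead of stopping.

```python
from decimal import Decimal

import numpy as np
import pandas as pd

# Missing-value sentinels to probe; labels follow the table's columns.
sentinels = {
    "None": None,
    "nan": np.nan,
    "<NA>": pd.NA,
    "NaT": pd.NaT,
    "np.datetime64('NaT')": np.datetime64("NaT"),
    "np.timedelta64('NaT')": np.timedelta64("NaT"),
    "Decimal('NaN')": Decimal("NaN"),
}

# Hypothetical indexes, one per dtype, each containing a missing value.
indexes = {
    "object-none": pd.Index(["a", None], dtype=object),
    "object-nan": pd.Index(["a", np.nan], dtype=object),
    "object-NA": pd.Index(["a", pd.NA], dtype=object),
    "object-decimal-NaN": pd.Index(
        [Decimal("2.0"), Decimal("NaN")], dtype=object
    ),
    "datetime": pd.DatetimeIndex(["2020-01-01", None]),
    "timedelta": pd.TimedeltaIndex(["1 day", None]),
    "float64": pd.Index([1.0, np.nan], dtype="float64"),
    "categorical": pd.CategoricalIndex(["a", None]),
    "nullable_int": pd.Index(pd.array([1, None], dtype="Int64")),
}


def contains(index, value):
    """Return `value in index`, or a short marker if the check raises."""
    try:
        return value in index
    except Exception as exc:
        return f"raises {type(exc).__name__}"


# One row per index, one column per sentinel.
table = pd.DataFrame(
    {
        label: [contains(index, value) for index in indexes.values()]
        for label, value in sentinels.items()
    },
    index=list(indexes),
)
print(table.to_string())
```

Results can differ between pandas versions, so the output of this sketch may not match the table cell for cell.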
cc @jbrockmendel I would have expected we had issues about this, but didn't directly find anything