pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

BUG: isna() does not catch np.NaN when datatype is Float64 #60106

Open mortnstak opened 2 hours ago

mortnstak commented 2 hours ago

Reproducible Example

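import numpy as np
import pandas as pd

# Two rows stored with the nullable Float64 dtype; the second row is all zeros.
data = {"x": [1, 0], "y": [1, 0]}
df = pd.DataFrame(data, dtype="Float64")

# The second row computes 0 / 0, which yields NaN.
df["z"] = df["y"] / df["x"]
df["z"].isna()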

Issue Description

The pandas isna() function does not flag np.NaN values in a column that uses the nullable Float64 dtype. The call df['z'].isna() returns the following Series:

0    False
1    False
Name: z, dtype: bool

Both rows come back False, even though the second row holds NaN (the result of 0 / 0). By contrast, df['z'].apply(np.isnan) correctly returns False for the first row and True for the second:

0    False
1     True
Name: z, dtype: boolean
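
As a possible workaround until this is resolved, the check can be done on a plain NumPy array instead of on the Float64 column. This is only a sketch under stated assumptions: the names z_values and nan_mask are made up for illustration, and it assumes a row should be flagged whether the 0 / 0 result is stored as NaN or as pd.NA.

```python
import numpy as np
import pandas as pd

# Same setup as the reproducible example in this report.
df = pd.DataFrame({"x": [1, 0], "y": [1, 0]}, dtype="Float64")
df["z"] = df["y"] / df["x"]  # second row is 0 / 0

# Convert to a plain float64 NumPy array, mapping any pd.NA to np.nan,
# so NaN and pd.NA end up as the same value before the check.
z_values = df["z"].to_numpy(dtype="float64", na_value=np.nan)

# np.isnan works on the raw values; wrap the result back into a Series.
nan_mask = pd.Series(np.isnan(z_values), index=df.index, name="z")
print(nan_mask)
```

Because missing values are mapped to np.nan during the conversion, the mask no longer distinguishes NaN from pd.NA, which may or may not be what a given use case wants.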

Expected Behavior

I would expect the pandas isna() function to also treat np.NaN as a missing value when the dtype is Float64. The call df['z'].isna() should return the following Series:

0    False
1     True
Name: z, dtype: bool

Installed Versions

INSTALLED VERSIONS
------------------
commit                : 0691c5cf90477d3503834d983f69350f250a6ff7
python                : 3.11.9
python-bits           : 64
OS                    : Windows
OS-release            : 10
Version               : 10.0.22631
machine               : AMD64
processor             : Intel64 Family 6 Model 141 Stepping 1, GenuineIntel
byteorder             : little
LC_ALL                : None
LANG                  : None
LOCALE                : Norwegian Bokmål_Norway.1252

pandas                : 2.2.3
numpy                 : 1.26.2
pytz                  : 2023.3.post1
dateutil              : 2.8.2
pip                   : 24.2
Cython                : None
sphinx                : None
IPython               : 8.18.1
adbc-driver-postgresql: None
adbc-driver-sqlite    : None
bs4                   : None
blosc                 : None
bottleneck            : None
dataframe-api-compat  : None
fastparquet           : None
fsspec                : None
html5lib              : None
hypothesis            : None
gcsfs                 : None
jinja2                : None
lxml.etree            : None
matplotlib            : None
numba                 : None
numexpr               : None
odfpy                 : None
openpyxl              : None
pandas_gbq            : None
psycopg2              : None
pymysql               : None
pyarrow               : 14.0.1
pyreadstat            : None
pytest                : None
python-calamine       : None
pyxlsb                : None
s3fs                  : None
scipy                 : None
sqlalchemy            : None
tables                : None
tabulate              : None
xarray                : None
xlrd                  : None
xlsxwriter            : None
zstandard             : None
tzdata                : 2023.3
qtpy                  : None
pyqt5                 : None

mortnstak commented 2 hours ago

Also, this is handled correctly when the dtype is the non-nullable "float64".
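
For comparison, here is a minimal sketch of the same steps with the non-nullable dtype. The names df64 and float64_mask are only for illustration; everything else mirrors the reproducible example.

```python
import pandas as pd

# Same data as the reproducible example, but with NumPy-backed float64.
df64 = pd.DataFrame({"x": [1, 0], "y": [1, 0]}, dtype="float64")
df64["z"] = df64["y"] / df64["x"]  # second row is 0.0 / 0.0, i.e. NaN

# With float64, NaN is the missing-value marker, so isna() flags it.
float64_mask = df64["z"].isna()
print(float64_mask)
```

Here the second row is reported as missing, which is the result I expected from the Float64 column as well.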