pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

BUG: FutureWarning for Boolean sparse dtypes in pd.DataFrame.sparse.from_spmatrix() #59739

Open sammlapp opened 2 months ago

sammlapp commented 2 months ago

Pandas version checks

Reproducible Example

import scipy.sparse  # import the submodule explicitly so scipy.sparse is available
import pandas as pd

coo = scipy.sparse.coo_matrix([[False,True],[True,False]])
pd.DataFrame.sparse.from_spmatrix(coo)  # boolean matrix: emits FutureWarning

coo = scipy.sparse.coo_matrix([[0,1],[1,0]])
pd.DataFrame.sparse.from_spmatrix(coo)  # integer matrix: no warning
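The reproduction can also be made self-checking by recording warnings instead of reading them off the console. This is a minimal sketch; the helper name `future_warnings_from_spmatrix` is hypothetical and not part of pandas. It only relies on the standard `warnings` module and the `from_spmatrix` call from the example above.

```python
import warnings

import pandas as pd
import scipy.sparse


def future_warnings_from_spmatrix(dense_rows):
    """Hypothetical helper: build a COO matrix from `dense_rows`, pass it to
    from_spmatrix, and return the messages of any FutureWarnings emitted."""
    coo = scipy.sparse.coo_matrix(dense_rows)
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeats
        pd.DataFrame.sparse.from_spmatrix(coo)
    return [str(w.message) for w in caught if issubclass(w.category, FutureWarning)]


# Per this report, the boolean case is non-empty on pandas 2.2.2.
print(future_warnings_from_spmatrix([[False, True], [True, False]]))
# The integer case records no FutureWarning.
print(future_warnings_from_spmatrix([[0, 1], [1, 0]]))
```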

Issue Description

Calling from_spmatrix() with a boolean scipy.sparse matrix emits a FutureWarning about an arbitrary scalar fill_value:

FutureWarning: Allowing arbitrary scalar fill_value in SparseDtype is deprecated. In a future version, the fill_value must be a valid value for the SparseDtype.subtype.
  pd.DataFrame.sparse.from_spmatrix(coo)

Passing an integer scipy.sparse matrix does not trigger the warning. I don't understand why this occurs, but from_spmatrix should be able to handle both cases. Also, from_spmatrix has no argument for specifying a dtype, so it is unclear what, if anything, the user should do about this FutureWarning.
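A likely cause, which is an assumption not confirmed in this thread, is that from_spmatrix builds its SparseDtype with an explicit integer fill_value of 0 regardless of the matrix dtype. That value is the default for integer subtypes, but the default for bool is False, and 0 is not a valid bool fill_value. The sketch below compares the default fill values and then constructs the suspected dtype directly.

```python
import warnings

import pandas as pd

# Default fill values pandas chooses when no fill_value is passed.
print(pd.SparseDtype("int64").fill_value)  # 0
print(pd.SparseDtype(bool).fill_value)     # False

# Hypothesis: from_spmatrix effectively does SparseDtype(<matrix dtype>, 0).
# For a bool matrix that is the equivalent of:
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    try:
        pd.SparseDtype(bool, 0)
    except (TypeError, ValueError) as exc:
        # Assumption: later pandas versions may reject this instead of warning.
        print("rejected:", exc)

# On pandas 2.2.x this is expected to record the FutureWarning quoted above.
print([str(w.message) for w in caught if issubclass(w.category, FutureWarning)])
```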

Expected Behavior

No warning; the resulting sparse DataFrame uses a dtype matching the input matrix.
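Until a fix is released, one possible user-side workaround (an assumption, not suggested in the thread) is to pass an integer copy of the matrix, for which a fill_value of 0 is valid, and then cast the sparse columns back to bool with an explicit, valid fill_value of False. This produces the boolean sparse dtype the report expects without going through the deprecated path.

```python
import pandas as pd
import scipy.sparse

coo = scipy.sparse.coo_matrix([[False, True], [True, False]])

# Step 1 (workaround, assumption): build from an int64 copy, which yields
# Sparse[int64, 0] columns without the FutureWarning.
df = pd.DataFrame.sparse.from_spmatrix(coo.astype("int64"))

# Step 2: cast back to a boolean sparse dtype whose fill_value is a real bool.
df = df.astype(pd.SparseDtype(bool, False))

print(df.dtypes)
print(df.sparse.to_dense())
```

The integer round trip costs one copy of the stored values; for a matrix whose nonzero entries are all True this is lossless, since nonzero integers cast back to True and the implicit zeros become the False fill value.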

Installed Versions

INSTALLED VERSIONS
------------------
commit : d9cdd2ee5a58015ef6f4d15c7226110c9aab8140
python : 3.9.13.final.0
python-bits : 64
OS : Darwin
OS-release : 21.6.0
Version : Darwin Kernel Version 21.6.0: Mon Aug 22 20:19:52 PDT 2022; root:xnu-8020.140.49~2/RELEASE_ARM64_T6000
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.UTF-8
pandas : 2.2.2
numpy : 1.26.4
pytz : 2024.1
dateutil : 2.9.0.post0
setuptools : 70.1.0
pip : 22.1.2
Cython : None
pytest : 8.2.2
hypothesis : None
sphinx : 7.3.7
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.9.3
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.1.4
IPython : 8.18.1
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.3
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : 2024.6.0
gcsfs : None
matplotlib : 3.9.0
numba : 0.60.0
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.13.1
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2024.1
qtpy : None
pyqt5 : None

rhshadrach commented 2 months ago

Thanks for the report. On main, where this warning has been removed, the result appears correct. Either way, this warning should not be surfaced to the user. Further investigations and PRs to fix are welcome!