Pandas version checks
[X] I have checked that this issue has not already been reported.
[X] I have confirmed this bug exists on the latest version of pandas.
[X] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
```python
import pandas as pd
from scipy.sparse import eye

# Building a sparse DataFrame from a boolean identity matrix raises ValueError
pd.DataFrame.sparse.from_spmatrix(eye(2, dtype=bool))
```
Issue Description
```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/pandas/pandas/core/arrays/sparse/accessor.py", line 316, in from_spmatrix
    dtype = SparseDtype(array_data.dtype, 0)
  File "/pandas/pandas/core/dtypes/dtypes.py", line 1751, in __init__
    self._check_fill_value()
  File "/pandas/pandas/core/dtypes/dtypes.py", line 1835, in _check_fill_value
    raise ValueError(
ValueError: fill_value must be a valid value for the SparseDtype.subtype
```
Expected Behavior
The default `fill_value` argument of `SparseDtype` should be used instead of passing `0`. This fixes the issue because the default missing value selected for `bool` is `False`. The bug also affects other dtypes, such as `float` and `complex`, without raising a `ValueError`: for `float`, a `fill_value` of `0.` or `np.nan` is more appropriate than `0`, and for `complex`, `0. + 0.j`, `np.nan + 0.j`, or `np.nan` is.
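To illustrate the point above, the following sketch (assuming a recent pandas and NumPy) prints the `fill_value` that `pd.SparseDtype` picks when none is given, and contrasts it with a zero cast to each subtype, i.e. the `False`, `0.` and `0. + 0.j` values mentioned above. `typed_zero` is a hypothetical helper for illustration, not part of pandas.

```python
import numpy as np
import pandas as pd

SUBTYPES = (bool, np.int64, np.float64, np.complex128)

# Omitting fill_value (None) lets SparseDtype choose a per-subtype default.
for subtype in SUBTYPES:
    print(np.dtype(subtype), repr(pd.SparseDtype(subtype).fill_value))


def typed_zero(subtype):
    """Hypothetical helper: the scalar 0 cast to ``subtype``."""
    return np.array(0, dtype=subtype).item()


# A zero of the matching Python type is accepted as fill_value for these subtypes.
for subtype in SUBTYPES:
    zero = typed_zero(subtype)
    print(np.dtype(subtype), repr(zero), repr(pd.SparseDtype(subtype, zero).fill_value))
```

Unlike the hard-coded `0`, `typed_zero(bool)` yields `False`, which is a valid value for a `bool` subtype.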
We can also introduce a `fill_value` parameter to the `DataFrame.sparse.from_spmatrix` method, with a default argument of `None`, to fix the issue whilst giving the user the flexibility to choose a `fill_value`.
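A minimal sketch of the proposed `fill_value=None` semantics, written outside pandas. `sparse_frame_from_matrix` is a hypothetical helper, not the actual `DataFrame.sparse.from_spmatrix` implementation. It assumes the input is a SciPy sparse matrix (anything with `.toarray()`) or a 2-D NumPy array, and for simplicity it densifies the data first, so it is only suitable for small inputs.

```python
import numpy as np
import pandas as pd


def sparse_frame_from_matrix(data, fill_value=None):
    """Hypothetical helper mimicking a ``fill_value`` parameter.

    With ``fill_value=None``, SparseDtype's per-subtype default is used
    (False for bool, NaN for float/complex, 0 for int).
    """
    # Densify first (illustration only; a real fix would keep the sparse structure).
    dense = data.toarray() if hasattr(data, "toarray") else np.asarray(data)
    dtype = pd.SparseDtype(dense.dtype, fill_value)
    # Every column becomes a SparseArray; values equal to fill_value are not stored.
    return pd.DataFrame(dense).astype(dtype)


# Usage: a boolean identity matrix no longer needs an invalid integer fill value.
df = sparse_frame_from_matrix(np.eye(2, dtype=bool))
print(df.dtypes)
```

Passing `None` defers to `SparseDtype`'s default for the subtype, while an explicit value such as `0.0` keeps float zeros implicit.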
https://github.com/pandas-dev/pandas/blob/c46fb76afaf98153b9eef97fc9bbe9077229e7cd/pandas/core/arrays/sparse/accessor.py#L316
https://github.com/pandas-dev/pandas/blob/c46fb76afaf98153b9eef97fc9bbe9077229e7cd/pandas/core/dtypes/missing.py#L638-L641
Installed Versions