pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

BUG: `DataFrame.sparse.from_spmatrix` hard codes an invalid `fill_value` for certain subtypes #59064

Closed: christopher-titchen closed this issue 5 days ago

christopher-titchen commented 1 week ago
mroeschke commented 5 days ago

Thanks @christopher-titchen

bmreiniger commented 8 hours ago

This seems to be responsible for a breaking change in one of my workflows. We consume the output of a scikit-learn `OneHotEncoder`, which is a sparse matrix of float type, and build a sparse pandas DataFrame from it. That used to produce values of 1.0 and 0.0; it now produces 1.0 and `np.nan` instead.

It doesn't look like the sparse constructor lets us pass a `fill_value`; is there another easy way to adapt to the new behavior? (Casting to integers would work for this particular case, but our code is more generic than just `OneHotEncoder` results, so I'm not sure that approach generalizes.)