Status: Open. jnothman opened this issue 6 years ago.
Why would you actually want to do this? Using object dtypes in sparse has very little utility and is barely supported.
Okay, so make it invalid. I'll admit I don't have a use case for it; it just came up while looking into the implementation of unstack and how to make that sparse. I still think the SparseSeries error is inappropriate.
(Email reply to Jeff Reback's comment of 17 Jan 2018, 11:40 pm: https://github.com/pandas-dev/pandas/issues/19278#issuecomment-358292162)
Would appreciate a PR.
@jnothman I'm happy to have a look at this if it is needed, though I'm having a little trouble understanding what you mean by 'make it invalid'.
I am currently working on this issue and think the solution is to add a check to SparseSeries that tests whether the data is a CategoricalDtype:
```python
elif isinstance(data, CategoricalDtype):
    if dtype is not None:
        data = data.astype(dtype)
    if index is None:
        index = data.index.view()
    else:
        data = data.reindex(index, copy=False)
```
However, isinstance(c, CategoricalDtype) returns False even when c.dtype returns CategoricalDtype. I suspect I am missing something important here, but I cannot find how to make this check return True for categorical data.
For reference, this elif block would be added at around line 174 of pandas.core.sparse.series.py.
However the boolean isinstance(c, CategoricalDtype) returns false even if c.dtype returns CategoricalDtype.
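CategoricalDtype is the class of array.dtype for a categorical array, not of the array itself, so the check has to be applied to the dtype, e.g. isinstance(data.dtype, CategoricalDtype). You could also use:

```python
if is_categorical_dtype(data):
    ...
```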
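A minimal sketch of the point above, assuming a recent pandas release. `has_categorical_dtype` and `normalize_categorical_series` are hypothetical names, not pandas API; the second one mirrors the proposed `elif` branch but tests `data.dtype` instead of `data`. It uses `isinstance(..., CategoricalDtype)` rather than `is_categorical_dtype`, which pandas later deprecated (in 2.1) in favor of exactly that isinstance check, and it omits `copy=False` from `reindex`, which does not change the result.

```python
import pandas as pd
from pandas import CategoricalDtype


def has_categorical_dtype(data):
    """Return True if ``data`` carries a categorical dtype (hypothetical helper)."""
    # A Categorical or Series is never itself an instance of CategoricalDtype;
    # the CategoricalDtype object lives on its ``.dtype`` attribute.
    return isinstance(getattr(data, "dtype", None), CategoricalDtype)


def normalize_categorical_series(data, dtype=None, index=None):
    """Hypothetical helper mirroring the proposed ``elif`` branch.

    Expects a pandas Series. Returns ``(data, index)`` when ``data`` has a
    categorical dtype, or ``None`` so a caller could fall through to other checks.
    """
    if not has_categorical_dtype(data):
        return None
    if dtype is not None:
        data = data.astype(dtype)
    if index is None:
        index = data.index.view()
    else:
        # Labels absent from the original index become missing values.
        data = data.reindex(index)
    return data, index


c = pd.Categorical(["a", "b", "a"])
print(isinstance(c, CategoricalDtype))        # False
print(isinstance(c.dtype, CategoricalDtype))  # True

s = pd.Series(c, index=[10, 20, 30])
reindexed, new_index = normalize_categorical_series(s, index=pd.Index([30, 10, 40]))
print(reindexed.isna().tolist())  # [False, False, True]
```

Returning `None` for non-categorical input stands in for "this branch does not apply"; in the real constructor that role is played by the surrounding `if`/`elif` chain.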
@LEO-E-100 are you still working on this issue?
We can repurpose this issue to be about allowing SparseArray[ExtensionDtype, fill_value]. It's not exactly straightforward, though.
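The `SparseArray[ExtensionDtype, fill_value]` notation corresponds to pandas' `SparseDtype(subtype, fill_value)`: a SparseArray stores only the entries that differ from `fill_value`, and `subtype` types those stored entries. The sketch below assumes a current pandas release (it uses `pd.arrays.SparseArray`, since `SparseSeries` and `SparseDataFrame` were removed in pandas 1.0). It shows the supported case of a NumPy subtype, plus a hypothetical helper, `categorical_to_sparse_object`, that produces the object-dtype fallback the original report lists as the minimum acceptable behavior. Keeping `CategoricalDtype` itself as the subtype, which is what this proposal asks for, is not shown.

```python
import numpy as np
import pandas as pd

# Supported case: a NumPy subtype. Only entries != fill_value are stored.
values = np.array([0, 0, 1, 0, 2], dtype="int64")
arr = pd.arrays.SparseArray(values, fill_value=0)

print(arr.dtype.subtype)  # int64
print(arr.sp_values)      # [1 2]
print(arr.npoints)        # 2


def categorical_to_sparse_object(data):
    """Hypothetical helper: sparsify categorical input as object dtype.

    Illustrates only the "no error, object dtype" fallback; the categories
    and their ordering are discarded rather than kept as the sparse subtype.
    """
    if isinstance(getattr(data, "dtype", None), pd.CategoricalDtype):
        # Categorical values become plain Python objects; missing entries become NaN.
        data = np.asarray(data, dtype=object)
    # For object data the default fill_value is NaN, so only non-missing values are stored.
    return pd.arrays.SparseArray(data)


fallback = categorical_to_sparse_object(pd.Categorical(["a", None, "b", None]))
print(fallback.dtype.subtype)  # object
print(fallback.npoints)        # 2
```

The fallback is lossy by design: once converted to object, nothing records that the values were categorical, which is why a real `SparseArray[CategoricalDtype, fill_value]` would need the subtype slot to accept extension dtypes.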
Code Sample, a copy-pastable example if possible
Problem description
SparseArray and SparseDataFrame (or when calling Series.to_sparse()). This is inconsistent with the categorical dtype retained by dense Series and DataFrame.

Expected Output
SparseDataFrame({'a': c})['a'].dtype == SparseSeries(c).dtype == SparseArray(c).dtype == Series(c).dtype
or at a minimum:
SparseSeries(c) raises no error, and produces object dtype.

Output of pd.show_versions()