Open 0phoff opened 5 years ago
Not sure I see the issue here - from the code posted it looks like you are trying to mix tuples with categorical data which should be an object.
Did you mean to use the add_categories method?
http://pandas.pydata.org/pandas-docs/stable//user_guide/categorical.html#appending-new-categories
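For context, here is a minimal sketch of what add_categories does, using a made-up three-value series (the values and names are illustrative, not taken from the original report). It extends the set of allowed categories and leaves the data itself unchanged:

```python
import pandas as pd

# Illustrative data; not from the original report.
s = pd.Series(["a", "b", "c"], dtype="category")

# add_categories returns a new Series whose dtype allows one more category.
extended = s.cat.add_categories(["d"])

print(list(extended.cat.categories))  # ['a', 'b', 'c', 'd']
print(len(extended))                  # 3, no row was added
```

So the method widens the dtype but does not add a row.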
This is likely the same issue as the one you mentioned above; append and concat are used in indexing expansion.
The core issue should be addressed before this.
Note that indexing expansion is pretty inefficient and might be removed in the future; it is better to append explicitly (which is also inefficient if done many times, but it’s more obvious what is happening).
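A minimal sketch of the explicit route, assuming a toy frame with one categorical column (the column names and values are invented for illustration). It uses pd.concat because DataFrame.append was removed in pandas 2.0, and it builds the new row with the same CategoricalDtype so that the concatenation keeps the dtype:

```python
import pandas as pd

# Hypothetical dtype and frame, for illustration only.
cat_type = pd.CategoricalDtype(categories=["a", "b", "c", "d"])
df = pd.DataFrame(
    {"col": pd.Series(["a", "b", "c"], dtype=cat_type), "val": [1, 2, 3]}
)

# Build the new row as a one-row frame that shares the categorical dtype.
new_row = pd.DataFrame(
    {"col": pd.Categorical(["d"], dtype=cat_type), "val": [4]},
    index=[3],
)

# Concatenating categoricals with identical categories preserves the dtype.
result = pd.concat([df, new_row])
print(result.dtypes)
```

The one-row frame must carry the same categories as the existing column; if the categories differ, concat falls back to object for that column.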
Not sure I see the issue here - from the code posted it looks like you are trying to mix tuples with categorical data which should be an object.
You can use .loc[non-existing index] = ('colval1', 'colval2', ...) to set a new row, which is what I'm doing.
Not sure if you can wrap such a value in a categorical, but if that's the case, it still seems quite a burden to do.
add_categories is not what I want. I do not want to add an extra possible category; I want to add an extra row of data to a dataframe that uses one or more categorical columns.
This is likely the same issue as the one you mentioned above; append and concat are used in indexing expansion.
The core issue should be addressed before this.
I don't know enough of the pandas internals, but it seems kind of logical. I think overall support for these kinds of merging operations with categoricals is lacking in pandas.
Note that indexing expansion is pretty inefficient and might be removed in the future; it is better to append explicitly (which is also inefficient if done many times, but it’s more obvious what is happening).
I thought it was just some sugar coating on top of append() with a nicer syntax?
Is it that much more compute time, besides checking whether the index is already in the dataframe?
Code Sample, a copy-pastable example if possible
Problem description
There is no warning whatsoever, but the dtype still changes. In this dummy example this means we lose all information about the fact that 'd' is also a possible value. (So simply doing astype('category') wouldn't work here.)
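To illustrate why re-casting with astype('category') does not recover the lost information, here is a small sketch with invented values (not the original code sample). Categories are inferred from the values present, so an allowed but unused value such as 'd' disappears unless the full dtype is passed explicitly:

```python
import pandas as pd

# Hypothetical dtype in which 'd' is allowed but not present in the data.
cat_type = pd.CategoricalDtype(categories=["a", "b", "c", "d"])

# A column that has already fallen back to object dtype.
as_object = pd.Series(["a", "b", "c"], dtype=object)

inferred = as_object.astype("category")  # categories inferred from the data
explicit = as_object.astype(cat_type)    # categories supplied by the caller

print(list(inferred.cat.categories))  # ['a', 'b', 'c']
print(list(explicit.cat.categories))  # ['a', 'b', 'c', 'd']
```

Restoring the dtype therefore requires keeping the original CategoricalDtype around and passing it back in.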
Expected Output
Keep the categorical dtype if the added value is in the list of categories, throw an error/warning otherwise.
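The behaviour asked for here can be approximated in user code. The sketch below is only an illustration of the request, not pandas API: append_row_strict is a hypothetical helper, and the frame and values are made up. It keeps the categorical dtype when the new value is an existing category and raises otherwise:

```python
import pandas as pd


def append_row_strict(df, label, row):
    """Hypothetical helper: return df plus one row, keeping categorical dtypes."""
    if len(row) != len(df.columns):
        raise ValueError("row must have one value per column")
    columns = {}
    for name, value in zip(df.columns, row):
        dtype = df[name].dtype
        if isinstance(dtype, pd.CategoricalDtype):
            # Refuse values outside the declared categories instead of
            # silently changing the column dtype.
            if value not in dtype.categories:
                raise ValueError(f"{value!r} is not a category of column {name!r}")
            columns[name] = pd.Categorical([value], dtype=dtype)
        else:
            columns[name] = [value]
    new_row = pd.DataFrame(columns, index=[label])
    return pd.concat([df, new_row])


# Illustrative frame; 'd' is allowed but not yet used.
cat_type = pd.CategoricalDtype(categories=["a", "b", "c", "d"])
df = pd.DataFrame(
    {"col": pd.Series(["a", "b", "c"], dtype=cat_type), "val": [1, 2, 3]}
)

result = append_row_strict(df, 3, ("d", 4))
print(result["col"].dtype)
```

Like any explicit append, this copies the frame on every call, so it is not suited to adding many rows in a loop.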
If people don't care about the categorical, they can always call .astype('object') before adding the row? I think this solution is also in the spirit of 'explicit is better than implicit'?
Output of pd.show_versions()