stevenlis opened this issue 2 years ago
Why? What are you actually trying to do?
The codes are an implementation detail.
@jreback As I explained, I want to select rows above or below a certain code when I have an ordered categorical column.
These labels should already support the full suite of comparators, e.g.
`df[df.ordered_cat > 'value1']` should select values that are greater than `'value1'` in code space.
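A minimal sketch of the comparator route, assuming a hypothetical `quarter` column stored as an ordered categorical (the column name and values are illustrative, not taken from the original report):

```python
import pandas as pd

# Hypothetical data: quarters stored as an *ordered* categorical
dtype = pd.CategoricalDtype(
    ["2020Q1", "2020Q2", "2020Q3", "2020Q4", "2021Q1"], ordered=True
)
df = pd.DataFrame(
    {"quarter": pd.Series(["2020Q3", "2020Q1", "2021Q1", "2020Q2"], dtype=dtype)}
)

# On an ordered categorical, comparisons follow the category order
# (equivalently, the code order); the scalar must be one of the categories
after_q2 = df[df["quarter"] > "2020Q2"]
print(after_q2["quarter"].tolist())  # ['2020Q3', '2021Q1']
```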
Indeed, you could do a semantic selection with a categorical, but the codes might still be helpful. Let's say you want 3 quarters after ...
You could simply add 3 to a code. Right now, as far as I know, if you have to do that, you have to convert `.dtype.categories` to a list, call `.index(value1) + 3`,
and then look up the value/item at that position in the list.
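A sketch of the "3 quarters after" case, assuming hypothetical quarter labels. It relies on the fact that a category's code is its position in `dtype.categories`, so `Index.get_loc` finds it without touching the data; both a label-based and a code-based filter are shown:

```python
import pandas as pd

# Hypothetical ordered quarters (illustrative only)
dtype = pd.CategoricalDtype(
    ["2020Q1", "2020Q2", "2020Q3", "2020Q4", "2021Q1", "2021Q2"], ordered=True
)
df = pd.DataFrame(
    {"quarter": pd.Series(["2020Q1", "2020Q4", "2021Q1", "2021Q2"], dtype=dtype)}
)

# Position in dtype.categories == code, so no indexing of df is needed
start_code = dtype.categories.get_loc("2020Q2")  # 1
target = dtype.categories[start_code + 3]         # '2021Q1'

# Either compare semantically against the shifted label ...
by_label = df[df["quarter"] >= target]
# ... or compare the codes directly
by_code = df[df["quarter"].cat.codes >= start_code + 3]
```

Note that `dtype.categories[start_code + 3]` raises `IndexError` if the offset runs past the last category, so a real helper would need to decide how to handle that.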
Again, these are an implementation detail. You can use them, but -1 on adding API beyond what already exists.
The semantic selections are pretty useful here; it's not clear why you cannot simply use them.
It does not exist. The codes have more use cases than just being an implementation detail. For example, if you need to run a regression model, you can simply use `cat.codes`
to make your input numerical instead of string. It would be helpful to see what the code is for each value in that variable. Right now, there is no easy way to know how each value is coded other than `cat.codes`,
which is a Series accessor, so you have to index your entire DataFrame to use it.
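A minimal sketch of the category-to-code dictionary being asked for, built from the dtype alone. It assumes only that codes are positions in `dtype.categories`; the small Series is a hypothetical example:

```python
import pandas as pd

# Hypothetical Series whose category order differs from its value order
s = pd.Series(["b", "a", "c", "a"], dtype=pd.CategoricalDtype(["c", "a", "b"]))

# Codes are positions in dtype.categories, so the mapping can be derived
# from the dtype without indexing any data
code_map = {cat: code for code, cat in enumerate(s.dtype.categories)}
print(code_map)  # {'c': 0, 'a': 1, 'b': 2}
```

Missing values are stored with code `-1`, so they have no entry in such a mapping.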
I think this could also be useful when you want to maintain a CategoricalDtype for round-tripping some IO formats. Taking SQL as an example, the CategoricalDtype does the "right thing" when you just build a DataFrame and write it, but if you want to issue a WHERE clause on read that filters to only a subset of your dtype, it becomes difficult to get access to those codes.
I could see it being useful for CategoricalDtype to behave more like an Enum in this instance.
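A rough sketch of the SQL case, assuming an in-memory SQLite database and a hypothetical `sales` table. Labels are written as plain text, a code range is turned back into labels by slicing `dtype.categories` (position == code), and the dtype is restored after reading:

```python
import sqlite3

import pandas as pd

# Hypothetical ordered dtype we want to preserve across a SQL round trip
dtype = pd.CategoricalDtype(["2020Q1", "2020Q2", "2020Q3", "2020Q4"], ordered=True)
df = pd.DataFrame(
    {
        "quarter": pd.Series(["2020Q1", "2020Q3", "2020Q4", "2020Q2"], dtype=dtype),
        "sales": [10, 30, 40, 20],
    }
)

conn = sqlite3.connect(":memory:")
# Store labels as plain text so the table does not depend on pandas internals
df.astype({"quarter": str}).to_sql("sales", conn, index=False)

# "Everything from code 2 onward": slice categories by position, then filter by label
wanted = list(dtype.categories[2:])  # ['2020Q3', '2020Q4']
placeholders = ", ".join("?" for _ in wanted)
query = (
    f"SELECT quarter, sales FROM sales "
    f"WHERE quarter IN ({placeholders}) ORDER BY sales"
)
result = pd.read_sql(query, conn, params=wanted).astype({"quarter": dtype})
conn.close()
```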
@StevenLi-DS To clarify, this is what I have in mind:
enum.Enum("AnEnum", cat)
Currently this yields `TypeError: 'CategoricalDtype' object is not iterable`,
but it might be a natural Pythonic way to get what you are ultimately after, without requiring pandas to grow its API footprint. Would you be interested in exploring that more?
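Until the dtype itself is iterable, a similar Enum can be built today with the functional `enum.Enum` API by pairing each category with its code explicitly. This is a sketch under the assumption that category labels are strings; the `Quarter` name and labels are hypothetical:

```python
import enum

import pandas as pd

dtype = pd.CategoricalDtype(["2020Q1", "2020Q2", "2020Q3"], ordered=True)

# Passing the dtype directly fails, but (name, value) pairs built from
# dtype.categories give members whose values match the pandas codes
Quarter = enum.Enum(
    "Quarter", [(cat, code) for code, cat in enumerate(dtype.categories)]
)

print(Quarter["2020Q2"].value)  # 1
print(Quarter(2).name)          # 2020Q3
```

Members whose names are not valid identifiers (like `"2020Q2"`) are looked up with `Quarter["..."]` rather than attribute access.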
Thank you @WillAyd. I'm not familiar with enum, but I think returning a dict would give us more usability and flexibility.
Hello, is this issue available to take?
@JgLemos Sure, it's still open.
take
Feature Type
[X] Adding new functionality to pandas
[ ] Changing existing functionality in pandas
[ ] Removing existing functionality in pandas
Problem Description
I wish I could check the underlying code for each value of a categorical column directly, without indexing and using `cat.codes`.
Assume I have the following dataframe
I need to select all the rows after `2020Q2`. I first have to find the underlying code of the value/label `2020Q2`, but I can only do so by indexing the DataFrame against it, then using `cat.codes`, and then indexing the returned array to get the first value. This is a little bit tedious.
Feature Description
Right now, if you use `df.quarter.dtype.categories`, it only returns the categories themselves (as an Index). It would be great if there were an attribute that returns a mapping of categories to codes as a dictionary, so that users could simply look up codes using categories as dict keys. For example
returns
Alternative Solutions
Maybe it could also be a `get_cat_code()` function in the pandas API so that users could input a category to get the underlying code, such as `get_cat_code(cat='2020Q2')`.
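A sketch of what such a helper could look like outside pandas. `get_cat_code` here is hypothetical (not an existing pandas function), and unlike the proposed signature it assumes the dtype is passed in explicitly, since a code is only meaningful relative to a specific set of categories:

```python
import pandas as pd


def get_cat_code(dtype: pd.CategoricalDtype, cat) -> int:
    """Hypothetical helper: return the code pandas assigns to `cat`.

    Codes are positions in ``dtype.categories``; ``Index.get_loc`` raises
    KeyError if `cat` is not one of the categories.
    """
    return int(dtype.categories.get_loc(cat))


dtype = pd.CategoricalDtype(["2020Q1", "2020Q2", "2020Q3"], ordered=True)
print(get_cat_code(dtype, "2020Q2"))  # 1
```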
Additional Context
No response