Open richierocks opened 2 years ago
The specialised ideas seem a better route to take
I am new to pandas. Is there a large performance overhead for apply on a DataFrame? If not, then the specialized version can stay consistent with the input of is_categorical_dtype(). A similar boolean array can be achieved with df.apply(pd.api.types.is_ordered_categorical_dtype), though it will be without the pd.NA to signal non-categorical columns.
To retrieve the columns we can then simply do this:
people.loc[:, people.apply(pd.api.types.is_ordered_categorical_dtype)]
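For illustration, here is a minimal sketch of a mask that keeps pd.NA for non-categorical columns, built only from calls that exist in pandas today. The people data and the ordered_mask name are hypothetical, made up for this example; this is not a proposed implementation.

```python
import pandas as pd

# Hypothetical data, invented purely for illustration.
people = pd.DataFrame({
    "eye_color": pd.Categorical(["blue", "brown", "green"]),
    "age_group": pd.Categorical(
        ["child", "adult", "senior"],
        categories=["child", "adult", "senior"],
        ordered=True,
    ),
    "age": [8, 35, 70],
})


def ordered_mask(df):
    """Hypothetical helper: True/False for categorical columns, pd.NA otherwise."""
    flags = {
        name: dtype.ordered if isinstance(dtype, pd.CategoricalDtype) else pd.NA
        for name, dtype in df.dtypes.items()
    }
    # The nullable "boolean" dtype can hold True, False and pd.NA together.
    return pd.Series(flags, dtype="boolean")


mask = ordered_mask(people)

# Treat "not categorical" (pd.NA) as False when selecting columns.
ordered_columns = people.loc[:, mask.fillna(False).to_numpy(dtype=bool)]
```

Iterating over df.dtypes only inspects column metadata, so this avoids calling a function on every column's data the way apply does.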
Is your feature request related to a problem?
I'd like to be able to easily select only ordered categorical columns, or only unordered categorical columns, from a dataframe.
Example
Here's an example dataset:
Here, eye_color is an unordered categorical column, age_group is an ordered categorical column, and age is numeric. I want just the age_group column.
My best attempt at selecting ordered categorical columns is
This solution feels overly complicated for such a simple task.
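For reference, a minimal sketch of one way to do this with the existing API. It is not necessarily the attempt referred to above, and the people data is hypothetical, chosen only to match the column description in this issue.

```python
import pandas as pd

# Hypothetical data matching the description: one unordered categorical,
# one ordered categorical, and one numeric column.
people = pd.DataFrame({
    "eye_color": pd.Categorical(["blue", "brown", "green"]),
    "age_group": pd.Categorical(
        ["child", "adult", "senior"],
        categories=["child", "adult", "senior"],
        ordered=True,
    ),
    "age": [8, 35, 70],
})

# Step 1: narrow down to categorical columns of either kind.
categoricals = people.select_dtypes("category")

# Step 2: split them by the `ordered` flag on each column's CategoricalDtype.
ordered_names = [name for name, dtype in categoricals.dtypes.items() if dtype.ordered]
unordered_names = [name for name, dtype in categoricals.dtypes.items() if not dtype.ordered]

ordered_only = people[ordered_names]
unordered_only = people[unordered_names]
```

The two-step shape (filter by dtype, then inspect each dtype's ordered attribute) is what a single select_dtypes-style call would replace.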
Describe the solution you'd like
There are a few options for what nicer code might look like.
If ordered and unordered categoricals had different dtypes (as in R with factor vs. ordered), then I could just write people.select_dtypes("ordered"). Unfortunately, this would have breaking changes for all other code that assumes the dtype of ordered categoricals.
If dataframe-level .cat.* methods existed, I could write something like
A variation on this might be to have more specialized equivalents of .api.types.is_categorical_dtype(), perhaps .api.types.is_ordered_categorical_dtype() and .api.types.is_unordered_categorical_dtype().
API breaking implications
The first option mentioned above has API breaking changes; the other two options do not.
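Because the specialized-function option is purely additive, it can be prototyped in user code without touching pandas. The sketch below is an assumption about how such helpers might behave: the two function names mirror the ones suggested in this issue but are plain user-land functions, not part of pandas.api.types, and the people data is hypothetical.

```python
import pandas as pd


def _as_dtype(arr_or_dtype):
    # Accept a Series/Index/array (anything with .dtype) or a dtype itself.
    return getattr(arr_or_dtype, "dtype", arr_or_dtype)


def is_ordered_categorical_dtype(arr_or_dtype):
    """User-land stand-in: True only for ordered categoricals."""
    dtype = _as_dtype(arr_or_dtype)
    return isinstance(dtype, pd.CategoricalDtype) and bool(dtype.ordered)


def is_unordered_categorical_dtype(arr_or_dtype):
    """User-land stand-in: True only for unordered categoricals."""
    dtype = _as_dtype(arr_or_dtype)
    return isinstance(dtype, pd.CategoricalDtype) and not dtype.ordered


# Hypothetical data, invented purely for illustration.
people = pd.DataFrame({
    "eye_color": pd.Categorical(["blue", "brown", "green"]),
    "age_group": pd.Categorical(
        ["child", "adult", "senior"],
        categories=["child", "adult", "senior"],
        ordered=True,
    ),
    "age": [8, 35, 70],
})

ordered_cols = [c for c in people.columns if is_ordered_categorical_dtype(people[c])]
unordered_cols = [c for c in people.columns if is_unordered_categorical_dtype(people[c])]
```

Both helpers return False for non-categorical input, so existing code that only checks for categorical dtypes is unaffected.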
Additional context
I asked the internet for better solutions; no response so far.