Open mcrumiller opened 8 months ago
Categoricals are represented as an array in the background for performance reasons (to avoid a hashmap lookup every time we go from a physical value to its category). Removing categories is therefore an expensive operation, as it would require re-encoding the physicals against the new categorical array.
I'm hesitant to include features like these, as they look cheap from the outside but are quite expensive to run.
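To make the cost concrete, here is a rough pure-Python sketch of the layout described above. It is not the library's actual implementation; the names `categories`, `physicals`, `decode`, and `remap` are made up for illustration. Decoding is a plain array index, but dropping one category shifts the indices after it, so every physical value has to be rewritten.

```python
# Illustrative model only: categories live in an array, and each row stores
# an integer "physical" that indexes into that array.
categories = ["apple", "banana", "cherry"]
physicals = [0, 2, 1, 2]


def decode(physicals, categories):
    """Decode by array indexing; no hashmap lookup is needed."""
    return [None if p is None else categories[p] for p in physicals]


# Removing "banana" moves "cherry" from index 2 to index 1.
new_categories = ["apple", "cherry"]

# Old physical -> new physical. Index 1 ("banana") has no target.
remap = {0: 0, 2: 1}

# Every row is visited and rewritten, so the cost grows with the column
# length rather than with the number of categories removed.
new_physicals = [remap.get(p) for p in physicals]

decoded_before = decode(physicals, categories)
decoded_after = decode(new_physicals, new_categories)
```

In this sketch the row that pointed at the removed category ends up as `None`, and all other rows decode to the same strings as before, just through different physical values.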
Description
Often when one is picking out subsets of categorical variables, it is desirable to remove the unused categories:
Remove unused categories
Remove specified categories
Note that values whose category was removed are converted to null.
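As a rough illustration of the two requested operations, here is a minimal pure-Python sketch. The `Categorical` class and the function names `remove_categories` and `remove_unused_categories` are hypothetical stand-ins for this example, not an existing API, and the null behaviour follows the note above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Categorical:
    """Toy categorical: integer physicals indexing into a category list."""
    physicals: List[Optional[int]]
    categories: List[str]

    def to_list(self) -> List[Optional[str]]:
        return [None if p is None else self.categories[p] for p in self.physicals]


def remove_categories(cat: Categorical, removals: List[str]) -> Categorical:
    """Drop the given categories; values that used them become None (null)."""
    drop = set(removals)
    kept = [c for c in cat.categories if c not in drop]
    new_index = {c: i for i, c in enumerate(kept)}
    # remap[old_physical] is the new physical, or None if the category is gone.
    remap = [new_index.get(c) for c in cat.categories]
    physicals = [None if p is None else remap[p] for p in cat.physicals]
    return Categorical(physicals, kept)


def remove_unused_categories(cat: Categorical) -> Categorical:
    """Drop every category that no value refers to; values are unchanged."""
    used = {p for p in cat.physicals if p is not None}
    unused = [c for i, c in enumerate(cat.categories) if i not in used]
    return remove_categories(cat, unused)


cat = Categorical(physicals=[0, 2, 0], categories=["a", "b", "c"])

trimmed = remove_unused_categories(cat)   # "b" is never used
reduced = remove_categories(cat, ["c"])   # the "c" value becomes null
```

Here `trimmed` keeps the same decoded values with a smaller category list, while `reduced` turns the row that held `"c"` into a null.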