Open · vlulla opened this issue 1 year ago
It looks like what you want here is perhaps more of an Enum, in order to constrain the allowed values? Unfortunately, we don't currently support Enums. Pandas treats Enums/Categoricals as the same thing (which I don't disagree with, as the concept of "category" can certainly encompass Enum use), whereas we currently treat Categoricals as essentially a string optimisation.
I'll leave it to @ritchie46 to opine on the possibility of extending our current Categorical support further; I imagine it's not a small endeavour, though it's clearly useful ;)
I see! It is also worth considering that booleans (true/false values) and integers (1/0 for true/false in bit-mask images, or class codes in classified images such as the land use/land cover maps used in ecology) are quite commonly used as categoricals. So even if only string categories are supported, it might be worthwhile to have a few examples, perhaps in the user guide or docstrings, showing how to convert these non-string values into categoricals. I anticipate that such examples will be useful for scientists who use R and pandas for modeling (ecology and earth science, in my experience) and would like to incorporate Polars into their workflow.
Anyway, Polars is awesome, and every time I use it I like it more! Thank you for your consideration.
Enum support is on its way; I suppose we can then eventually support an Enum with integer entries.
This is something we would like to support.
Problem description
Consider this brief code segment:
I don't understand this restriction. `pyarrow` allows dictionaries with integer values, and both R and pandas also allow creating categorical columns with integer values. By the way, I am aware that I can work around this by using `rename_categories`, but I'm wondering what the basis for this restriction is. More importantly, is there a possibility that this restriction can be relaxed?
Thanks in advance!