Open butterlyn opened 4 months ago
If you just need this for unit tests, you can just write your own assertion util for this, e.g. cast your Enums to strings before checking equality, and then check Enum categories separately.
Still this is an important variant of categorical data that we should support.
Thanks @stinodego, yeah I mostly need this for unit testing
If you recommend casting Enums to strings when using polars.testing.assert_frame_equal, perhaps you'd consider reopening this: https://github.com/pola-rs/polars/issues/16075? 😁 No fuss if not, happy to make do in the meantime since unordered enum can serve the same purpose
> If you recommend casting Enums to strings when using polars.testing.assert_frame_equal, perhaps you'd consider reopening this: https://github.com/pola-rs/polars/issues/16075? 😁 No fuss if not, happy to make do in the meantime since unordered enum can serve the same purpose
No, because the point I made there still stands :)
@butterlyn for now you can just make sure to always sort your categories before creating your Enum dtype.
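import polars as pl
from polars.testing import assert_series_equal

def sorted_enum(categories):
    # Sort the categories so Enums built from the same set of values compare equal
    return pl.Enum(sorted(categories))

assert_series_equal(
    pl.Series(["a"], dtype=sorted_enum(["a", "b", "c"])),
    pl.Series(["a"], dtype=sorted_enum(["b", "c", "a"])),  # different order
)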
Description
Following on from the suggestion in https://github.com/pola-rs/polars/issues/16689
Add a boolean parameter ordered to polars.Enum to allow for evaluating Enums irrespective of their category order. The following should raise no errors:
Example use case - unit testing
The intended purpose is to allow for defining an unordered pl.Enum in unit tests which can be used in columns of a DataFrame/LazyFrame supplied to polars.testing.assert_frame_equal. The idea is that the unit test should check that the correct pl.Enum is cast to the correct columns without caring about the order of the enum categories defined in the source code.
For example, for my_module:
We could write a unit test: