scikit-learn-contrib / category_encoders

A library of sklearn compatible categorical variable encoders
http://contrib.scikit-learn.org/category_encoders/
BSD 3-Clause "New" or "Revised" License

Fix issue #407 #409

Closed willsthompson closed 1 year ago

willsthompson commented 1 year ago

Fixes #407

Proposed Changes

PaulWestenthanner commented 1 year ago

Unfortunately the tests are failing. I think this is because your approach might change the order of the categories. I've followed your idea and implemented a very basic solution: https://github.com/scikit-learn-contrib/category_encoders/blob/37fcf54613b0a23d52021862d8861600b51dc222/category_encoders/ordinal.py#L232-L233

This should keep the order and, since set membership checks are O(1) on average, also solve the problem, although it might not be as elegant as yours. If you're happy with it, you can close the issue and PR. I hope I didn't make any mistakes here. Thanks for your effort!
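For readers who don't follow the link, here is a minimal sketch of the general idea: keep the original order by iterating over the list, and use a set only for the membership check. This is not the code from `ordinal.py`; the function name `ordered_intersection` and its arguments are hypothetical and chosen only for illustration.

```python
def ordered_intersection(items, allowed):
    """Return the elements of `items` that also appear in `allowed`,
    in the order they occur in `items`.

    Hypothetical helper for illustration; not part of category_encoders.
    """
    # Building the set once costs O(len(allowed)); each lookup is then
    # O(1) on average, instead of O(len(allowed)) for a list lookup.
    allowed_set = set(allowed)
    # Iterating over `items` (not over the set) is what preserves order.
    return [item for item in items if item in allowed_set]


categories = ["c", "a", "b", "d"]
present = ["b", "c", "z"]
kept = ordered_intersection(categories, present)
```

A plain `set(categories) & set(present)` would contain the same elements, but a set has no guaranteed iteration order, which is the kind of reordering that could break order-sensitive tests.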

willsthompson commented 1 year ago

@PaulWestenthanner I just tested on our sample data and this will work great, thanks for the quick response. Yours is actually slightly faster on the biggest intersections in our sample. Closing this PR and issue.