scikit-learn-contrib / category_encoders

A library of sklearn compatible categorical variable encoders
http://contrib.scikit-learn.org/category_encoders/
BSD 3-Clause "New" or "Revised" License

Choosing the most appropriate encoder for a dataset that has both ordinal and nominal categories #385

Closed itsaugat closed 1 year ago

itsaugat commented 1 year ago

Are there any recommendations on which encoder to use if a dataset has both ordinal and nominal categories? I do not want to explicitly specify which columns are ordinal and which are nominal; rather, the encoder itself should find the embeddings.

Thank you.

PaulWestenthanner commented 1 year ago

Hi @itsaugat

There is probably no one-size-fits-all encoder.
However, every encoder that can encode nominal data can of course also encode ordinal data. In fact, most (if not all) of our encoders do not treat ordinal data any differently from nominal data. So I'd recommend just trying a few different strategies, such as one-hot encoding and target encoding, and seeing which one fits your needs.