scikit-learn-contrib / category_encoders

A library of sklearn compatible categorical variable encoders
http://contrib.scikit-learn.org/category_encoders/
BSD 3-Clause "New" or "Revised" License
2.39k stars 393 forks

Added response coding #399

Closed: Khaled-Issa closed this 1 year ago

Khaled-Issa commented 1 year ago

Fixes #

Proposed Changes

I added a new encoding method, response coding, to the set of categorical encoders that are already there.

PaulWestenthanner commented 1 year ago

Hi @Khaled-Issa

Thanks for your contribution. To me, this encoder seems to do pretty much the same thing as the target encoder, except that the target encoder applies regularization to avoid overfitting when categories are small. So I'm wondering whether response coding adds any value. Do you have an academic reference or a good argument for why it can be better than target encoding (in some situations)?
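To make the regularization point concrete, here is a minimal sketch of a smoothed target mean using plain pandas. It assumes a simple additive (m-estimate) smoothing toward the global mean; this is one common form and is not claimed to be the formula the library's target encoder uses. The function name `smoothed_target_mean`, the parameter `m`, and the toy data are hypothetical.

```python
import pandas as pd


def smoothed_target_mean(df, col, target, m=2.0):
    """Encode `col` by a category mean of `target`, shrunk toward the global mean.

    Assumption: additive (m-estimate) smoothing, chosen for illustration only.
    `m` acts like a number of pseudo-observations drawn from the global mean.
    """
    prior = df[target].mean()
    stats = df.groupby(col)[target].agg(["sum", "count"])
    smoothed = (stats["sum"] + m * prior) / (stats["count"] + m)
    # Map each row's category to its smoothed mean.
    return df[col].map(smoothed)


# Hypothetical toy data: category "c" appears only once.
toy = pd.DataFrame({
    "city": ["a", "a", "a", "b", "b", "c"],
    "y":    [1, 0, 1, 0, 0, 1],
})

raw = smoothed_target_mean(toy, "city", "y", m=0.0)   # plain category mean
reg = smoothed_target_mean(toy, "city", "y", m=2.0)   # shrunk toward 0.5
```

With `m=0.0` the single-row category `c` is encoded as exactly 1.0, which is the overfitting risk mentioned above; with `m=2.0` the same category is pulled toward the global mean of 0.5.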

Khaled-Issa commented 1 year ago

Hi @PaulWestenthanner,

I got the idea of response coding from this medium article: https://medium.com/@thewingedwolf.winterfell/response-coding-for-categorical-data-7bb8916c6dc1

I agree with you that it behaves like target encoding. The only difference is that it calculates the probability not just for label = 1 but also for label = 0, so it adds two columns to the dataframe instead of one: one with the probability of label = 1, the other with the probability of label = 0.
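As a rough illustration of the description above, response coding for a binary target could be sketched with pandas as follows. This is not the code from the pull request; the function name `response_encode`, the `_p0`/`_p1` column naming, and the toy data are assumptions made for the example, and no smoothing is applied.

```python
import pandas as pd


def response_encode(df, col, target):
    """Return one probability column per class: P(target = k | category)."""
    # Row-normalised contingency table: one row per category, one column per class.
    probs = pd.crosstab(df[col], df[target], normalize="index")
    probs.columns = [f"{col}_p{cls}" for cls in probs.columns]
    # Attach each row's category probabilities, keeping the original row index.
    return df[[col]].join(probs, on=col).drop(columns=col)


# Hypothetical toy data with a binary label.
toy = pd.DataFrame({
    "city": ["a", "a", "a", "b", "b", "c"],
    "y":    [1, 0, 1, 0, 0, 1],
})

encoded = response_encode(toy, "city", "y")
# `encoded` has two columns, city_p0 and city_p1, aligned with the rows of `toy`.
```

For a binary label the two columns sum to 1 in every row, so the label = 0 column can always be recovered as one minus the label = 1 column.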

Do you think it would be valuable to add a parameter to the target encoder that, when set to true, also adds the label = 0 probabilities to the encoded dataframe, or would that be redundant?