Expected Behavior
When a target encoder is trained on a binary target and then transforms a category that appeared exactly once in training, I would expect the output to differ from the output for a category with no training instances at all.
Actual Behavior
When a target encoder is trained on a binary target and then transforms a category that appeared exactly once in training, the output is the same as if there were no training instances for that category. It does not seem to take into account the target value of that single training instance.
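One possible explanation, which I have not confirmed against the library source, is that categories seen exactly once are replaced by the global target mean (the prior) instead of being blended with it. The sketch below is a hypothetical model of a smoothed target encoding, not the library's code: the function name, the `singleton_uses_prior` flag, and the default parameter values are all assumptions made for illustration.

```python
import math


def smoothed_encoding(count, category_mean, prior,
                      min_samples_leaf=1, smoothing=1.0,
                      singleton_uses_prior=True):
    """Hypothetical smoothed target encoding for a single category.

    count                -- number of training rows with this category
    category_mean        -- mean target over those rows
    prior                -- mean target over the whole training set
    singleton_uses_prior -- assumed special case: count == 1 returns the prior
    """
    if count == 0:
        return prior  # unseen category: only the prior is available
    if singleton_uses_prior and count == 1:
        return prior  # assumed rule that would explain the reported output
    # Sigmoid weight that shifts from the prior to the category mean as count grows
    weight = 1.0 / (1.0 + math.exp(-(count - min_samples_leaf) / smoothing))
    return prior * (1.0 - weight) + category_mean * weight


prior = 0.5  # illustrative global mean of a binary target

# Two singleton categories whose single training targets are 0 and 1
with_rule = [smoothed_encoding(1, m, prior) for m in (0.0, 1.0)]
without_rule = [smoothed_encoding(1, m, prior, singleton_uses_prior=False)
                for m in (0.0, 1.0)]
unseen = smoothed_encoding(0, 0.0, prior)

print(with_rule, without_rule, unseen)
```

Under this model, the singleton rule makes both singleton categories indistinguishable from an unseen one, which matches what I observe. Without the rule, the two singletons are pulled toward their own target values and differ from the unseen case, which is the behavior I expected.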
Steps to Reproduce the Problem
Using the following file: data.csv
In the following example, the encoded value is the same for e.g. 'Acokanthera' and 'Adenium', even though the target values of their training instances differ. The same encoded value is also produced for categories with no training instances, e.g. 'Prismatomeris'.
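import pandas as pd
import category_encoders as ce

data = pd.read_csv('data.csv')

# Hold out 'Prismatomeris' so the encoder never sees it during fitting
train = data[data['Name'] != 'Prismatomeris']
X_train = train['Name']
y_train = train['Target']

# Fit the target encoder on the training categories and binary target
target_encoder = ce.TargetEncoder(cols=['Name'])
target_encoder.fit(X_train, y_train)

# Encode the training rows (includes single-instance categories)
encoded_X_train = target_encoder.transform(X_train)
encoded_X_train.to_csv('train_encoded.csv')

# Encode the held-out category, which has no training instances
test = data[data['Name'] == 'Prismatomeris']
X_test = test['Name']
encoded_X_test = target_encoder.transform(X_test)
encoded_X_test.to_csv('test_encoded.csv')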
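To compare the encoded values, it helps to know how many training rows each category has. The helper below is a small sketch for that; `training_counts` is a hypothetical name, and the toy list is only a stand-in for the 'Name' column, not the actual contents of data.csv.

```python
from collections import Counter


def training_counts(train_names, query_names):
    """Map each queried category to the number of training rows that contain it."""
    counts = Counter(train_names)
    return {name: counts.get(name, 0) for name in query_names}


# Stand-in training column: two singleton categories and one repeated placeholder
toy_train_names = ['Acokanthera', 'Adenium', 'Other', 'Other']

result = training_counts(
    toy_train_names,
    ['Acokanthera', 'Adenium', 'Prismatomeris'],
)
print(result)
```

With the real data, `X_train` can be passed as `train_names`, since iterating over a pandas Series yields its values. Categories that map to 1 are the single-instance cases, and categories that map to 0 are the unseen ones.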
Specifications