dezoito closed this issue 4 years ago
Yes, the problem is with an unseen category.
In the AutoML package that I'm working on I have a try ... except block for such situations. You can check details here: https://github.com/mljar/mljar-supervised/blob/master/supervised/preprocessing/label_encoder.py#L14
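To illustrate the general idea (this is not the code behind the link above), here is a minimal sketch of an encoder that catches the lookup failure for an unseen category and falls back to a placeholder code. The class name `SafeLabelEncoder`, the `unknown_code` parameter, and the sample workclass values are all hypothetical, made up for this example.

```python
class SafeLabelEncoder:
    """Hypothetical encoder: maps categories to integers and tolerates unseen ones."""

    def __init__(self, unknown_code=-1):
        # Code returned for any category that was not present during fit().
        self.unknown_code = unknown_code
        self.mapping = {}

    def fit(self, values):
        # Assign a stable integer to each distinct category seen in training.
        self.mapping = {v: i for i, v in enumerate(sorted(set(values)))}
        return self

    def transform(self, values):
        encoded = []
        for v in values:
            try:
                encoded.append(self.mapping[v])
            except KeyError:
                # Unseen category: use the fallback instead of raising.
                encoded.append(self.unknown_code)
        return encoded


# Illustrative data only: "Private" never appears in the training values.
encoder = SafeLabelEncoder().fit(["State-gov", "Self-emp-not-inc", "State-gov"])
codes = encoder.transform(["Self-emp-not-inc", "Private"])
print(codes)
```

Whether a placeholder code is acceptable depends on the model. Another option is to extend the known classes when a new value shows up, but then the model has still never been trained on that value.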
Will do. Thank you for the awesome work.
When first testing the `RandomForestClassifier` class I got an error.

I believe that, because of the 30% test/train split, no person with the workclass "Private" ended up in the training data, and thus that value was never encoded to a number in the training dataset artifact.
Rerunning the training and artifact generation in the Jupyter notebook seemed to fix it for me.
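One way to check this diagnosis is to compare the categories on each side of the split before encoding. The snippet below is only a sketch: the helper `unseen_categories` and the sample lists are hypothetical and stand in for the real workclass column.

```python
def unseen_categories(train_values, test_values):
    """Return categories present in test_values but absent from train_values."""
    return sorted(set(test_values) - set(train_values))


# Illustrative data only, not taken from the actual dataset split.
train_workclass = ["State-gov", "Self-emp-not-inc", "State-gov"]
test_workclass = ["Private", "State-gov"]

missing = unseen_categories(train_workclass, test_workclass)
print(missing)  # any value listed here would fail a strict encoder at prediction time
```

An empty result means every category in the test data was also seen during training.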
(Posting this just in case someone gets stuck due to this error, as I have no suggestions on how to stop this from happening in the first place)