Closed virtualphoton closed 2 years ago
Hi @virtualphoton, thanks for your submission! I always appreciate it when someone identifies a bug or an improvement.
Would you be able to branch this and submit a pull request? Then I can merge to master, which will run the CI tests in GitHub Actions. I'm going to close this in anticipation of a new pull request. Let me know if you need any help with that or want to discuss further. Happy to open another.
If a feature is categorical, then during `transform` it's possible that the selected rows of `X` contain only some of the categories. In that case, `transform` raises an exception.
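A minimal sketch of how such a mismatch can arise, assuming the encoding is done with `pandas.get_dummies` and `drop_first=True` (an assumption for illustration; the `color` column and its values are hypothetical and not taken from the original report):

```python
import pandas as pd

# Hypothetical data: the full training frame has three categories,
# while the subset seen later is missing "blue".
train = pd.DataFrame({"color": ["blue", "green", "red", "green"]})
subset = pd.DataFrame({"color": ["green", "red"]})

# Encoding each frame independently derives the dummy columns from
# whatever categories happen to be present in that frame.
train_cols = pd.get_dummies(train, columns=["color"], drop_first=True).columns.tolist()
subset_cols = pd.get_dummies(subset, columns=["color"], drop_first=True).columns.tolist()

print(train_cols)   # columns a model fitted on `train` would expect
print(subset_cols)  # columns actually produced for `subset`
```

Because "blue" is absent from the subset, `drop_first=True` drops "green" instead, so the two frames end up with different columns. The traceback that follows is from the original report, not from this sketch.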
Traceback
```
ValueError                                Traceback (most recent call last)
Input In [9], in
```
(The problem originally emerged on the test set from this competition.)
I made a fix in this commit:
- in `SingleImputer`, during `fit`, the columns produced by one-hot encoding are saved
- during `transform`, `_one_hot_encode` ensures that the same columns are used (it takes so many lines because the missing category may be the first one)

Also, it probably would've been easier to use sklearn's one-hot encoders.
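The idea behind the fix can be sketched as follows. This is not the code from the commit: `one_hot_fit` and `one_hot_transform` are hypothetical helper names, and the use of `pandas.get_dummies` with `reindex` is an assumption about one way to save the columns at fit time and reuse them at transform time.

```python
import pandas as pd


def one_hot_fit(X):
    """Encode X and return the encoded frame plus the dummy columns to reuse."""
    encoded = pd.get_dummies(X, drop_first=True, dtype=int)
    return encoded, encoded.columns.tolist()


def one_hot_transform(X, fitted_columns):
    """Encode X, then align it to the columns saved during fit."""
    # Encode every category present (no drop_first), so a missing first
    # category cannot change which level is dropped.
    encoded = pd.get_dummies(X, drop_first=False, dtype=int)
    # Keep only the fitted columns, in the fitted order; categories that
    # are absent from X become all-zero columns.
    return encoded.reindex(columns=fitted_columns, fill_value=0)


# Hypothetical data for illustration.
train = pd.DataFrame({"color": ["blue", "green", "red", "green"]})
_, fitted_columns = one_hot_fit(train)

no_blue = one_hot_transform(pd.DataFrame({"color": ["green", "red"]}), fitted_columns)
no_green = one_hot_transform(pd.DataFrame({"color": ["blue", "red"]}), fitted_columns)
print(no_blue)
print(no_green)
```

Both subsets come out with the same columns as the training frame, whichever category is missing, including the first one.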