OneHotEncoder produces a column with nan as a suffix to its name when handle_missing is set to "error", even though there is no missing data. Leaving the handle_missing argument at its default value works fine.
Code to reproduce
Assume data is a dataframe with a categorical column named "main_field", and there is no missing data in this column.
enc_data will have a column named "main_field_nan" that contains only zeros. The result is the same if handle_unknown="error" is passed along with handle_missing="error".
Specifications