Closed bmreiniger closed 3 years ago
This makes sense to me but will let @PaulWestenthanner take a look as well.
Thanks @wdm0006; for Hacktoberfest, I need (eventually) the label "hacktoberfest-accepted" for credit (also in #320).
As far as I can tell, any merged pull request should do the same automatically, but if not we can add the label.
I had some difficulties with the other handle_missing options, largely around the underlying OrdinalEncoder
. I think I have everything working, but the code isn't particularly elegant.
Aside from my suggestion above to let indicator
continue to always produce the column, I think the actual best behavior would be to
fit
data has any missings,fit
data has no missings; if the transform
data has missings, treat them as any other unknown category (using whatever handle_unknown
option is provided).Another question: when use_cat_names=True
, should the unknown column have a more descriptive name than -1
? I thought about using "unknown", but that's probably too likely to be an actual category value?
For Hacktoberfest, if I'm reading the FAQ and my progress page correctly, I need the accepted tag by tomorrow, but then we'll have 14 days to complete the review, and if it's decided that this PR doesn't meet the requirements you can invalidate it during that time.
LGTM thanks! Merging
Fixes #295
Proposed Changes
Prevents a column for missing values from being added in OneHotEncoder when handle_missing="error". Does this by preventing the underlying OrdinalEncoder from producing the mapping NaN->-2, by setting its handle_missing to "error" as well.
Also patches an incidental bug in OneHotEncoder.transform when handle_missing="error", when the input is not a frame.
Hacktoberfest?