OneHotEncoder: handle_missing = 'ignore' would be very useful

woodly0 commented 1 year ago

Expected Behavior

It would be nice to be able to ignore missing values instead of creating new columns with a "_nan" suffix, as is already possible in pandas with pd.get_dummies(df, dummy_na=False). What do you think?

Actual Behavior

As far as I know, this option does not exist in the latest version.

Steps to Reproduce

import pandas as pd
import numpy as np
from category_encoders import OneHotEncoder

encoder = OneHotEncoder(
    cols=None,  # all non-numeric
    return_df=True,
    handle_missing="value",  # would be nice to have the option 'ignore'
    use_cat_names=True,
)
df = pd.DataFrame(
    {"this": ["GREEN", "GREEN", "YELLOW", "YELLOW"], "that": ["A", "B", "A", np.nan]}
)

encoder.fit_transform(df)  # unwanted result: includes an extra column with the "_nan" suffix
pd.get_dummies(df, dummy_na=False)  # wanted result: missing values get no column of their own
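
To make the wanted result concrete, here is a minimal pandas-only sketch of the two behaviours, using the same data as above. The variable names `wanted` and `with_nan` and the `dtype=int` argument are my own additions for readability; they are not part of the original report.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"this": ["GREEN", "GREEN", "YELLOW", "YELLOW"], "that": ["A", "B", "A", np.nan]}
)

# dummy_na=False: missing values are ignored, so no indicator column is created
# and the row containing NaN is all zeros for the "that_*" columns.
wanted = pd.get_dummies(df, dummy_na=False, dtype=int)

# dummy_na=True: pandas adds a "<column>_nan" indicator column, which is the
# kind of column this issue would like to be able to switch off.
with_nan = pd.get_dummies(df, dummy_na=True, dtype=int)

print(list(wanted.columns))
print(list(with_nan.columns))
print(wanted.loc[3, ["that_A", "that_B"]].tolist())
```

A handle_missing="ignore" option would, as I understand the request, give the OneHotEncoder output the same shape as `wanted`.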

Specifications

Version: 2.5.1.post0

PaulWestenthanner commented 1 year ago

I agree this would be useful. Do you want to create a pull request for it?

woodly0 commented 1 year ago

Hey Paul. Thanks for your reply. I will try to implement it.
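
Until such an option exists, one possible workaround is to drop the indicator columns after encoding. The sketch below is only an illustration: `drop_missing_indicators` is a hypothetical helper, not part of category_encoders, and it assumes the missing-value columns end in "_nan" as described above. The `encoded` frame is a hand-built stand-in for the encoder output, so the example runs with pandas alone.

```python
import pandas as pd


def drop_missing_indicators(encoded: pd.DataFrame, suffix: str = "_nan") -> pd.DataFrame:
    """Return a copy of `encoded` without the columns whose name ends in `suffix`.

    Hypothetical helper: assumes missing-value indicator columns are
    identified purely by their name suffix.
    """
    keep = [col for col in encoded.columns if not str(col).endswith(suffix)]
    return encoded[keep].copy()


# Hand-built stand-in for one-hot encoded output that includes a "_nan" column.
encoded = pd.DataFrame(
    {
        "this_GREEN": [1, 1, 0, 0],
        "this_YELLOW": [0, 0, 1, 1],
        "that_A": [1, 0, 1, 0],
        "that_B": [0, 1, 0, 0],
        "that_nan": [0, 0, 0, 1],
    }
)

result = drop_missing_indicators(encoded)
print(list(result.columns))
```

Note that this name-based filter would also drop a genuine category whose label is literally "nan", so it is a stopgap rather than a substitute for a proper option in the encoder.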

scikit-learn-contrib / category_encoders

OneHotEncoder: handle_missing = 'ignore' would be very useful #386
