[ ] I have a training pipeline that tunes the imputation method as a hyperparameter
[ ] My pipeline fails because sklearn's train_test_split(stratify=stratify_data) is insufficient when the stratification columns contain NaN values
[ ] Curious whether this seems like a scikit-lego feature people would want
Here's my attempt to stratify on columns that contain some NaNs, for more context. I am a beginner, so I'm open to better ideas, or to comments if this feature request is out of scope. Thanks in advance! I appreciate everyone's contributions to this package!
Stratification attempt:
from sklearn.model_selection import train_test_split
# result_df and feature_cols are assumed to be defined earlier in the pipeline
X = result_df[feature_cols]
y = result_df['strokes_to_hole_out']
# Extract the columns used for stratification (these contain some NaNs)
stratify_cols = ['from_location_scorer', 'from_location_laser']
stratify_data = result_df[stratify_cols]
# Split the data, using 'stratify_data' for stratification
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.2, random_state=42, stratify=stratify_data)
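One possible workaround for stratifying on columns with missing values is to collapse them into a single string key in which NaN is treated as its own category. The sketch below is only an illustration, not an existing scikit-lego or sklearn feature: `make_stratify_key`, the `__missing__` and `__rare__` labels, and the demo data are all hypothetical names introduced here.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def make_stratify_key(df, cols, missing_label="__missing__",
                      min_count=2, rare_label="__rare__"):
    """Build one stratification label per row (hypothetical helper).

    NaNs become an explicit category, the columns are joined into a single
    string, and combinations seen fewer than `min_count` times are pooled.
    """
    # Replace NaN with a sentinel so missingness is a stratum of its own.
    filled = df[cols].astype("object").where(df[cols].notna(), missing_label)
    # Join the per-column values into one key per row.
    key = filled.astype(str).agg("|".join, axis=1)
    # Pool rare combinations; note the pooled group can itself still be
    # too small to stratify on if only one row ends up in it.
    counts = key.map(key.value_counts())
    return key.where(counts >= min_count, rare_label)


# Made-up demo data using the column names from the issue.
demo = pd.DataFrame({
    "from_location_scorer": ["fairway", "rough", np.nan, "green"] * 5,
    "from_location_laser": ["fairway", np.nan, np.nan, "green"] * 5,
    "strokes_to_hole_out": range(20),
})

stratify_key = make_stratify_key(
    demo, ["from_location_scorer", "from_location_laser"]
)

train_df, valid_df, key_train, key_valid = train_test_split(
    demo, stratify_key, test_size=0.2, random_state=42, stratify=stratify_key
)
```

Passing the key as a second array to `train_test_split` is just a convenient way to get the matching train and validation keys back for inspection.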
Error I receive during training: Trial failed with exception: Found unknown categories ['blue'] in column 9 during transform
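This message looks like the one sklearn's categorical encoders raise when the validation data contains a category that was absent from the training split. Assuming the pipeline uses `OneHotEncoder` (the issue does not say which encoder is involved), a minimal sketch of tolerating unseen categories with `handle_unknown="ignore"` follows. The toy `train` and `valid` arrays are made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Toy data: 'blue' appears only in the validation split.
train = np.array([["red"], ["green"], ["red"]], dtype=object)
valid = np.array([["blue"], ["green"]], dtype=object)

# With handle_unknown="ignore", categories unseen during fit are encoded
# as an all-zero row instead of raising an error at transform time.
encoder = OneHotEncoder(handle_unknown="ignore")
encoder.fit(train)

encoded = encoder.transform(valid).toarray()
```

This sidesteps the error rather than fixing the split, so stratifying on the affected column may still be worthwhile if the rare category matters for the model.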
Hello!