[FEATURE] Ability to stratify on columns that contain some NaN values, so people can hyperparameter-tune the best imputation method

Hello!

[ ] I have a training pipeline that hyperparameter tunes the best imputation method
[ ] My pipeline fails because sklearn's train_test_split(stratify=stratify_data) falls short when the stratification columns contain NaN values
[ ] Curious if this seems like a scikit-lego feature people would want

For more context, here's my attempt to stratify on columns with some NaNs. I am a beginner, so I'm open to better ideas, or to comments if this feature request is out of scope. Thanks in advance! I appreciate everyone's contributions to this package!

Stratification attempt:

from sklearn.model_selection import train_test_split

# result_df and feature_cols are defined earlier in my pipeline
X = result_df[feature_cols]
y = result_df['strokes_to_hole_out']

# Extract the columns for stratification
stratify_cols = ['from_location_scorer', 'from_location_laser']
stratify_data = result_df[stratify_cols]

# Split the data, using 'stratify_data' for stratification
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.2, random_state=42, stratify=stratify_data)

Error I receive during training: Trial failed with exception: Found unknown categories ['blue'] in column 9 during transform
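
To make the request concrete, here is a rough sketch of the behaviour I have in mind, not a finished implementation. It treats NaN as its own stratum by replacing missing values with a sentinel label and joining the stratification columns into a single key that train_test_split can stratify on. The helper name make_stratify_key, the "__missing__" label, and the toy data are all made up for illustration; only the two stratification column names and the target name come from my pipeline.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def make_stratify_key(df, cols, missing_label="__missing__"):
    """Hypothetical helper: one string key per row, with NaN as its own level."""
    # Replace missing values with a sentinel so they form a separate stratum.
    filled = df[cols].astype(object).where(df[cols].notna(), missing_label).astype(str)
    # Join the columns into a single label per row.
    key = filled[cols[0]]
    for col in cols[1:]:
        key = key + "|" + filled[col]
    return key


# Toy data standing in for result_df (values are invented).
rng = np.random.default_rng(0)
n = 200
scorer_levels = np.array(["green", "fairway", "rough", None], dtype=object)
laser_levels = np.array(["green", "fairway", None], dtype=object)
result_df = pd.DataFrame({
    "from_location_scorer": scorer_levels[rng.integers(0, len(scorer_levels), size=n)],
    "from_location_laser": laser_levels[rng.integers(0, len(laser_levels), size=n)],
    "distance": rng.normal(100, 30, size=n),
    "strokes_to_hole_out": rng.integers(1, 6, size=n),
})

stratify_cols = ["from_location_scorer", "from_location_laser"]
feature_cols = stratify_cols + ["distance"]

X = result_df[feature_cols]
y = result_df["strokes_to_hole_out"]

# The NaNs stay in X so the imputation step can still be tuned;
# only the stratification key has them filled in.
stratify_key = make_stratify_key(result_df, stratify_cols)

X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=stratify_key
)
```

One caveat with this sketch: train_test_split raises an error when a stratum has only one member, so rare combinations would need to be merged into an "other" bucket first. Separately, the "unknown categories" message comes from an encoder meeting a category at transform time that it did not see during fit; sklearn's OneHotEncoder accepts handle_unknown="ignore" for that case, which may be enough if the category is simply rare.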

koaning / scikit-lego, issue #681