rasbt / mlxtend

A library of extension and helper modules for Python's data analysis and machine learning libraries.
https://rasbt.github.io/mlxtend/

KeyError when using bidirectional feature selector #842

Closed joeanton719 closed 3 years ago

joeanton719 commented 3 years ago

Describe the bug

I am trying to select the right features for a regression problem using bi-directional feature selection. When I run the program I get the following error.

Steps/Code to Reproduce


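%%time
from catboost import CatBoostRegressor
# Assumption: the three encoders come from feature_engine (not stated in the original post).
from feature_engine.encoding import MeanEncoder, OrdinalEncoder, RareLabelEncoder
from mlxtend.feature_selection import SequentialFeatureSelector as SFS
from sklearn.pipeline import make_pipeline

# X_train, y_train, high_card_cols, cat_cols, and seed are defined earlier (not shown).

# Encoders and model chained in one pipeline; SFS refits this pipeline on each feature subset.
cat_full = make_pipeline(
    RareLabelEncoder(0.002, variables=['brand', 'model']),
    MeanEncoder(variables=high_card_cols),
    OrdinalEncoder(variables=cat_cols),
    CatBoostRegressor(learning_rate=0.1, depth=6, random_seed=seed, silent=True),
)

# forward=True with floating=True gives the bidirectional (floating) search.
bi = SFS(cat_full,
         k_features="best",
         scoring='neg_root_mean_squared_error',
         forward=True,
         floating=True,
         cv=10)

bi.fit(X_train, y_train)

print()
# The scorer returns negative RMSE, so flip the sign for display.
print(f'Best RMSE: {-bi.k_score_:.3f}')

bi.k_feature_names_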

Expected Results

Actual Results

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<timed exec> in <module>

~\anaconda3\lib\site-packages\mlxtend\feature_selection\sequential_feature_selector.py in fit(self, X, y, custom_feature_names, groups, **fit_params)
    566                     best_subset = k
    567             k_score = max_score
--> 568             k_idx = self.subsets_[best_subset]['feature_idx']
    569 
    570             if self.k_features == 'parsimonious':

KeyError: None
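
One possible reading of this traceback, sketched below in plain Python: `best_subset` is only assigned when some subset's score compares greater than the running maximum, so if no score ever does (for example, if every cross-validation score is NaN), it stays `None` and the dictionary lookup fails. The loop is a simplified reconstruction based on the few lines shown in the traceback, and the NaN scores are an assumption; nothing in this thread confirms what the scores actually were.

```python
import math


def pick_best_subset(subsets):
    """Simplified stand-in for the selection loop around the failing line."""
    max_score = float("-inf")
    best_subset = None
    for k, info in subsets.items():
        # Any comparison against NaN is False, so a NaN score never wins.
        if info["avg_score"] > max_score:
            max_score = info["avg_score"]
            best_subset = k
    return best_subset


# Hypothetical results in which every cross-validation score came back NaN.
nan_subsets = {
    1: {"feature_idx": (0,), "avg_score": math.nan},
    2: {"feature_idx": (0, 1), "avg_score": math.nan},
}

best = pick_best_subset(nan_subsets)
print(best)  # None

try:
    nan_subsets[best]["feature_idx"]
except KeyError as exc:
    print("KeyError:", exc)  # KeyError: None
```

If this is what happened, checking `bi.subsets_` for NaN `avg_score` values after a failed fit would be a quick way to confirm it.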

Versions

MLxtend 0.18.0
Windows-10-10.0.19041-SP0
Python 3.8.5 (default, Sep 3 2020, 21:29:08) [MSC v.1916 64 bit (AMD64)]
Scikit-learn 0.24.2
NumPy 1.19.5
SciPy 1.5.2

joeanton719 commented 3 years ago

I found a workaround. I think the feature engineering transformers inside the model pipeline were causing the problem. To get around this, I created a separate dataset, applied the feature engineering transformations to it, and passed the transformed dataset to the SFS instead.

rasbt commented 3 years ago

Glad you solved the issue and found the root cause. If there's still a problem with the SFS, please don't hesitate to reopen this issue.