rasbt / mlxtend

A library of extension and helper modules for Python's data analysis and machine learning libraries.
https://rasbt.github.io/mlxtend/
Other

Is it correct to assume k_score_ as a result? #811

Closed henrique-voni closed 3 years ago

henrique-voni commented 3 years ago

Hi, I am running some experiments using mlxtend's SequentialFeatureSelector (SFS) and I have a question. This is my current pipeline using SFS:

1) Split the data into train/test sets
2) Run SequentialFeatureSelector over the train set and get the best subset of features
3) Transform the train and test sets with SequentialFeatureSelector.transform() to select the best subset of features
4) Fit a new model (the same one used in SFS) on the transformed train data
5) Evaluate by predicting on the transformed test data

This approach is based on the documentation example found here. However, I see that SequentialFeatureSelector can perform cross-validation during the feature selection process (via the cv parameter) and already reports the best score found during the process as k_score_.

My question is: instead of applying SFS to the train set only, would it be correct to pass the entire dataset, set a value for cv, and then simply report k_score_ as the evaluation metric for my experiment?

Thanks in advance!