Hi, I am running some experiments with mlxtend's `SequentialFeatureSelector` and I have a question. This is my current pipeline using SFS (a code sketch of these steps follows the list):
1) Split data into train/test
2) Run `SequentialFeatureSelector` over the training set to get the best subset of features
3) Transform the train & test sets with `SequentialFeatureSelector.transform()` to keep only the selected features
4) Fit a new model (the same estimator as used in SFS) on the transformed training data
5) Evaluate by predicting on the transformed test data
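
For concreteness, here is a minimal sketch of those five steps; `RandomForestClassifier`, the iris data, and `k_features=3` are just placeholders for my actual estimator, dataset, and settings:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from mlxtend.feature_selection import SequentialFeatureSelector as SFS

X, y = load_iris(return_X_y=True)  # placeholder dataset

# 1) Split data into train/test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

estimator = RandomForestClassifier(random_state=42)

# 2) Run SFS over the training set only
sfs = SFS(estimator, k_features=3, forward=True, scoring='accuracy', cv=5)
sfs.fit(X_train, y_train)

# 3) Transform train & test sets to the selected feature subset
X_train_sel = sfs.transform(X_train)
X_test_sel = sfs.transform(X_test)

# 4) Fit a new model (same estimator as used in SFS) on the transformed train data
estimator.fit(X_train_sel, y_train)

# 5) Evaluate by predicting on the transformed test data
print('Test accuracy:', accuracy_score(y_test, estimator.predict(X_test_sel)))
```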
This approach is based on the documentation example found here; however, I see that `SequentialFeatureSelector` can perform cross-validation during the feature selection process (the `cv` parameter) and already reports the best `k_score_` found during the process.
My question is: instead of applying SFS to the training set only, would it be correct to pass the entire dataset, set a value for `cv`, and then simply take `k_score_` as the evaluation metric for my experiment?
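
To make the alternative concrete, this is roughly what I mean (again with placeholder estimator, dataset, and `k_features`):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from mlxtend.feature_selection import SequentialFeatureSelector as SFS

X, y = load_iris(return_X_y=True)  # placeholder dataset

# Fit SFS on the entire dataset, relying on its internal cross-validation
sfs_full = SFS(RandomForestClassifier(random_state=42),
               k_features=3, forward=True, scoring='accuracy', cv=5)
sfs_full.fit(X, y)

# Use the best cross-validated score as the experiment's evaluation metric?
print('Selected feature indices:', sfs_full.k_feature_idx_)
print('Best CV score (k_score_):', sfs_full.k_score_)
```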
Thanks in advance!