manuel-calzolari / sklearn-genetic

Genetic feature selection module for scikit-learn
https://sklearn-genetic.readthedocs.io
GNU Lesser General Public License v3.0

Explain output when verbose=1 #9

Status: Closed (quique0194 closed this issue 3 years ago)

quique0194 commented 4 years ago

I have the following output. Could you explain in the documentation what each column means? Thanks!

[image: screenshot of the verbose=1 output]

quique0194 commented 4 years ago

It seems like the format of the output is: [avg(scoring) avg(number_of_features)] [std(scoring) std(number_of_features)] [min(scoring) min(number_of_features)] [max(scoring) max(number_of_features)]
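To make that reading concrete, here is a minimal sketch of how such per-generation statistics could be computed. It assumes each individual's fitness is a pair (score, number of features); the population values are made up for illustration, and this is not taken from the library's source.

```python
import numpy as np

# Hypothetical fitness values for a population of four individuals:
# each row is (score, number_of_features).
fitnesses = np.array([
    [0.90, 5.0],
    [0.80, 7.0],
    [0.70, 3.0],
    [0.60, 5.0],
])

# Column-wise statistics over the population. Each result is a pair
# [stat(scoring), stat(number_of_features)], matching the layout above.
avg = fitnesses.mean(axis=0)
std = fitnesses.std(axis=0)
low = fitnesses.min(axis=0)
high = fitnesses.max(axis=0)

print("avg:", avg)
print("std:", std)
print("min:", low)
print("max:", high)
```

Under this reading, each bracketed pair in a row of the verbose output summarizes the whole population for one generation.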

And -10000 is the score you get when the number of selected features is greater than the max_features you set.

So what is the purpose of setting max_features?

manuel-calzolari commented 3 years ago

Sorry for the extremely late reply.

-10000 is a "bad score" that gets assigned to invalid feature combinations to discourage their selection (combinations with a number of features > max_features are invalid).
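As a rough illustration of that penalty, the sketch below scores a feature mask and falls back to the bad score when too many features are selected. The names `penalized_score` and `score_fn` are hypothetical and not part of the library; only the -10000 value and the `max_features` rule come from this thread.

```python
import numpy as np

BAD_SCORE = -10000.0  # the "bad score" mentioned above


def penalized_score(mask, max_features, score_fn):
    """Return (score, n_selected) for a boolean feature mask.

    score_fn is a hypothetical callable that evaluates a valid mask,
    for example a cross-validated score of an estimator.
    """
    n_selected = int(np.sum(mask))
    if n_selected > max_features:
        # Invalid combination: skip the real evaluation and penalize.
        return BAD_SCORE, n_selected
    return score_fn(mask), n_selected


# Usage with a dummy scoring function that always returns 0.5.
too_many = np.array([1, 1, 1, 0], dtype=bool)
valid = np.array([1, 0, 1, 0], dtype=bool)
print(penalized_score(too_many, 2, lambda m: 0.5))
print(penalized_score(valid, 2, lambda m: 0.5))
```

If every individual in a generation is invalid, every score equals -10000, so the average, minimum and maximum score are all -10000 and the standard deviation is 0.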

It looks like, in your case, all the generated combinations are invalid (number of features > max_features). Commit 3d62422 should improve the situation by generating only valid combinations for the initial population.
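One way to generate only valid combinations for an initial population is sketched below. This does not reproduce the code in commit 3d62422; `random_valid_mask` is a hypothetical helper, and it assumes 1 <= max_features <= n_features.

```python
import numpy as np


def random_valid_mask(n_features, max_features, rng):
    """Draw a random boolean mask with between 1 and max_features selected."""
    # The upper bound of rng.integers is exclusive, hence the + 1.
    k = rng.integers(1, max_features + 1)
    mask = np.zeros(n_features, dtype=bool)
    # Pick k distinct feature indices to switch on.
    mask[rng.choice(n_features, size=k, replace=False)] = True
    return mask


rng = np.random.default_rng(0)
population = [random_valid_mask(20, 5, rng) for _ in range(10)]
print([int(m.sum()) for m in population])
```

Starting from masks like these, no individual in the first generation receives the bad score, so the search has real scores to compare from the outset.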

Please feel free to reopen the issue if necessary.