manuel-calzolari / sklearn-genetic

Genetic feature selection module for scikit-learn
https://sklearn-genetic.readthedocs.io
GNU Lesser General Public License v3.0

Using sklearn-genetic with neural networks #22

Closed by chemckenna 2 years ago

chemckenna commented 3 years ago

Hi @manuel-calzolari

I am looking to use sklearn-genetic with a neural network. I am currently trying it with Keras NNs, although I am not necessarily tied to Keras.

I get the following error:

ValueError: Input 0 of layer sequential_2086 is incompatible with the layer: expected axis -1 of input shape to have value 180 but received input with shape (None, 118)

I understand why this is occurring - my NN input layer is expecting 180 features. Is there some way I can pass the model the number of features that sklearn-genetic is currently training with?

My KerasClassifier is defined as: estimator = KerasClassifier(lambda: create_nn_model(features=num_features), epochs=100) so I can dynamically supply this.

Can you suggest how I might use sklearn-genetic to select features for use in a NN?

Thanks for any help you can give.

chemckenna commented 3 years ago

My pipeline would be as follows:

    fs_step = GeneticSelectionCV(
        estimator,
        cv=5,
        verbose=1,
        scoring="r2",
        max_features=180,
        n_population=50,
        crossover_proba=0.5,
        mutation_proba=0.2,
        n_generations=40,
        crossover_independent_proba=0.5,
        mutation_independent_proba=0.05,
        tournament_size=3,
        n_gen_no_change=10,
        caching=True,
        n_jobs=1,
    )

    model = KerasClassifier(
        build_fn=lambda: create_nn_model(
            features=num_features,
            classes=4,
            problem_type='multi_class',
            hl_act='relu',
            optimizer='Adam',
        ),
        epochs=epoch,
        verbose=0,
        batch_size=225,
    )
    scale = StandardScaler()
    clf = Pipeline([
        ('scale', scale),
        ('fs_step', fs_step),
        ('model', model),
    ])

I can get your code running and have also been able to count the number of selected features with:

from collections import Counter
Counter(list(selector.support_))[1]

135

but I don't know how to feed that number into my pipeline and model.
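
For reference, the count above can also be read more directly from the boolean mask. The sketch below uses a hand-made NumPy array as a stand-in for `selector.support_` (the `support` variable and its values are illustrative, not output from a real run). The fitted selector should also expose the count as `selector.n_features_`, though that is worth checking against the installed version.

```python
import numpy as np

# Stand-in for selector.support_: True marks a selected feature.
support = np.array([True, False, True, True, False])

n_selected = int(support.sum())         # number of selected features
selected_idx = np.flatnonzero(support)  # column indices of the selected features

print(n_selected)    # 3
print(selected_idx)  # [0 2 3]
```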

manuel-calzolari commented 3 years ago

Sorry for the late reply.

Did you try to use the delayed-build pattern (no input shape specified) with keras? See https://www.tensorflow.org/api_docs/python/tf/keras/Sequential#examples_3
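
A minimal sketch of that delayed-build pattern, assuming TensorFlow's bundled Keras: the model is declared without an input shape, so the first layer's weights are only created on the first call to `fit`, sized to however many columns it receives. The builder name `create_nn_model` is borrowed from the question, but its body, the layer sizes and the random data here are hypothetical.

```python
import numpy as np
from tensorflow import keras


def create_nn_model(classes=4):
    # No Input layer and no input_shape: weights are created lazily,
    # on the first batch the model sees.
    model = keras.Sequential([
        keras.layers.Dense(32, activation="relu"),
        keras.layers.Dense(classes, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    return model


rng = np.random.default_rng(0)
y = rng.integers(0, 4, size=64)  # random integer labels, for illustration only

# The same builder is reused for two different feature-subset widths.
first_layer_inputs = []
for n_features in (118, 180):
    X = rng.normal(size=(64, n_features)).astype("float32")
    model = create_nn_model()
    model.fit(X, y, epochs=1, verbose=0)
    # Rows of the first kernel = number of input features seen at fit time.
    first_layer_inputs.append(int(model.layers[0].kernel.shape[0]))
```

With a builder like this, a wrapper that constructs a fresh model on every `fit` call should no longer need to be told the number of selected features in advance.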

chemckenna commented 3 years ago

Hi - I did not; I proceeded with scikit-learn's MLPClassifier and MLPRegressor instead. Lesson learned, thanks for pointing that out.
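
The scikit-learn route avoids the shape error because its MLP estimators size their first weight matrix from the data passed to `fit`, so each candidate feature subset gets a network of matching width. A small sketch on synthetic data, with hypothetical random masks standing in for the subsets a genetic search would propose:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(
    n_samples=200, n_features=20, n_informative=5, random_state=0
)

rng = np.random.default_rng(0)
input_widths = []
for n_keep in (5, 12):
    # Hypothetical boolean mask with exactly n_keep selected columns.
    mask = np.zeros(X.shape[1], dtype=bool)
    mask[rng.choice(X.shape[1], size=n_keep, replace=False)] = True

    clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=300, random_state=0)
    clf.fit(X[:, mask], y)
    # coefs_[0] has shape (n_input_features, n_hidden_units).
    input_widths.append(clf.coefs_[0].shape[0])
```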