manuel-calzolari / sklearn-genetic

Genetic feature selection module for scikit-learn
https://sklearn-genetic.readthedocs.io
GNU Lesser General Public License v3.0

long running time #5

Open quancore opened 5 years ago

quancore commented 5 years ago

Hi, I am using the library for feature elimination.

Dataset: 400k * 50, all numeric columns.

Meta model:

from sklearn.ensemble import RandomForestRegressor

model = RandomForestRegressor(bootstrap=True, criterion='mse',
                              n_estimators=25,
                              n_jobs=8,
                              verbose=1)

Algorithm setup:

from genetic_selection import GeneticSelectionCV

selector = GeneticSelectionCV(model, cv=3, verbose=1, n_population=30,
                              scoring=scoring,
                              max_features=40,
                              caching=True,
                              n_jobs=8)

The execution time is very long: after 1 to 2 hours there is still no result at all. Is this normal? How can I speed it up?

manuel-calzolari commented 5 years ago

How many cores does your CPU have? If you have a single 8-core CPU, I'd try to set n_jobs=1 for GeneticSelectionCV, because you're already parallelizing at the model (Random Forest) level.

If this doesn't help, can you provide the scoring function?
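The oversubscription concern in the reply above can be made concrete with a little arithmetic. The sketch below is only an illustration and not part of sklearn-genetic: `oversubscription` is a hypothetical helper, and it assumes the worst case, where every selector-level job starts its own full set of Random Forest workers.

```python
def oversubscription(outer_jobs, inner_jobs, cores):
    """Return (total workers, workers per core) for nested parallelism.

    Worst-case assumption: each outer job spawns inner_jobs workers.
    """
    total_workers = outer_jobs * inner_jobs
    return total_workers, total_workers / cores


# Setup from the original post: n_jobs=8 on both the selector and the forest.
nested = oversubscription(outer_jobs=8, inner_jobs=8, cores=8)

# Suggested setup: n_jobs=1 on the selector, n_jobs=8 on the forest.
flat = oversubscription(outer_jobs=1, inner_jobs=8, cores=8)

print(nested)  # (64, 8.0)
print(flat)    # (8, 1.0)
```

Under that assumption, the original configuration could ask an 8-core machine to run up to 64 workers at once, which is why parallelizing at only one level is worth trying first.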

quancore commented 5 years ago

Scoring is r2. I am using 8 cores for the random forest, and I decreased the forest's n_estimators to 10 and set n_jobs=1 for GeneticSelectionCV. The run took 5 hours to finish. Is that normal?

nightvision04 commented 4 years ago

If the runtime is really long (my job has been going for days), is it possible to stop a job early and check the best output?

manuel-calzolari commented 3 years ago

Sorry for the extremely late reply.

@quancore That seems a little too long. How long does a single Random Forest take to run?

@nightvision04 At the moment this is not possible.
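The question to @quancore about the time for a single Random Forest matters because the total runtime scales with the number of model fits. The sketch below is a rough upper-bound estimate, not the library's own accounting. It assumes `n_generations=40` (which I believe is the GeneticSelectionCV default, since the post does not set it), that every individual is re-scored in every generation, and a placeholder of 2 seconds per fit that should be replaced with a measured value. Caching and individuals that survive unchanged would reduce the real count.

```python
def estimate_fits(n_population, n_generations, cv):
    """Upper bound on model fits: each individual is scored with cv-fold
    cross-validation in the initial population and in every generation."""
    return n_population * (n_generations + 1) * cv


def estimate_hours(n_fits, seconds_per_fit):
    """Convert a fit count and a per-fit time into hours of sequential work."""
    return n_fits * seconds_per_fit / 3600


# n_population=30 and cv=3 come from the original post;
# n_generations=40 is an assumed default.
fits = estimate_fits(n_population=30, n_generations=40, cv=3)

# seconds_per_fit=2.0 is a hypothetical placeholder, not a measurement.
hours = estimate_hours(fits, seconds_per_fit=2.0)

print(fits)   # 3690
print(hours)  # 2.05
```

Timing one `model.fit` call on a single cross-validation split and plugging that number in gives a quick sanity check on whether a multi-hour run is expected for this configuration.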