rsteca / sklearn-deap

Use evolutionary algorithms instead of gridsearch in scikit-learn
MIT License
770 stars, 132 forks

Doesn't work with TimeSeriesSplit #14

Closed by hyperh 7 years ago

hyperh commented 7 years ago

sklearn-deap fails when I pass a TimeSeriesSplit as cv, even though TimeSeriesSplit should behave like any other cross-validation splitter in sklearn. The same Pipeline works fine with TimeSeriesSplit and GridSearchCV; the only change I made was replacing GridSearchCV with EvolutionaryAlgorithmSearchCV.

    from evolutionary_search import EvolutionaryAlgorithmSearchCV
    from sklearn.model_selection import TimeSeriesSplit

    # Passing the splitter object itself as cv triggers the TypeError below.
    grid = EvolutionaryAlgorithmSearchCV(
        estimator=pipe,
        params=param_grid,
        scoring=scoring,
        cv=TimeSeriesSplit(n_splits=10),
        verbose=1,
        population_size=50,
        gene_mutation_prob=0.10,
        gene_crossover_prob=0.5,
        tournament_size=3,
        generations_number=5,
        n_jobs=4
    )
    grid.fit(XTrain, yTrain)

This fails with:

      File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/evolutionary_search/__init__.py", line 284, in fit
        self._fit(X, y, possible_params)
      File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/evolutionary_search/__init__.py", line 344, in _fit
        halloffame=hof, verbose=self.verbose)
      File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/deap/algorithms.py", line 147, in eaSimple
        fitnesses = toolbox.map(toolbox.evaluate, invalid_ind)
      File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/multiprocessing/pool.py", line 260, in map
        return self._map_async(func, iterable, mapstar, chunksize).get()
      File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/multiprocessing/pool.py", line 608, in get
        raise self._value
    TypeError: 'TimeSeriesSplit' object is not iterable

rsteca commented 7 years ago

I think you should use it like this:

    from evolutionary_search import EvolutionaryAlgorithmSearchCV
    from sklearn.model_selection import TimeSeriesSplit

    # Materialize the (train, test) index pairs and pass that list as cv.
    grid = EvolutionaryAlgorithmSearchCV(
        estimator=pipe,
        params=param_grid,
        scoring=scoring,
        cv=list(TimeSeriesSplit(n_splits=10).split(XTrain)),
        verbose=1,
        population_size=50,
        gene_mutation_prob=0.10,
        gene_crossover_prob=0.5,
        tournament_size=3,
        generations_number=5,
        n_jobs=4
    )
    grid.fit(XTrain, yTrain)

I'm not sure how GridSearchCV manages to accept the splitter object directly, but passing the list of splits works for me.
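
For what it's worth, a likely explanation is that GridSearchCV never iterates over the cv object itself: it normalizes cv with sklearn.model_selection.check_cv and then calls .split() on the result, whereas the traceback above suggests the cv value is being iterated directly. The sketch below only illustrates that difference on toy data; X, splitter, folds and wrapped are made-up names for the example, and it says nothing about how evolutionary_search is implemented internally.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit, check_cv

# Toy data (hypothetical): 10 time-ordered samples with 2 features each.
X = np.arange(20).reshape(10, 2)

splitter = TimeSeriesSplit(n_splits=3)

# The splitter object is not iterable, which matches the TypeError in the report.
try:
    iter(splitter)
    splitter_is_iterable = True
except TypeError:
    splitter_is_iterable = False

# .split(X) yields (train_indices, test_indices) pairs; list() makes them reusable.
folds = list(splitter.split(X))

# check_cv accepts a splitter or an iterable of splits and returns an object
# that exposes .split(), so downstream code can treat both forms the same way.
wrapped = check_cv(folds)
wrapped_folds = list(wrapped.split(X))

for train_idx, test_idx in folds:
    # In every fold the training indices come strictly before the test indices.
    print(train_idx.tolist(), test_idx.tolist())
```

Passing list(TimeSeriesSplit(...).split(XTrain)) therefore hands over something that can be iterated directly, which is why the workaround above avoids the error.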

hyperh commented 7 years ago

Thanks, that seems to have worked!