rsteca / sklearn-deap

Use evolutionary algorithms instead of gridsearch in scikit-learn
MIT License

First generation is ignored #34

Open ClimbsRocks opened 6 years ago

ClimbsRocks commented 6 years ago

A duplicate of https://github.com/rsteca/sklearn-deap/issues/27 but hopefully easier to understand with a reproducible code block.

to reproduce, simply set generations_number=1 in the test.ipynb notebook.

when you do, you'll see the following: [image]

there are a number of things to note here (presumably all coming from the same root error of the first generation being skipped):

  1. despite generations_number=1, it's actually run for 2 generations
  2. the .cv_results_ table only has 22 rows, the number of evaluations that were in the second generation. it should have at least 50 rows (the number of evaluations in the first generation).
  3. there's an inconsistent error being thrown: AttributeError: 'EvolutionaryAlgorithmSearchCV' object has no attribute 'best_score_'. i haven't yet figured out when this error is thrown and when it is not, but given that nobody else has complained about it, i'm hoping that it's also related to this issue of just having 1 generation. it does work sometimes though, even with just 1 generation.
  4. the information for the 0th generation does appear to be saved and available for the statement that's printing "best individual with fitness X", even when it's not available in .cv_results_. see the screenshot below: [image]
     it's also worth noting on this same point that i have sometimes seen .best_score_ print out results that only report the best score from the second generation (which would be 0.923205 in this one-off example i created just for point number 4).

the full .cv_results_ table is below. interestingly, it seems to have some awareness that 50 other rows should be present (the index column starts at 51), but there are only 22 rows present: [image]

i'm hoping this is all an easily-fixed off-by-one error somewhere.

ClimbsRocks commented 6 years ago

@ryanpeach and @rsteca hopefully this is a clearer bug report than before! i'm back from vacation now, and should hopefully be a bit more responsive.

ryanpeach commented 6 years ago

Very interesting, thanks for the extra info. I bet you're onto something with those missing index numbers; it's entirely possible we are not fully scraping the history file into the outputs. Unfortunately I won't have time to examine it for a while, just got a new job! But I'll try whenever I get the chance. Feel free to dig into the source code; it's just one file, so even if it's complicated it's not a lot to learn.

ClimbsRocks commented 6 years ago

@ryanpeach congrats on the new job! i've got a lead on the possible cause, if anyone's got a minute to check it out: https://github.com/rsteca/sklearn-deap/blob/master/evolutionary_search/cv.py#L416

it looks like we've registered only the mutation and mate stages with the history, and not the initial population. it looks like we try to update the history with the population, but that update doesn't appear to work.
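to make the suspected failure mode concrete, here is a toy sketch in plain python. it is not the code in cv.py, and it doesn't use deap at all: `toy_search`, its `record_initial` flag, the fitness function, and the keep-half-then-mutate scheme are all made up for illustration. the only thing it's meant to show is that if the results table is built from a log that only sees newly created individuals, the evaluated initial population never shows up in it.

```python
import random


def toy_search(population_size, generations, record_initial, seed=0):
    """Toy evolutionary loop (hypothetical, NOT sklearn-deap's implementation).

    Returns a list of (generation, individual, score) rows, standing in for
    a results table that is built from an evaluation history.
    Assumes an even population_size so the population keeps its size.
    """
    rng = random.Random(seed)

    def score(x):
        # Made-up fitness function with its peak at x = 3.
        return -(x - 3.0) ** 2

    history = []  # stands in for the log the results table is built from

    # Generation 0: the initial population is always evaluated...
    population = [rng.uniform(-10, 10) for _ in range(population_size)]
    scored = [(x, score(x)) for x in population]
    if record_initial:
        # ...but it only reaches the results if it is explicitly recorded.
        history.extend((0, x, s) for x, s in scored)

    half = population_size // 2
    for gen in range(1, generations + 1):
        # Keep the better half, then mutate it to create new individuals.
        scored.sort(key=lambda pair: pair[1], reverse=True)
        survivors = scored[:half]
        children = [x + rng.gauss(0, 1) for x, _ in survivors]
        child_scores = [(x, score(x)) for x in children]
        # Only the newly created individuals are logged at this point.
        history.extend((gen, x, s) for x, s in child_scores)
        scored = survivors + child_scores

    return history


buggy = toy_search(population_size=50, generations=1, record_initial=False)
fixed = toy_search(population_size=50, generations=1, record_initial=True)
print(len(buggy))  # 25: only the individuals created after generation 0
print(len(fixed))  # 75: the 50 initial evaluations plus the 25 new ones
```

the toy gives 25 rows where the real run gave 22 because its variation step is invented; the point is only that the unrecorded variant has fewer rows than the initial population, which is the same shape as symptom 2 above.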

hope this is an easy fix for someone! maybe @rsteca can figure this out.

rsteca commented 6 years ago

Sorry, I don't have time right now to look into this. But if someone can figure this out and opens a pull request, I will gladly merge it.