openml-labs / gama

An automated machine learning tool aimed to facilitate AutoML research.
https://openml-labs.github.io/gama/master/
Apache License 2.0

ValueError while warm-starting with higher number of cores #207

Closed WmWessels closed 11 months ago

WmWessels commented 11 months ago

I get an error when warm-starting GAMA with a set of pipelines (using AsyncEA). When I set n_jobs = 4, I have no issues and the warm start executes correctly, but when I set this parameter to a higher value (for example 8 on my laptop, or up to 32 on an HPC), I get an error with the following traceback:

Traceback (most recent call last):
  File "/home/TUE/20174868/hpc_files/SBPort/experiment_runner.py", line 506, in <module>
    main()
  File "/home/TUE/20174868/hpc_files/SBPort/experiment_runner.py", line 497, in main
    experiment_runner.run_gama(max_total_time = 3600, max_eval_time = 360, warm_start = True, warm_start_path = path)
  File "/home/TUE/20174868/hpc_files/SBPort/experiment_runner.py", line 305, in run_gama
    clf.fit(X, y, warm_start = warm_start)
  File "/home/TUE/20174868/miniconda3/envs/thesis_env/lib/python3.11/site-packages/gama/GamaClassifier.py", line 142, in fit
    super().fit(x, y, *args, **kwargs)
  File "/home/TUE/20174868/miniconda3/envs/thesis_env/lib/python3.11/site-packages/gama/gama.py", line 583, in fit
    self._search_phase(
  File "/home/TUE/20174868/miniconda3/envs/thesis_env/lib/python3.11/site-packages/gama/gama.py", line 645, in _search_phase
    self._search_method.search(self._operator_set, start_candidates=pop)
  File "/home/TUE/20174868/miniconda3/envs/thesis_env/lib/python3.11/site-packages/gama/search_methods/async_ea.py", line 70, in search
    self.output = async_ea(
                  ^^^^^^^^^
  File "/home/TUE/20174868/miniconda3/envs/thesis_env/lib/python3.11/site-packages/gama/search_methods/async_ea.py", line 142, in async_ea
    new_individual = ops.create(current_population, 1)[0]
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/TUE/20174868/miniconda3/envs/thesis_env/lib/python3.11/site-packages/gama/genetic_programming/operator_set.py", line 88, in create
    return self._create_from_population(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/TUE/20174868/miniconda3/envs/thesis_env/lib/python3.11/site-packages/gama/genetic_programming/selection.py", line 22, in create_from_population
    parent_pairs = nsga2_select(pop, n, metrics)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/TUE/20174868/miniconda3/envs/thesis_env/lib/python3.11/site-packages/gama/genetic_programming/nsga2.py", line 49, in nsga2_select
    raise ValueError("population must be at least size 3 for a pair to be selected")
ValueError: population must be at least size 3 for a pair to be selected

Do you have any advice on how to approach this problem? Or is this unintended behaviour?

Note: I am running the code on the fix_warm_start branch, as stated in issue #197

Thanks in advance!

PGijsbers commented 11 months ago

How many individuals do you use to warm start?

WmWessels commented 11 months ago

@PGijsbers I used 16 individuals in this example

PGijsbers commented 11 months ago

So what I think happens here is that an individual gets evaluated, and then there are no queued individuals, so a new one must be created. But that fails because the other processes are still evaluating the rest of the warm-start individuals, which leaves fewer than 3 evaluated individuals to select a pair from. The most appropriate thing to do here might be to modify 'create_from_population' to work when fewer than 3 individuals have been evaluated, by either generating a random pipeline or (more appropriate for warm start) picking one random individual and applying mutation to it. Alternatively, select a number of warm-start individuals that exceeds the number of cores by a higher factor (but that's not applicable in your case, unless you want to lower the number of cores used instead).
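A minimal sketch of the fallback described above, written without importing GAMA so it can be run on its own. It is not GAMA's actual code: `create_with_fallback` and its callable parameters (`create_from_pairs`, `mutate`, `create_random`) are hypothetical stand-ins for the existing pair-based creation, the mutation operator, and random pipeline generation. Only the threshold of 3 comes from the error message in the traceback.

```python
import random
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")

# Threshold taken from the ValueError message in the traceback.
MIN_POPULATION_FOR_PAIRS = 3


def create_with_fallback(
    population: Sequence[T],
    n: int,
    create_from_pairs: Callable[[Sequence[T], int], List[T]],  # hypothetical: pair selection + crossover/mutation
    mutate: Callable[[T], T],  # hypothetical: returns a mutated copy of an individual
    create_random: Callable[[], T],  # hypothetical: builds a random pipeline
    rng: Optional[random.Random] = None,
) -> List[T]:
    """Create n new individuals even when too few have been evaluated."""
    rng = rng or random.Random()

    # Normal path: enough evaluated individuals to select parent pairs.
    if len(population) >= MIN_POPULATION_FOR_PAIRS:
        return create_from_pairs(population, n)

    # Warm-start friendly fallback: mutate a randomly chosen evaluated individual.
    if population:
        return [mutate(rng.choice(list(population))) for _ in range(n)]

    # Nothing has been evaluated yet: fall back to random individuals.
    return [create_random() for _ in range(n)]


# Toy usage with integers standing in for pipelines.
pair_based = lambda pop, n: ["from_pairs"] * n
one_evaluated = create_with_fallback([10], 2, pair_based, lambda x: x + 1, lambda: 0)
none_evaluated = create_with_fallback([], 3, pair_based, lambda x: x + 1, lambda: 0)
enough_evaluated = create_with_fallback([1, 2, 3], 1, pair_based, lambda x: x + 1, lambda: 0)
```

Mutating an already evaluated warm-start individual keeps new candidates close to the supplied pipelines, whereas random generation ignores them. That is why the mutation branch is tried first in this sketch.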

WmWessels commented 11 months ago

@PGijsbers Thank you for the suggestion. I implemented the solution you proposed (modifying create_from_population), and it fixed the issue.