openml-labs / gama

An automated machine learning tool aimed to facilitate AutoML research.
https://openml-labs.github.io/gama/master/
Apache License 2.0

Gama re-evaluates already evaluated pipelines #189

Open leightonvg opened 1 year ago

leightonvg commented 1 year ago

I do not know if you were aware of this already, but it seems that GAMA re-creates individuals it has already evaluated. This also happens after the initial population: a duplicate is created even though an equivalent pipeline has already been evaluated.


In the small test I ran, GAMA evaluated 131 individuals without errors. 10 of these were previously-seen.

PGijsbers commented 1 year ago

Thanks for the report, this is not intentional.

leightonvg commented 1 year ago

Perhaps the following is of interest for estimating the impact of the issue. For my own purposes I ran GAMA multiple times, evaluating 25,916 pipelines in total, of which 2,322 (~9%) were evaluations of previously-seen pipelines.

leightonvg commented 1 year ago

On a similar note, I now have a better estimate of the severity of the issue. I have a large number of GAMA runs: in total 1,277,162 individuals were evaluated, across all 3 search methods currently facilitated (AsyncEA, RandomSearch and ASHA), on the datasets from OpenML-18CC. These counts do not include ASHA evaluations on a subset of a dataset, i.e. only evaluations with subsample = 1.0 are included. Out of the 1,277,162 evaluations, a total of 188,550 appear to be duplicates, or about 14.7%.

simonprovost commented 11 months ago

Hi @leightonvg - note that I came across your issue completely by chance! Well spotted, though.

If you are interested in assisting @PGijsbers and the team with this issue, you could investigate the matter and submit a pull request, I reckon 🫡

I am involved in numerous other pull requests, but this one may be of great interest to many in the future. Although I am not yet one of the primary developers of GAMA, I believe we could approach this issue from two distinct perspectives, if I am not mistaken, which I would call broad and narrowed. The broad approach would handle deduplication outside of any search algorithm, thereby preventing every algorithm from producing duplicates. However, I cannot recall a population of evaluated individuals being maintained outside of an algorithm; I could be incorrect about that. In the narrowed approach, on the other hand, duplicate avoidance would be delegated to the search algorithm / the search algorithm designer. This is mainly because most search algorithms should already have an up-to-date view of the currently evaluated population, as this information has to be returned at the end of the search in any event!
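
To make the broad idea a bit more concrete, here is a minimal sketch of a deduplicating wrapper that could sit in front of the evaluate call, independent of the search algorithm. It is purely illustrative: the DeduplicatingEvaluator class is hypothetical, and it assumes an individual's pipeline can be uniquely identified by its string representation.

from typing import Any, Callable, Dict


class DeduplicatingEvaluator:
    """Hypothetical wrapper that caches evaluation results keyed by pipeline string."""

    def __init__(self, evaluate: Callable[[Any], Any]):
        self._evaluate = evaluate
        self._cache: Dict[str, Any] = {}

    def __call__(self, individual: Any) -> Any:
        # Assumption: str(individual.pipeline) uniquely identifies the pipeline's structure.
        key = str(individual.pipeline)
        if key in self._cache:
            # Return the earlier result instead of re-running the evaluation.
            return self._cache[key]
        result = self._evaluate(individual)
        self._cache[key] = result
        return result

A search method could then be handed DeduplicatingEvaluator(operations.evaluate) in place of operations.evaluate, so duplicates would be caught regardless of which algorithm generated them.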

Therefore, let us examine a simple example: random search. Random search updates a variable called output. That variable could be subjected to a thin check to ensure that the next randomly sampled individual has not previously been evaluated. Below is a brief example; I have not had the opportunity to test it, as I am currently working on another pull request. However, if @leightonvg is willing to do so, please read on.

I believe that with my proposed solution, we could gain:

  1. Fast duplicate checks: I introduced an already_evaluated_pipelines set to keep track of pipelines that have been evaluated previously. Sets in Python support constant-time, O(1), lookups, which makes detecting duplicates highly efficient: the time required to check for a duplicate stays constant regardless of the number of evaluated pipelines. I did this because your comments above report a significant number of pipelines.

  2. Attempts counter: To prevent endless looping in the scenario where it becomes increasingly difficult (or, in the worst case, impossible) to discover unique pipelines, I added an attempts counter that tracks how many attempts were made to generate a unique pipeline. The function raises an error once this number exceeds a threshold (max_attempts, set to 100,000 by default). This safeguard prevents the search from entering an infinite cycle, although I would say that is extremely implausible in random search. Rather than raising, we could also simply end the search, but that is something to discuss with @PGijsbers.

I trust that these modifications serve as a solid foundation. It would be essential to conduct experiments to ensure that they do not introduce unintended side effects and that they behave as expected in a variety of situations. If @leightonvg or other contributors are willing to take this forward, they can begin from this foundation and make any necessary modifications. The subsequent step would then be to repeat this process for every other search method.

If you do go this route with my briefly proposed solution, I would be happy to be credited somewhere as a contributor to the pull request 😛 that could be helpful to me, if that makes sense.

Code:

# Imports are assumed to mirror those of GAMA's existing random_search module.
from typing import List, Optional

from gama.genetic_programming.components import Individual
from gama.genetic_programming.operator_set import OperatorSet
from gama.search_methods.base_search import _check_base_search_hyperparameters
from gama.utilities.generic.async_evaluator import AsyncEvaluator


def random_search(
        operations: OperatorSet,
        output: List[Individual],
        start_candidates: List[Individual],
        max_evaluations: Optional[int] = None,
        max_attempts: int = 100_000,
) -> List[Individual]:
    """Perform random search over all possible pipelines without duplicates.

    Parameters
    ----------
    operations: OperatorSet
        An operator set with `evaluate` and `individual` functions.
    output: List[Individual]
        A list which contains the found individuals during search.
    start_candidates: List[Individual]
        A list with candidate individuals to evaluate first.
    max_evaluations: int, optional (default=None)
        If specified, only a maximum of `max_evaluations` individuals are evaluated.
        If None, the algorithm will be run indefinitely.
    max_attempts: int, optional (default=100000)
        Maximum number of attempts to generate a unique individual; if exceeded, an error is raised.

    Returns
    -------
    List[Individual]
        All evaluated individuals.
    """
    _check_base_search_hyperparameters(operations, output, start_candidates)

    # Using a set for constant-time, i.e. O(1), lookups.
    # Pipelines are keyed by their string representation so that structurally
    # equal pipelines compare equal (object identity would not catch duplicates).
    # If GAMA's Individual exposes a canonical string (e.g. a pipeline_str()),
    # that would be an even more robust key.
    already_evaluated_pipelines = {str(ind.pipeline) for ind in output}

    with AsyncEvaluator() as async_:
        for individual in start_candidates:
            # Record start candidates as well, so they are not re-generated later.
            already_evaluated_pipelines.add(str(individual.pipeline))
            async_.submit(operations.evaluate, individual)

        while (max_evaluations is None) or (len(output) < max_evaluations):
            future = operations.wait_next(async_)
            if future.result is not None:
                output.append(future.result.individual)

            attempts = 0
            new_individual = operations.individual()
            while str(new_individual.pipeline) in already_evaluated_pipelines:
                if attempts >= max_attempts:
                    raise ValueError(
                        "Maximum attempts reached while trying to generate a unique individual."
                    )
                new_individual = operations.individual()
                attempts += 1

            # Add at submission time, so the same pipeline cannot be submitted
            # twice while its first evaluation is still in flight.
            already_evaluated_pipelines.add(str(new_individual.pipeline))
            async_.submit(operations.evaluate, new_individual)

    return output
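
As a small first step towards the experiments mentioned above, here is a self-contained check of the duplicate-rejection pattern in isolation. It does not touch GAMA's API at all: the sampler and the pipeline strings are stand-ins for operations.individual() and the pipelines it produces.

import random


def sample_unique(sampler, seen, max_attempts=100_000):
    """Resample until an unseen pipeline string is drawn, mirroring the
    attempts-counter safeguard in the snippet above."""
    attempts = 0
    candidate = sampler()
    while candidate in seen:
        if attempts >= max_attempts:
            raise ValueError("Maximum attempts reached while trying to generate a unique individual.")
        candidate = sampler()
        attempts += 1
    seen.add(candidate)
    return candidate


def test_no_duplicates_are_accepted():
    pool = [f"pipeline_{i}" for i in range(50)]  # tiny stand-in search space

    def sampler():
        return random.choice(pool)  # stand-in for operations.individual()

    seen = set()
    accepted = [sample_unique(sampler, seen) for _ in range(50)]
    assert len(accepted) == len(set(accepted))  # every accepted pipeline is unique

Running something like this (for example with pytest) before wiring the change into AsyncEA and ASHA would at least confirm that the bookkeeping itself is sound.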

⚠️ This is a proposal for a solution that I have not discussed with @PGijsbers or anyone else from his team, so please feel free to criticise the proposed solution - I just like to help the community! 💪