openml-labs / gama

An automated machine learning tool aimed to facilitate AutoML research.
https://openml-labs.github.io/gama/master/
Apache License 2.0

AttributeError in Ensemble Class when Accessing 'validation_score' #200

Open simonprovost opened 1 year ago

simonprovost commented 1 year ago

Hello @PGijsbers,

I hope all is well with you. While running ensemble post-processing on the solution I am building with GAMA (for newcomers, see #191), I ran into a small issue. I initially believed it stemmed from my own design, but the same issue occurs on a fresh fork of the latest GAMA main branch.

Description:

When running the basic main.py file shown below, an AttributeError is raised when the __str__ method attempts to print the ensemble model after all processes have completed. The issue seems to occur specifically in the Ensemble class and persists across different running times.

Steps to Reproduce:

  1. Fork the GAMA project.
  2. Execute the following main.py:

Note: I believe a few of the parameters passed to GamaClassifier are irrelevant to the error, but this is how I obtained it, so I copy-pasted the script as-is.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import log_loss, accuracy_score
from gama import GamaClassifier
from gama.search_methods import RandomSearch, AsynchronousSuccessiveHalving, AsyncEA
from gama.postprocessing.ensemble import Ensemble, EnsemblePostProcessing

if __name__ == '__main__':
    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

    automl = GamaClassifier(
        max_total_time=300,
        store="all",
        search=RandomSearch(),
        max_memory_mb=10000,
        n_jobs=3,
        post_processing=EnsemblePostProcessing(),
        verbosity=50,
    )
    print("Starting `fit` which will take roughly 3 minutes.")
    automl.fit(X_train, y_train)

    print("AutoML Model Champion:\n", automl.model) # Here it fails!

    label_predictions = automl.predict(X_test)
    probability_predictions = automl.predict_proba(X_test)

    print('accuracy:', accuracy_score(y_test, label_predictions))
    print('log loss:', log_loss(y_test, probability_predictions))
    print('log_loss', automl.score(X_test, y_test))

The above script fails to print the AutoML Model Champion. The error trace is as follows:

Note: Line numbers may vary slightly because I inserted some print statements to debug this issue.

Traceback (most recent call last):
  File "/tmp/test_gama_main_branch/main.py", line 24, in <module>
    print("AutoML Model Champion:\n", automl.model)
  File "/tmp/test_gama_main_branch/venv/lib/python3.10/site-packages/gama/postprocessing/ensemble.py", line 380, in __str__
    models = sorted(self._models.values(), key=lambda x: x[0].validation_score)
  File "/tmp/test_gama_main_branch/venv/lib/python3.10/site-packages/gama/postprocessing/ensemble.py", line 142, in get_validation_score
    print(f"Type of x[0].validation_score: {type(x[0].validation_score)}")
AttributeError: 'Evaluation' object has no attribute 'validation_score'

Expected Behavior:

The 'AutoML Model Champion' should be printed without errors.

Additional Context:

I suspect the following may be the cause, although I am uncertain whether the current behavior was deliberate. The code appears to look for a validation score, whereas the Evaluation object exposes two other kinds of scores. Is the intended one the individual's fitness (x[0] in the code)? Or is it the Evaluation's 'score' attribute, which is a tuple of floats and therefore may not be suitable?
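To illustrate the second option, here is a minimal sketch of sorting by a tuple-valued score. It relies only on the fact that Python compares tuples lexicographically, so a tuple of floats is a valid sort key. `StandInEvaluation` and its fields are hypothetical stand-ins written for this example, not GAMA's actual Evaluation class, and whether this ordering matches what the Ensemble code intends is an assumption.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class StandInEvaluation:
    """Hypothetical stand-in for illustration; NOT GAMA's Evaluation class."""
    name: str
    score: Tuple[float, ...]  # assumed: one float per metric


evaluations = [
    StandInEvaluation("pipeline_a", (-0.30, -5.0)),
    StandInEvaluation("pipeline_b", (-0.10, -3.0)),
    StandInEvaluation("pipeline_c", (-0.10, -7.0)),
]

# Tuples compare element by element, so ties on the first metric
# are broken by the second metric.
ranked = sorted(evaluations, key=lambda evaluation: evaluation.score)
ranked_names = [evaluation.name for evaluation in ranked]
print(ranked_names)
```

If only the first metric should decide the order, the key could instead be `evaluation.score[0]`.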

Even though the ensembling procedure appears to function perfectly, this issue prevents us from printing the champion model. I would appreciate any insights you may have regarding this issue.
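Until the attribute is fixed, one way to keep a script running past the failing print is to guard the string conversion. The helper below is a generic sketch and not part of GAMA; `describe_safely` and `BrokenStr` are hypothetical names used only to reproduce the failure mode in isolation.

```python
def describe_safely(obj):
    """Return str(obj), or a fallback description if __str__ raises AttributeError."""
    try:
        return str(obj)
    except AttributeError as error:
        return f"<{type(obj).__name__}: could not be printed ({error})>"


class BrokenStr:
    """Hypothetical class whose __str__ reads a missing attribute."""

    def __str__(self):
        return str(self.validation_score)


description = describe_safely(BrokenStr())
print(description)
```

In the script above, this would be used as `print("AutoML Model Champion:\n", describe_safely(automl.model))`, so the subsequent predict calls still run.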

In the meantime, I am sure you are very busy with your lab, so I may be able to provide a quick PR. I only need confirmation of which score attribute you intended to access on the Evaluation object.

System

Specifications:

Appreciate your time! Best wishes,