openml / automlbenchmark

OpenML AutoML Benchmarking Framework
https://openml.github.io/automlbenchmark
MIT License
401 stars 132 forks source link

Defaults to improve framework integration setup #642

Open PGijsbers opened 1 week ago

PGijsbers commented 1 week ago

I started looking at our framework integration requirements (in preparation for external integrations), and the setup is a bit cumbersome. Some proposals:

1. Default for absent setup in __init__.py

A function setup is required to be defined in __init__.py, in every integration this has the same implementation:

def setup(*args, **kwargs):
       call_script_in_same_dir(__file__, "setup.sh", *args, **kwargs)

we could make it the default behavior, removing the need to add this to __init__.py.

2. Exit on error setup.sh

The setup scripts can be finicky and produce a lot of output. The automl benchmark can't (always) check whether an installation is successful. Errors easily get lost in logs. For all these reasons, it can be easy to miss a framework failed to install successfully. Even though there is (almost?) never a reason you should be OK with the setup script failing and continuing to try and run an evaluation. For that reason, many scripts already include a set -e. I think the default should be for the caller to set this mode (integration scripts can always undo it with set +e). Especially useful for people working on new integrations that might overlook this otherwise.

3. Shared setup should be run before framework specificsetup.sh

Currently, it is the burden of the integration script to call the shared installation. The main reason for this is, I think, so they can specify whether or not a virtual environment should be created (and have PY and PIP set accordingly). I think whether or not a virtual environment should be created should become a configuration in the frameworks.yaml file, and default to yes. With the option being put in the configuration, AMLB can now setup the shared environment and then call the framework-specific setup script. Further reducing the boilerplate, and also making it easier since that "boilerplate" still required knowing where your framework integration lived relative to the setup scripts (by sourcing another file).


Interested to hear your thoughts. Do you see any obstacles? Better ways? Additional changes?

(@eddiebergman, @Innixma)

Innixma commented 1 week ago

I think whether or not a virtual environment should be created should become a configuration in the frameworks.yaml file

+1, this would be great to simplify debugging and development. Easy ways to run directly with the calling environment and skip creating a new one will make things lighter when someone wants to run a debugger through the entire logic.

Overall LGTM