openml / automlbenchmark

OpenML AutoML Benchmarking Framework
https://openml.github.io/automlbenchmark
MIT License

Benchmark multiple AutoML systems at once #169

Open PGijsbers opened 3 years ago

PGijsbers commented 3 years ago

When you want to run multiple AutoML systems, you currently have to call the script twice, e.g.:

python runbenchmark.py TPOT test test 
python runbenchmark.py auto-sklearn test test

It would be convenient to allow a single command to run them all, e.g. by running every framework from a given framework definition file at once, by passing a comma-separated list of frameworks, or by naming multiple frameworks as separate arguments.

The first option is convenient if you have your own framework definition from which you want to run all frameworks. The latter two are more convenient (and less error prone) if you are just interested in running a subset of the default frameworks.yaml definition.

sebhrusen commented 3 years ago

The more I think about this issue, the less I'm convinced about it.

First, to make this really useful, we need to implement it in such a way that all jobs (for all frameworks) are managed by the same queue; otherwise a simple loop like the one in runstable.sh is good enough, and I don't see the need to add complexity inside the app to support this:

# from runstable.sh: run every constraint x benchmark x framework combination sequentially
for c in ${constraints[*]}; do
    for b in ${benchmarks[*]}; do
        for f in ${frameworks[*]}; do
            python runbenchmark.py "$f" "$b" "$c" -m "$mode" -p "$parallel" $extra_params
        done
    done
done

The problem with the loop is that each python process must complete before the next one starts, so we may be waiting on a few trailing jobs before being able to start many ($parallel) new ones. It's also possible to put each command in the background, but then other difficulties arise, as the parallelism is no longer under control.
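For illustration, here is a minimal Python sketch of both alternatives (reusing the test benchmark and constraint from the commands above; frameworks is just an example subset):

import subprocess

frameworks = ["TPOT", "auto-sklearn"]

# sequential: each run must finish before the next one starts,
# so a few trailing jobs can hold up the whole batch
for f in frameworks:
    subprocess.run(["python", "runbenchmark.py", f, "test", "test"], check=True)

# backgrounded: all runs start at once, so the total parallelism
# is no longer bounded by the -p setting of any single run
procs = [subprocess.Popen(["python", "runbenchmark.py", f, "test", "test"])
         for f in frameworks]
for p in procs:
    p.wait()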

Now, if we decide to support only multiple frameworks, we run into the same issue with multiple benchmarks and multiple constraints, as shown in the loop above. Supporting all three dimensions would be very practical, but it also means adding a lot of complexity to the app...

I'll try to figure out how this could be implemented without compromising the existing logic, but I'm afraid it can't be done for v2.

PGijsbers commented 3 years ago

I am not familiar enough with the job queuing to weigh in right now, but I am fine with providing (approximate) support through the shell script and moving this out of scope for v2.

sebhrusen commented 3 years ago

I may have found a way to implement it just by adding a layer on top of the various Benchmark instances. The idea is to still create one instance per framework-benchmark-constraint combination, create all the jobs for all of them without starting any (or obtain them through a generator, let's see...), and then let the layer on top be in charge of merging and executing them. The advantage of this approach is that it shouldn't require serious changes to the current Benchmark implementations, where most of the logic lives.
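A minimal sketch of that layer (names like create_jobs and the job interface are hypothetical stand-ins, not the actual automlbenchmark API):

from concurrent.futures import ThreadPoolExecutor
import itertools

class MultiBenchmark:
    """Merges the jobs of several Benchmark instances into a single queue."""

    def __init__(self, benchmarks, parallel=2):
        # one Benchmark instance per framework-benchmark-constraint combination
        self.benchmarks = benchmarks
        self.parallel = parallel

    def run(self):
        # create all jobs up front, without starting any of them
        jobs = list(itertools.chain.from_iterable(
            b.create_jobs() for b in self.benchmarks))
        # a single shared executor acts as the merged job queue,
        # so overall parallelism stays under control
        with ThreadPoolExecutor(max_workers=self.parallel) as pool:
            return list(pool.map(lambda job: job.run(), jobs))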

I still can't promise it for v2 (I have a lot of work to do on the H2O side), but I think it's the right way to approach this.

PGijsbers commented 3 years ago

Don't fret about pushing this into v2 👍 it's just a nice-to-have