jansel / opentuner

An extensible framework for program autotuning
http://opentuner.org/
MIT License

Multi-node autotuning #112

Open uphoffc opened 6 years ago

uphoffc commented 6 years ago

Hi,

I'm currently considering replacing our custom auto-tuning implementation with OpenTuner. I was wondering if it is possible (or should be possible) to run OpenTuner simultaneously on multiple compute nodes, where all nodes share the same database (i.e. the database lies on a shared GPFS or Lustre file system). While there is of course the issue that nodes may run at slightly different speeds, the resulting parallelism would lead to faster search space exploration. Do you have any experience with this, or do you know of any technical limitation that would prohibit it?

Best regards, Carsten

jbosboom commented 6 years ago

You shouldn't have any problem running multiple independent tuning runs on multiple machines. At least, when I've done it, I automatically end up with one database file per machine, so there's no contention. But they won't share information.

If instead you want a single tuning run that executes trials on multiple machines, you should run OpenTuner on some other machine (it doesn't have to be powerful, because it won't be used for trials). Then implement compile so that, in addition to whatever compilation the configuration under test needs, it submits the job to the cluster, waits, and records the job's time/fitness as the compilation result. Your run_precompiled implementation then just returns that fitness.
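The compile/run_precompiled split described above might look roughly like this. This is only a sketch using plain Python stand-ins, not OpenTuner's real MeasurementInterface base class, and submit_job_and_wait is a hypothetical placeholder for whatever your cluster's submission mechanism is (e.g. invoking sbatch and polling for completion):

```python
def submit_job_and_wait(config_data):
    # Hypothetical placeholder: submit the configured job to the cluster,
    # block until it finishes, and return its measured runtime. Simulated
    # here by a trivial function of the configuration.
    return sum(config_data.values()) * 0.001

class ClusterTuner:
    def compile(self, config_data, compile_id):
        # Do any real compilation for this configuration here, then submit
        # the job, wait for it, and record the measured runtime as the
        # "compilation result".
        runtime = submit_job_and_wait(config_data)
        return {'run_time': runtime}

    def run_precompiled(self, desired_result, input, limit,
                        compile_result, run_id):
        # The trial already ran during compile(); just report its fitness.
        return compile_result
```

The point of the split is that OpenTuner invokes compile for several configurations concurrently, so the cluster jobs overlap, while run_precompiled does no further work.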

This is a hack around OpenTuner's assumption that compilation/preparation can run in parallel, but trials must run serially to avoid disturbing each other. If you pass --batch-size N, OpenTuner will "compile" N trials in parallel then run them sequentially before asking the search techniques for more configurations. The mario example uses a similar hack: trials run in separate, single-threaded emulator instances, so we can run as many of them in parallel as we have cores; we don't try to use multiple machines, but we could. (This is using Python threads so compile doesn't actually execute in parallel, but if you're launching and waiting for external processes, those processes can execute in parallel.)
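The parenthetical point about Python threads can be demonstrated with a small stdlib-only sketch: threads that are merely waiting on external processes do not block each other, so the processes themselves run in parallel even though the Python code is serialized by the GIL. The 1-second sleep subprocess here just stands in for a real trial:

```python
import subprocess
import sys
import time
from concurrent.futures import ThreadPoolExecutor

def run_trial(i):
    # Each "trial" is an external process; sleeping stands in for real work.
    subprocess.run([sys.executable, '-c', 'import time; time.sleep(1)'],
                   check=True)
    return i

start = time.time()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(run_trial, range(4)))
elapsed = time.time() - start
# Four 1-second processes finish in roughly 1 second, not 4,
# because the threads block on the processes concurrently.
```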

uphoffc commented 6 years ago

Thanks for the quick and detailed answer.

tdeneau commented 5 years ago

Should --batch-size above be --parallelism ?