jansel / opentuner

An extensible framework for program autotuning
http://opentuner.org/
MIT License

Optionally use real processes for CPU-bound measurements #81

Open jbosboom opened 8 years ago

jbosboom commented 8 years ago

Currently the measurement driver uses multiprocessing.pool.ThreadPool, which runs work on Python threads. Because of the global interpreter lock (GIL), those threads cannot execute Python code in parallel. If run_precompiled is I/O-bound (for example, it spawns a process and waits for it to complete), we still get a speedup. When run_precompiled is CPU-bound, as is natural when using OpenTuner for search problems other than autotuning, we are often slower than executing serially.
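A small standalone sketch of the effect described above, assuming a standard (GIL-enabled) CPython build. The functions `cpu_bound`, `io_bound` and `compare` are hypothetical stand-ins for run_precompiled, not OpenTuner code. `time.sleep` releases the GIL the same way waiting on a child process does, so those calls overlap on a ThreadPool, while the pure-Python loop does not.

```python
import time
from multiprocessing.pool import ThreadPool


def cpu_bound(n):
    # Pure-Python arithmetic: holds the GIL for the whole call.
    total = 0
    for i in range(n):
        total += i * i
    return total


def io_bound(seconds):
    # time.sleep releases the GIL, like waiting on a spawned process.
    time.sleep(seconds)
    return seconds


def timed(fn, *args):
    # Return (result, elapsed wall-clock seconds).
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def compare(task, inputs, workers=4):
    # Run the same inputs serially and on a ThreadPool.
    serial, t_serial = timed(lambda: [task(x) for x in inputs])
    with ThreadPool(workers) as pool:
        threaded, t_threaded = timed(pool.map, task, inputs)
    return serial, threaded, t_serial, t_threaded


if __name__ == "__main__":
    for name, task, inputs in [
        ("cpu", cpu_bound, [2_000_000] * 4),
        ("io", io_bound, [0.2] * 4),
    ]:
        _, _, t_s, t_t = compare(task, inputs)
        print(f"{name}: serial {t_s:.2f}s, ThreadPool {t_t:.2f}s")
```

Exact timings depend on the machine, but the I/O-bound row should show roughly a `workers`-fold speedup while the CPU-bound row shows none.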

We should provide a command line option to use an actual process pool, and document this behavior.

jbosboom commented 8 years ago

Proposal: --parallelism-mode={serial,io,cpu}. serial calls compile/run_precompiled directly; io is the current ThreadPool implementation, which allows I/O parallelism; and cpu uses Pool instead, for real parallelism.
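A rough sketch of how the proposed option could select a pool, under stated assumptions: `make_pool`, `SerialPool`, `build_parser`, `square` and the `--workers` flag are hypothetical names for illustration, and the default of `io` is an assumption chosen to preserve the current ThreadPool behavior. Only `ThreadPool`, `Pool` and `argparse` are real library APIs here.

```python
import argparse
from multiprocessing import Pool
from multiprocessing.pool import ThreadPool


class SerialPool:
    """Hypothetical stand-in exposing the small subset of the Pool API used here."""

    def map(self, func, iterable):
        # Run every call in the current thread, one after another.
        return [func(x) for x in iterable]

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        return False


def make_pool(mode, workers):
    # Hypothetical helper mapping --parallelism-mode to a pool implementation.
    if mode == "serial":
        return SerialPool()
    if mode == "io":
        return ThreadPool(workers)  # threads: overlap I/O waits only
    if mode == "cpu":
        return Pool(workers)  # processes: real parallelism, but args must pickle
    raise ValueError(f"unknown parallelism mode: {mode!r}")


def build_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--parallelism-mode",
        choices=["serial", "io", "cpu"],
        default="io",  # assumption: keep today's ThreadPool behavior by default
        help="how compile/run_precompiled calls are executed",
    )
    parser.add_argument("--workers", type=int, default=4)  # hypothetical flag
    return parser


def square(x):
    # Defined at top level so Pool (cpu mode) can pickle it by reference.
    return x * x


if __name__ == "__main__":
    # The __main__ guard matters for cpu mode with the spawn/forkserver start methods.
    args = build_parser().parse_args()
    with make_pool(args.parallelism_mode, args.workers) as pool:
        print(pool.map(square, range(8)))
```

Keeping all three modes behind one `map`-style interface means the driver code does not need to branch on the mode after the pool is created.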

jbosboom commented 8 years ago

Simply replacing ThreadPool with Pool doesn't work. First, the nested function definition compile_result is not picklable, but this can be fixed by moving it to the top level. Then the measurement interface (self.interface) is passed as an argument, but it contains self.pid_lock = threading.Lock() which is not picklable.
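A minimal sketch of both pickling problems and one common workaround, assuming the standard `__getstate__`/`__setstate__` pickle hooks are acceptable here. `InterfaceSketch` and `picklable` are hypothetical stand-ins, not OpenTuner's actual measurement interface. The idea is to leave the lock out of the pickled state and create a fresh one when the object is unpickled in the worker process.

```python
import pickle
import threading


def make_nested():
    def compile_result(x):  # nested: pickle can't look it up by qualified name
        return x
    return compile_result


def compile_result(x):
    # Top-level version: pickled by reference (module + name).
    return x


class InterfaceSketch:
    """Hypothetical stand-in for a measurement interface holding a lock."""

    def __init__(self):
        self.pid_lock = threading.Lock()
        self.pids = []

    def __getstate__(self):
        # Copy the instance dict and drop the unpicklable lock.
        state = self.__dict__.copy()
        del state["pid_lock"]
        return state

    def __setstate__(self, state):
        # Restore the other attributes and give this copy its own lock.
        self.__dict__.update(state)
        self.pid_lock = threading.Lock()


def picklable(obj):
    # The exact exception type varies across Python versions.
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
```

One caveat with this approach: each worker process gets its own copy of the interface and its own lock, so the lock only serializes threads inside that process, and changes a worker makes to attributes such as `pids` are not visible to the parent.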