Open jbosboom opened 8 years ago
Proposal: --parallelism-mode=serial,io,cpu. serial calls compile/run_precompiled directly, io is the current ThreadPool implementation that allows I/O parallelism, and cpu uses Pool instead for real parallelism.
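A minimal sketch of how the three proposed modes could be dispatched. Only the flag name and its values come from the proposal; `build_parser`, `run_all`, and `square` are hypothetical names for illustration and are not part of OpenTuner.

```python
import argparse
from multiprocessing import Pool
from multiprocessing.pool import ThreadPool


def build_parser():
    # The flag name and its three values follow the proposal; the default
    # of "io" is an assumption that mirrors the current ThreadPool behavior.
    parser = argparse.ArgumentParser()
    parser.add_argument("--parallelism-mode",
                        choices=["serial", "io", "cpu"],
                        default="io")
    return parser


def square(x):
    # Stand-in for compile/run_precompiled. It lives at the top level so
    # that the "cpu" mode can pickle it.
    return x * x


def run_all(mode, func, items, workers=2):
    # serial: call the function directly, with no pool at all.
    if mode == "serial":
        return [func(item) for item in items]
    # io: thread pool (helps when the work waits on I/O or subprocesses).
    # cpu: process pool (helps when the work is CPU-bound Python code).
    pool_cls = ThreadPool if mode == "io" else Pool
    pool = pool_cls(workers)
    try:
        return pool.map(func, items)
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(run_all(args.parallelism_mode, square, [1, 2, 3]))
```

The `cpu` branch only works if `func` and its arguments are picklable, which is exactly the obstacle described in the rest of this issue.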
Simply replacing ThreadPool with Pool doesn't work. First, the nested function definition compile_result is not picklable, but this can be fixed by moving it to the top level. Then the measurement interface (self.interface) is passed as an argument, but it contains self.pid_lock = threading.Lock(), which is not picklable.
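Both pickling failures can be reproduced outside OpenTuner. In the sketch below, `FakeInterface`, `make_nested`, and `is_picklable` are hypothetical stand-ins; only the `pid_lock` attribute and the `compile_result` name are taken from the description above.

```python
import pickle
import threading


def top_level_compile_result(x):
    # A module-level function is pickled by reference to its qualified name.
    return x + 1


def make_nested():
    # Mimics a function defined inside a method, like compile_result.
    def compile_result(x):
        return x + 1
    return compile_result


class FakeInterface:
    # Hypothetical stand-in for the measurement interface: it holds a lock.
    def __init__(self):
        self.pid_lock = threading.Lock()


def is_picklable(obj):
    # The exception type differs by object and Python version, so catch
    # the ones pickle is known to raise for unpicklable objects.
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


if __name__ == "__main__":
    print("top-level function:", is_picklable(top_level_compile_result))
    print("nested function:", is_picklable(make_nested()))
    print("object holding a lock:", is_picklable(FakeInterface()))
```

Since Pool must pickle both the function and its arguments to send them to worker processes, either failure is enough to break a direct swap.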
Currently the measurement driver uses multiprocessing.pool.ThreadPool, which uses Python's threads; the global interpreter lock (GIL) prevents these from running Python code in parallel. If run_precompiled is IO-bound (for example, it spawns a process and waits for it to complete), we can get a speedup. When run_precompiled is CPU-bound, as is natural when using OpenTuner for non-autotuning search problems, we are often slower than executing serially. We should provide a command line option to use an actual process pool, and document this behavior.