tum-ei-eda / mlonmcu

Tool for the deployment and analysis of TinyML applications on TFLM and MicroTVM backends
Apache License 2.0

Postprocesses very slow due to Python’s GIL #153

Open PhilippvK opened 3 months ago

PhilippvK commented 3 months ago

I realized that only one core is utilized during the POSTPROCESS stage when running computation-heavy postprocesses (--post analyze_instructions), even when using the --parallel flag.

This makes sense since we are using a ThreadPoolExecutor to execute several runs in parallel, which works well for I/O-bound tasks (including calls to third-party subprocesses) such as those found in the TUNE, BUILD, and COMPILE stages. For compute-bound Python code, we run into problems due to Python’s Global Interpreter Lock (GIL), which allows only one thread to execute Python bytecode at any point in time to ensure thread safety.

A solution for this limitation is to use a ProcessPoolExecutor, which forks completely independent worker processes and therefore does not face the same issue. However, there are a few challenges involved with this approach:

PhilippvK commented 3 months ago

Here is a visualization of the problem.

Legend:

- process_pool, per_stage=1: not supported
- process_pool, per_stage=0: ~1 min
- thread_pool, per_stage=1: ~15 min
- thread_pool, per_stage=0: ~5 min

(figure: mlonmcu_ram_cpu_disk — RAM/CPU/disk utilization during the runs)