tum-ei-eda / mlonmcu

Tool for the deployment and analysis of TinyML applications on TFLM and MicroTVM backends
Apache License 2.0

Postprocesses very slow due to Python’s GIL #153

Open PhilippvK opened 3 months ago

PhilippvK commented 3 months ago

I realized that only one core is utilized during the POSTPROCESS stage when running computation-heavy postprocesses (--post analyze_instructions), even when using the --parallel flag.

This makes sense since we are using a ThreadPoolExecutor to execute several runs in parallel, which works well for I/O-bound tasks (including calls to third-party subprocesses) such as those found in the TUNE, BUILD, and COMPILE stages. For compute-bound Python code, we run into problems due to Python’s Global Interpreter Lock (GIL), which allows only one thread to execute Python bytecode at any point in time to ensure thread safety.

A solution for this limitation is to use a ProcessPoolExecutor, which forks completely independent worker processes and therefore does not face the same issue. However, there are a few challenges involved with this approach:

PhilippvK commented 3 months ago

Here is a visualization of the problem.

Legend:

- process_pool, per_stage=1: not supported
- process_pool, per_stage=0: ~1 min
- thread_pool, per_stage=1: ~15 min
- thread_pool, per_stage=0: ~5 min

(figure: mlonmcu_ram_cpu_disk — RAM/CPU/disk utilization during the runs)