tum-ei-eda / seal5

Seal5 - Semi-automated LLVM Support for RISC-V Extensions including Autovectorization
https://tum-ei-eda.github.io/seal5/
Apache License 2.0

Parallelize transforms #52

Open PhilippvK opened 7 months ago

PhilippvK commented 7 months ago

We should be able to speed up the flow (transforms/backends) by introducing parallelism.

Possible levels:

- Transforms: since we currently handle transforms via Python subprocesses, it should be trivial to parallelize those.
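
Parallelizing the transform subprocesses could look roughly like the sketch below. This is a minimal example, not Seal5 code: it assumes each transform can be expressed as an independent argv list, and `run_transforms_parallel` is a hypothetical helper.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def run_transform(cmd):
    """Run one transform as a child process and capture its output."""
    # The GIL is released while waiting on the subprocess, so plain
    # threads are enough to get real parallelism across transforms.
    return subprocess.run(cmd, capture_output=True, text=True, check=True)

def run_transforms_parallel(commands, max_workers=8):
    """Launch independent transform subprocesses concurrently."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_transform, cmd) for cmd in commands]
        # Collect results in submission order; a failing transform
        # raises CalledProcessError when its future is resolved.
        return [f.result() for f in futures]
```

Since the transforms are already isolated processes, threads (rather than a process pool) are sufficient here; the Python side only waits on the children.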

thomasgoodfellow commented 7 months ago

Most of the flow's processing time is spent on git operations (times in seconds, from the se-henri server at DLR running 4 CPUs, piping demo.py through the "ts" util):

| Phase | Time (s) |
|---|---|
| loading core_desc | 9 |
| applying seal5 patches | 9 |
| transforming | 14 |
| generating instruction patches | 7 |
| applying instruction patches | 64 |

The Python git module used to apply the patches forks a lot of git processes (hundreds), which is where most of the 64 seconds goes. Possible remedies:

  1. Aggregate the patches into a single commit rather than the current commit-per-instruction approach. Having many commits is useful for debugging (bisecting) and cherry-picking, but in a real "product" one might want to squash them into a single "Add FOO extension" commit anyway.
  2. Write the patches for the instructions as multiple commits in a single mail patch, then apply that with "git am" (the current approach is "git apply" + "git add" + "git commit" for each instruction patch).

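
The second option could be sketched as follows. This assumes the per-instruction patches are already in mbox/mail format (e.g. produced with `git format-patch`); `apply_patch_mbox` is a hypothetical helper, not part of Seal5.

```python
import subprocess
from pathlib import Path

def apply_patch_mbox(repo_dir, patch_files, mbox_path="combined.mbox"):
    """Apply many mail-format patches with a single `git am` invocation."""
    # Concatenate the per-instruction mail patches into one mbox, so one
    # `git am` call replaces hundreds of apply/add/commit git forks while
    # still preserving one commit per instruction in the history.
    mbox = Path(mbox_path)
    with mbox.open("wb") as out:
        for p in patch_files:
            out.write(Path(p).read_bytes())
    subprocess.run(["git", "am", str(mbox.resolve())], cwd=repo_dir, check=True)
```

This keeps the commit-per-instruction granularity for bisecting while cutting the process-spawn overhead to a single git invocation.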
PhilippvK commented 7 months ago

Following up after our discussion...

Tasks:

PhilippvK commented 7 months ago

BTW, you can find the full log output (including timestamps, all verbosity levels, and the output of previous runs) in /tmp/seal5_llvm_demo/.seal5/logs/seal5.log

PhilippvK commented 6 months ago

I added instruction-level parallelism to the behav_to_pat pass, which is the most time-critical transform, cutting the runtime (for XCoreVSimd) from 80 s to 10 s on an 18C/36T CPU.
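
The shape of that change might look like the sketch below. The real behav_to_pat pass is of course far more involved; the placeholder conversion and `convert_all` are illustrative only.

```python
from concurrent.futures import ProcessPoolExecutor

def behav_to_pat(instr):
    """Placeholder for the per-instruction behavior -> pattern conversion."""
    # Stand-in for the real (CPU-bound) conversion work.
    return instr.upper()

def convert_all(instructions, max_workers=None):
    """Convert each instruction in a separate worker process."""
    # Each instruction is converted independently, so the pass is
    # embarrassingly parallel; worker processes sidestep the GIL
    # for the CPU-bound conversion work.
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(behav_to_pat, instructions))
```

Because each instruction's conversion is independent, the speedup scales roughly with core count until per-task overhead dominates, consistent with the 80 s to 10 s improvement reported above.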

PhilippvK commented 6 months ago

This feature will be ported to the other transforms as well once we have a more stable transforms API, which should also help reduce duplicated code.