Project [fall20] Proposal: [LLVM Nested For Loop Auto-Flattening]

hj424 commented 3 years ago

What will you do? Implement a pass in LLVM that automatically flatten the nested for loop [1] which provides an opportunity for better applying other loop optimizing techniques.

How will you do it? Based on the llvm-pass-Skeleton framework, add the pass to automatically flatten all the nested for loops. Milestone 1: flatten the perfect nested loop with a constant loop boundary Milestone 2: flatten the perfect nested loop with a variable loop boundary Milestone 3: flatten arbitrary nested loop

How will you empirically measure success? Apply this auto-flattening pass and other available loop optimizing techniques [3] like loop vectorization and unrolling. With the help of loop flattening, we can possibly apply a larger vectorization factor and gain a better speedup on the program runtime.

(1) Create simple benchmarks that correspond to the milestones. (2) Pick realistic benchmarks (the ones with nested for loop) from embench [4].

We compare the runtime on the mentioned benchmarks between the one with LLVM auto-vectorization pass [5] and the one with auto-flattening pass + LLVM auto-vectorization. The one with the auto-flattening pass should provide better performance under several specific types of benchmarks.

Team members: [@mention their GitHub usernames, if it's not just you] @hj424 Collaborators are very welcome if you are also interested in this topic!

References: [1] A. Ghuloum, A. Fisher, “Flattening and Parallelizing Irregular, Recurrent Loop Nests”, PPOPP, 1995. [2] Loop optimizations in LLVM: https://llvm.org/devmtg/2018-10/slides/Kruse-LoopTransforms.pdf [3] LLVM code transformation metadata on loop optimization: https://llvm.org/docs/TransformMetadata.html#loop-vectorization-and-interleaving [4] Embench: Open Benchmarks for Embedded Platforms: https://github.com/embench/embench-iot [5] LLVM auto-vectorization in LLVM: https://llvm.org/docs/Vectorizers.html [6] Loop flatten techniques applied in software pipelining (converting loop level parallelism into instruction parallelism): https://people.ece.uw.edu/hauck/publications/LoopFlattening.pdf

sampsyo commented 3 years ago

Awesome; this will be super great!! Looking forward to it.

sampsyo commented 3 years ago

Closed in #222.

sampsyo / cs6120

Project [fall20] Proposal: [LLVM Nested For Loop Auto-Flattening] #222