This PR implements the lowering of bufferized pipeline operations to a standard set of dialects. The lowering itself does not introduce any concurrency; instead, it relies on the stages themselves to start and wait for concurrent operations.
It starts off by generating a "ramp up", which emits the stages one after another until the last stage can start executing. It then enters the hot loop, a single scf.for that executes every pipeline stage in sequence, albeit with a different IV for each. After the core scf.for, a "ramp down" makes sure that the later stages consume the results of the earlier stages that were still executed within the scf.for.
The only limitation of the lowering is that it currently requires the loop to execute at least "numStages - 1" times. Otherwise, the behavior is undefined (UB).
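
To make the ramp up / hot loop / ramp down structure concrete, here is a minimal Python sketch that models the execution order as a list of (stage, iteration) pairs. It is an illustration only, not the code in this PR: the function name `pipeline_schedule`, the assumption that stage `s` runs with IV `iv - s`, and the exact order of stages within each step are all hypothetical choices made for the example.

```python
def pipeline_schedule(num_stages, trip_count):
    """Model the lowered execution order as (stage, iteration) pairs.

    Assumption (not taken from the PR): stage s lags stage 0 by s iterations.
    """
    if trip_count < num_stages - 1:
        # Mirrors the stated limitation; the real lowering has UB here.
        raise ValueError("loop must execute at least num_stages - 1 times")

    schedule = []

    # Ramp up: keep adding one more stage per step until the last
    # stage is ready to start.
    for step in range(num_stages - 1):
        for stage in range(step + 1):
            schedule.append((stage, step - stage))

    # Hot loop: a single loop that runs every stage, each with its own IV.
    for iv in range(num_stages - 1, trip_count):
        for stage in range(num_stages):
            schedule.append((stage, iv - stage))

    # Ramp down: later stages consume the results still in flight.
    for step in range(1, num_stages):
        for stage in range(step, num_stages):
            schedule.append((stage, trip_count - 1 + step - stage))

    return schedule


if __name__ == "__main__":
    for stage, iteration in pipeline_schedule(num_stages=3, trip_count=4):
        print(f"stage {stage} on iteration {iteration}")
```

In this model every stage runs exactly once per iteration, and a stage never runs before its predecessor has handled the same iteration. With fewer than `num_stages - 1` iterations, the ramp up alone would already step past the end of the iteration space, which is one way to see where the limitation above comes from.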