google / heir

A compiler for homomorphic encryption
https://heir.dev/
Apache License 2.0
297 stars 44 forks source link

Optimize how much to unroll a loop body vs how much yosys can optimize it #257

Open j2kun opened 10 months ago

j2kun commented 10 months ago

On the side of gate-bootstrapped CGGI, we had a path where we fully unrolled all loops before running a circuit optimizer, and for large IRs that was infeasibly slow. We hacked together a different approach in lib/Conversion/MemrefToArith/ExtractLoopBody.cpp that results in only the inner-most body of a loop being optimized via yosys, and retaining the rest of the loop structure.

There is likely an intermediate approach where we can unroll the loop body to a different factor depending on how much yosys is able to optimize it. A decent initial approach might be greedy: starting from no unrolling to a factor of N—say, based on a fixed time budget—and compare the size of the input circuit to the size of the output circuit (this will be a bit fuzzy because the input circuit is not techmapped, but the output circuit is, but I'm hoping relative sizes will be sufficient to compare), and stop once the incremental benefit of unrolling more is plateauing, decreasing, or the time budget is exhausted.

As an outline of the work required for this:

I'm hoping that, once we incorporate arithmetic circuit optimization, the same pass could be re-used for that and apply to non-CGGI contexts.

j2kun commented 9 months ago

The first step of this is done: optimizing the inner-most body of a loop-nest.

github-actions[bot] commented 7 months ago

This issue has 1 outstanding TODOs:

This comment was autogenerated by todo-backlinks

j2kun commented 7 months ago

Last step: add a helper script that runs the loop unrolling with lots of different unroll factors, and prints the relative statistics.