Add guardrails around performance cliffs

ScottTodd commented 1 year ago

We've had a few issues where compile time or runtime latency falls off a "performance cliff" (certain ranges of values or program structures are within typical ranges, but going slightly outside of those ranges results in a several orders of magnitude drop in performance). Several of these are related to vectorization.

I'm wondering if we could add some sort of guardrails such that performance cliffs are more prominent, less severe, or (ideally) nonexistent.

Brainstorming some ideas:

- Add tests/tracking (with alerts) for when program size, compile time, or runtime increases beyond a threshold (we already do this for some workloads, but could do more)
- Make it a compiler error (or warning) when a module contains > 10k ops/instructions (or some other threshold)
- Report op/instruction counts in Tracy and/or MLIR pass statistics so developers triaging regressions have more data to work with
- For passes that perform tiling/vectorization, add more tests and logic for edge cases
  - Maybe some pattern/recipe could apply across backends for this? E.g. handle powers of two, aligned values, unaligned values, values > 1000, etc.
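As a rough illustration of the pattern/recipe idea in the last bullet, here is a minimal Python sketch that generates dimension sizes covering those categories (powers of two, aligned, unaligned, and values > 1000). The function name and the `vector_width` and `large` parameters are hypothetical and not part of IREE; a real recipe would pick sizes per backend.

```python
def edge_case_sizes(vector_width=8, large=1000):
    """Return sorted dimension sizes covering common tiling/vectorization edge cases.

    `vector_width` and `large` are hypothetical knobs, not IREE options.
    """
    sizes = set()

    # Powers of two, starting at 1 and stopping at 4x the vector width.
    p = 1
    while p <= 4 * vector_width:
        sizes.add(p)
        p *= 2

    # Aligned multiples of the vector width, plus their unaligned neighbors.
    for k in (1, 2, 3):
        aligned = k * vector_width
        sizes.update({aligned - 1, aligned, aligned + 1})

    # Values just past the "large" mark: one unaligned, one aligned.
    next_aligned = (large // vector_width + 1) * vector_width
    sizes.update({large + 1, next_aligned})

    # Drop non-positive sizes (possible when vector_width == 1).
    return sorted(s for s in sizes if s > 0)
```

Each generated size could then parameterize a compile test for a tiling/vectorization pass, so a cliff at, say, an unaligned size shows up as a single failing case.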

jpienaar commented 1 year ago

I like the last 2 quite a lot. I think the 2nd to last is probably the simplest way to get exact bisection. For the last, I'd like to have something like "if instructions grow more than 2x then warn" and "if instructions grow more than 100x then fail" (or some such, and it can be made tighter later).
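The warn/fail rule above could be sketched roughly as follows. This is plain Python for illustration only; the function name and how the instruction counts are obtained are assumptions, and the 2x and 100x defaults are simply the numbers suggested in this comment.

```python
def check_growth(before, after, warn_ratio=2.0, fail_ratio=100.0):
    """Classify instruction-count growth as "ok", "warn", or "fail".

    `before` and `after` are instruction counts taken around some
    transformation; how they are measured is left to the caller.
    """
    if before <= 0:
        raise ValueError("baseline instruction count must be positive")
    ratio = after / before
    # "More than" is strict: exactly 2x or exactly 100x stays in the lower tier.
    if ratio > fail_ratio:
        return "fail"
    if ratio > warn_ratio:
        return "warn"
    return "ok"
```

Keeping the ratios as parameters makes it easy to start loose and tighten them later.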

ScottTodd commented 1 year ago

Some other ideas from another discussion:

a limit on excessive code generation before doing it (i.e. before a multi-hour compile time cliff) and a flag to enforce that peak memory usage is below some limit. Both of these should be hard compiler failures that can be validated quickly, not requiring large tooling or real hardware loops.
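One way to read "a limit on excessive code generation before doing it" is to estimate the generated size up front and fail fast. The sketch below assumes a fully unrolled loop nest whose op count is the body size times the product of the trip counts; the names and the 10k default are hypothetical, and a real compiler would need a better cost estimate.

```python
import math


class CodegenLimitExceeded(RuntimeError):
    """Raised when an estimate says code generation would exceed the budget."""


def check_unroll_budget(trip_counts, body_ops, max_ops=10_000):
    """Estimate the op count of a fully unrolled loop nest and enforce a limit.

    The estimate (body_ops * product of trip counts) is computed before any
    unrolling happens, so the failure is cheap and immediate.
    """
    estimated = body_ops * math.prod(trip_counts)
    if estimated > max_ops:
        raise CodegenLimitExceeded(
            f"unrolling would create ~{estimated} ops (limit {max_ops})"
        )
    return estimated
```

Because the check only multiplies a few integers, it can be validated quickly without large tooling or real hardware.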

allieculp commented 1 year ago

@mattwalsh @stellaraccident fyi in case you want to add to the conversation here.

mattwalsh commented 1 year ago

I support this and suggest we start with this behind a flag that we would enable in CI to know we've broken something. Otherwise it seems hard to curate what constitutes "too much" for users, even as some of these thresholds seem ridiculous. The famous "The compiler is unable to type-check this expression in reasonable time; try breaking up the expression into distinct sub-expressions" comes to mind.

I absolutely love the idea of halt-and-catch-fire when we devolve to scalar code, also gated behind a flag that devs and CI would have on, vs. relying on people doing e2e performance characterizations.

ScottTodd commented 4 months ago

I had one idea for how to spot when we fall off a cliff, at least: we could add a PassInstrumentation that counts the number of ops in the module and asserts if some threshold is passed. When running with --mlir-print-ir-after-all, that would let us stop right when the threshold is passed, rather than requiring some extra backtracking.
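To make the idea concrete, here is a plain-Python model of the control flow. It does not use the MLIR API: the module is modeled as a flat list of op names, passes are ordinary functions, and `run_after_pass` only mirrors the after-pass hook that MLIR's C++ PassInstrumentation provides. All names here are hypothetical.

```python
class OpCountExceeded(RuntimeError):
    """Raised as soon as a pass leaves the module above the op threshold."""


class OpCountInstrumentation:
    def __init__(self, max_ops):
        self.max_ops = max_ops
        self.history = []  # (pass_name, op_count) after each pass

    def run_after_pass(self, pass_name, module_ops):
        # A real implementation would walk nested regions; this model
        # treats the module as a flat list of ops.
        count = len(module_ops)
        self.history.append((pass_name, count))
        if count > self.max_ops:
            raise OpCountExceeded(
                f"{count} ops after '{pass_name}' (limit {self.max_ops})"
            )


def run_pipeline(module_ops, passes, instrumentation):
    """Run (name, fn) passes in order, checking the op count after each one."""
    for name, fn in passes:
        module_ops = fn(module_ops)
        instrumentation.run_after_pass(name, module_ops)
    return module_ops
```

The pipeline stops at the first pass that crosses the threshold, so the last entry in `history` names the offending pass without any backtracking.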

iree-org / iree

Add guardrails around performance cliffs #13207