iree-org / iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.
http://iree.dev/
Apache License 2.0

Disable `iree-input-demote-f64-to-f32` and `iree-input-demote-i64-to-i32` by default? #15830

Open dcaballe opened 11 months ago

dcaballe commented 11 months ago

It looks like these flags are enabled by default, which is extremely dangerous given the resulting correctness issues and unexpected behavior for the user. I'm not sure why we have them enabled by default or what is missing to disable them, so I'm opening this issue to discuss and track what remains to be done.

ScottTodd commented 11 months ago

Related discussions on https://github.com/openxla/iree/pull/13966.

SPIR-V, WebGPU, and some Metal targets don't support i64 math and will fall back to extremely slow emulation. Given the prevalence of i64 inputs, we really don't want to send all our users down that path.

I could dig up more links to other discussions. IIRC some i64 types come from frontend frameworks that overuse i64 for indexing operations (where a 64-bit value is not needed for the tensor sizes) instead of using a dedicated index type. The "user" intent in those cases is likely not to use 64 bits, but the "framework" does it anyway, and the choice of data type can have performance implications throughout the entire program. Some parts of programs really do need 64 bits (e.g. RNG bit math), but we haven't had a good way to differentiate between load-bearing and not-load-bearing type usage.
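As a rough illustration of why framework-emitted i64 indexing is usually overkill, here is a hedged Python sketch (not IREE code; the function name and shapes are made up) that checks whether the linearized element offset of a tensor can even exceed the i32 range:

```python
# Hypothetical sketch (not IREE code): decide whether 64-bit indexing is
# actually needed for a tensor of a given static shape, assuming indices
# only ever address elements of that tensor.
from math import prod

def needs_i64_indexing(shape: list[int]) -> bool:
    """True if the linearized element offset can exceed the signed i32 range."""
    num_elements = prod(shape)
    return num_elements - 1 > 2**31 - 1  # max signed 32-bit value

# A large but typical model tensor: indexing fits comfortably in i32.
print(needs_i64_indexing([1, 4096, 4096]))   # False
# A huge embedding table may genuinely need 64-bit offsets.
print(needs_i64_indexing([8_000_000, 512]))  # True
```

For most model tensors the answer is "no", which is exactly why blanket i64 indexing from frontends is wasteful on targets that have to emulate it.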

stellaraccident commented 11 months ago

I expect that we start 64-bit sanitizing on the torch side and move the precision reduction there (where tensor sizes and scalar sizes are separate). Then just leave this on for StableHLO until someone asserts a similar thing there.

stellaraccident commented 11 months ago

(i.e. make this something that specific input pipelines do, and then fix them individually)

dcaballe commented 11 months ago

> frameworks that overuse i64 for indexing

Interesting... Shouldn't we have some control over and understanding of indices by the time we enter IREE land? Interestingly enough, some of the large models we are compiling do need 64-bit indexing.

> there can be performance implications throughout the entire program from the choice of data type.

Yeah, the backend compiler should be able to shrink plain index computations to 32 bits for static shapes, but that won't happen if we are storing indices in buffers and generating 64-bit vector operations for them.

I tried to enable those flags by default and hit some crashes. Perhaps we can start paving the way...
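The kind of shrinking mentioned above could look roughly like the following toy Python sketch (purely illustrative; the op representation and function name are assumptions, not IREE's actual pass): demote 64-bit index constants to 32 bits whenever the value provably fits.

```python
# Toy illustration (not the actual IREE pass): demote i64 constants used
# as indices to i32 when the value fits in the signed 32-bit range.
I32_MIN, I32_MAX = -2**31, 2**31 - 1

def demote_index_constants(ops):
    """ops: list of (opname, dtype, value) tuples; returns a rewritten list."""
    out = []
    for name, dtype, value in ops:
        if name == "constant" and dtype == "i64" and I32_MIN <= value <= I32_MAX:
            out.append((name, "i32", value))  # provably fits: safe to demote
        else:
            out.append((name, dtype, value))  # leave everything else alone
    return out

print(demote_index_constants([("constant", "i64", 128),
                              ("constant", "i64", 2**40)]))
# The first constant is demoted to i32; the second stays i64.
```

A real pass would of course also have to reason about arithmetic on those constants and about values loaded from buffers, which is where the blanket flag falls short.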

ScottTodd commented 11 months ago

My intuition (as someone who has mostly been observing the frontends and codegen) is that if we used index types for indexing operations, we could do some analysis to determine how many bits are needed on a per-op basis. Even with dynamic shapes, we could still define constraints (upper limits, equality between dimensions, etc.).

The current situation is tricky because we have these all-or-nothing flags and a loss of information over the course of exporting/converting/importing/lowering.
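A minimal sketch of the per-op analysis described above, assuming simple interval arithmetic over index values (function names and the two supported ops are illustrative, not anything in IREE):

```python
# Hedged sketch: propagate value ranges through index arithmetic and
# report the smallest signed bit width each result needs.
def bits_needed(lo: int, hi: int) -> int:
    """Smallest signed bit width holding every value in [lo, hi]."""
    for bits in (8, 16, 32, 64):
        if -(2 ** (bits - 1)) <= lo and hi <= 2 ** (bits - 1) - 1:
            return bits
    raise ValueError("range exceeds 64 bits")

def propagate(op: str, a: tuple[int, int], b: tuple[int, int]) -> tuple[int, int]:
    """Interval arithmetic for a couple of common index ops."""
    if op == "add":
        return (a[0] + b[0], a[1] + b[1])
    if op == "mul":  # assumes non-negative ranges, typical for indices
        return (a[0] * b[0], a[1] * b[1])
    raise NotImplementedError(op)

# row * stride + col for a 4096 x 4096 tensor: 32 bits suffice.
row, stride, col = (0, 4095), (4096, 4096), (0, 4095)
offset = propagate("add", propagate("mul", row, stride), col)
print(bits_needed(*offset))  # 32
```

With constraints on dynamic dimensions (upper limits, equality between dimensions), the same propagation would work even when exact shapes aren't known at compile time.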