llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
27.51k stars 11.3k forks source link

LLVM miscompiles consecutive `half` operations by using too much precision on several backends #97975

Open beetrees opened 1 month ago

beetrees commented 1 month ago

Consider the following IR:

define half @f(half %a, half %b, half %c) {
    %d = fadd half %a, %b
    %e = fadd half %d, %c
    ret half %e
}

On backends without native half support, LLVM generally lowers half by converting to float, performing the desired operation, and then converting back to half. For this to be a valid lowering, a conversion back to f16 must occur after each operation, otherwise the excess precision of float will affect the result. For example 65504.0 + 65504.0 + -65504.0 equals infinity if each operation is done at half precision, but will result in 65504.0 if the intermediate value is not rounded to a half but instead kept as a float. This excess runtime precision can lead to miscompilations similar to #89885.

However, it appears that on several backends LLVM will fail to round the intermediate result of consecutive half operations, leading to these kind of bugs. I'm filing this as a single issue rather than a separate issue for each backend as I believe this is an issue with LLVM's default handling of half operations rather than a problem in any specific backend (for instance, AFAIK the only backend-specific code LoongArch has for handling half was added in #94456).

By inspecting the assembly LLVM emits, I've discovered that at least the following backends appear to be affected (in brackets are a specific target triple that I've checked):

I also noticed that 32-bit ARM has this problem in LLVM 18 but not on the main branch. Looking through the issue tracker, it seems it's likely that this was fixed by #80440.

Related to #97981.

llvmbot commented 1 month ago

@llvm/issue-subscribers-backend-powerpc

Author: None (beetrees)

Consider the following IR: ```llvm define half @f(half %a, half %b, half %c) { %d = fadd half %a, %b %e = fadd half %d, %c ret half %e } ``` On backends without native `half` support, LLVM generally lowers `half` by converting to `float`, performing the desired operation, and then converting back to `half`. For this to be a valid lowering, a conversion back to `f16` must occur after each operation, otherwise the excess precision of `float` will affect the result. For example `65504.0 + 65504.0 + -65504.0` equals infinity if each operation is done at `half` precision, but will result in `65504.0` if the intermediate value is not rounded to a `half` but instead kept as a `float`. This excess runtime precision can lead to miscompilations similar to #89885. However, it appears that on several backends LLVM will fail to round the intermediate result of consecutive `half` operations, leading to these kind of bugs. I'm filing this as a single issue rather than a separate issue for each backend as I believe this is an issue with LLVM's default handling of `half` operations rather than a problem in any specific backend (for instance, AFAIK the only backend-specific code LoongArch has for handling `half` was added in #94456). By inspecting the assembly LLVM emits, I've discovered that at least the following backends appear to be affected (in brackets are a specific target triple that I've checked): - AVR (`avr-unknown-unknown`) - Hexagon (`hexagon-unknown-linux-musl`) - LoongArch (`loongarch64-unknown-linux-gnu`) - M68k (`m68k-unknown-linux-gnu`) - MIPS (`mips64el-unknown-linux-gnuabi64`) - MSP430 (`msp430-none-elf`) - PowerPC (`powerpc64le-unknown-linux-gnu`) - SPARC (`sparc64-unknown-linux-gnu`) - WASM (`wasm32-unknown-wasi`): Already reported in #96437 I also noticed that 32-bit ARM has this problem in LLVM 18 but not on the `main` branch. Looking through the issue tracker, it seems it's likely that this was fixed by #80440.
llvmbot commented 1 month ago

@llvm/issue-subscribers-backend-hexagon

Author: None (beetrees)

Consider the following IR: ```llvm define half @f(half %a, half %b, half %c) { %d = fadd half %a, %b %e = fadd half %d, %c ret half %e } ``` On backends without native `half` support, LLVM generally lowers `half` by converting to `float`, performing the desired operation, and then converting back to `half`. For this to be a valid lowering, a conversion back to `f16` must occur after each operation, otherwise the excess precision of `float` will affect the result. For example `65504.0 + 65504.0 + -65504.0` equals infinity if each operation is done at `half` precision, but will result in `65504.0` if the intermediate value is not rounded to a `half` but instead kept as a `float`. This excess runtime precision can lead to miscompilations similar to #89885. However, it appears that on several backends LLVM will fail to round the intermediate result of consecutive `half` operations, leading to these kind of bugs. I'm filing this as a single issue rather than a separate issue for each backend as I believe this is an issue with LLVM's default handling of `half` operations rather than a problem in any specific backend (for instance, AFAIK the only backend-specific code LoongArch has for handling `half` was added in #94456). By inspecting the assembly LLVM emits, I've discovered that at least the following backends appear to be affected (in brackets are a specific target triple that I've checked): - AVR (`avr-unknown-unknown`) - Hexagon (`hexagon-unknown-linux-musl`) - LoongArch (`loongarch64-unknown-linux-gnu`) - M68k (`m68k-unknown-linux-gnu`) - MIPS (`mips64el-unknown-linux-gnuabi64`) - MSP430 (`msp430-none-elf`) - PowerPC (`powerpc64le-unknown-linux-gnu`) - SPARC (`sparc64-unknown-linux-gnu`) - WASM (`wasm32-unknown-wasi`): Already reported in #96437 I also noticed that 32-bit ARM has this problem in LLVM 18 but not on the `main` branch. Looking through the issue tracker, it seems it's likely that this was fixed by #80440.
llvmbot commented 1 month ago

@llvm/issue-subscribers-backend-mips

Author: None (beetrees)

Consider the following IR: ```llvm define half @f(half %a, half %b, half %c) { %d = fadd half %a, %b %e = fadd half %d, %c ret half %e } ``` On backends without native `half` support, LLVM generally lowers `half` by converting to `float`, performing the desired operation, and then converting back to `half`. For this to be a valid lowering, a conversion back to `f16` must occur after each operation, otherwise the excess precision of `float` will affect the result. For example `65504.0 + 65504.0 + -65504.0` equals infinity if each operation is done at `half` precision, but will result in `65504.0` if the intermediate value is not rounded to a `half` but instead kept as a `float`. This excess runtime precision can lead to miscompilations similar to #89885. However, it appears that on several backends LLVM will fail to round the intermediate result of consecutive `half` operations, leading to these kind of bugs. I'm filing this as a single issue rather than a separate issue for each backend as I believe this is an issue with LLVM's default handling of `half` operations rather than a problem in any specific backend (for instance, AFAIK the only backend-specific code LoongArch has for handling `half` was added in #94456). By inspecting the assembly LLVM emits, I've discovered that at least the following backends appear to be affected (in brackets are a specific target triple that I've checked): - AVR (`avr-unknown-unknown`) - Hexagon (`hexagon-unknown-linux-musl`) - LoongArch (`loongarch64-unknown-linux-gnu`) - M68k (`m68k-unknown-linux-gnu`) - MIPS (`mips64el-unknown-linux-gnuabi64`) - MSP430 (`msp430-none-elf`) - PowerPC (`powerpc64le-unknown-linux-gnu`) - SPARC (`sparc64-unknown-linux-gnu`) - WASM (`wasm32-unknown-wasi`): Already reported in #96437 I also noticed that 32-bit ARM has this problem in LLVM 18 but not on the `main` branch. Looking through the issue tracker, it seems it's likely that this was fixed by #80440.
llvmbot commented 1 month ago

@llvm/issue-subscribers-lld-wasm

Author: None (beetrees)

Consider the following IR: ```llvm define half @f(half %a, half %b, half %c) { %d = fadd half %a, %b %e = fadd half %d, %c ret half %e } ``` On backends without native `half` support, LLVM generally lowers `half` by converting to `float`, performing the desired operation, and then converting back to `half`. For this to be a valid lowering, a conversion back to `f16` must occur after each operation, otherwise the excess precision of `float` will affect the result. For example `65504.0 + 65504.0 + -65504.0` equals infinity if each operation is done at `half` precision, but will result in `65504.0` if the intermediate value is not rounded to a `half` but instead kept as a `float`. This excess runtime precision can lead to miscompilations similar to #89885. However, it appears that on several backends LLVM will fail to round the intermediate result of consecutive `half` operations, leading to these kind of bugs. I'm filing this as a single issue rather than a separate issue for each backend as I believe this is an issue with LLVM's default handling of `half` operations rather than a problem in any specific backend (for instance, AFAIK the only backend-specific code LoongArch has for handling `half` was added in #94456). By inspecting the assembly LLVM emits, I've discovered that at least the following backends appear to be affected (in brackets are a specific target triple that I've checked): - AVR (`avr-unknown-unknown`) - Hexagon (`hexagon-unknown-linux-musl`) - LoongArch (`loongarch64-unknown-linux-gnu`) - M68k (`m68k-unknown-linux-gnu`) - MIPS (`mips64el-unknown-linux-gnuabi64`) - MSP430 (`msp430-none-elf`) - PowerPC (`powerpc64le-unknown-linux-gnu`) - SPARC (`sparc64-unknown-linux-gnu`) - WASM (`wasm32-unknown-wasi`): Already reported in #96437 I also noticed that 32-bit ARM has this problem in LLVM 18 but not on the `main` branch. Looking through the issue tracker, it seems it's likely that this was fixed by #80440.
llvmbot commented 1 month ago

@llvm/issue-subscribers-backend-m68k

Author: None (beetrees)

Consider the following IR: ```llvm define half @f(half %a, half %b, half %c) { %d = fadd half %a, %b %e = fadd half %d, %c ret half %e } ``` On backends without native `half` support, LLVM generally lowers `half` by converting to `float`, performing the desired operation, and then converting back to `half`. For this to be a valid lowering, a conversion back to `f16` must occur after each operation, otherwise the excess precision of `float` will affect the result. For example `65504.0 + 65504.0 + -65504.0` equals infinity if each operation is done at `half` precision, but will result in `65504.0` if the intermediate value is not rounded to a `half` but instead kept as a `float`. This excess runtime precision can lead to miscompilations similar to #89885. However, it appears that on several backends LLVM will fail to round the intermediate result of consecutive `half` operations, leading to these kind of bugs. I'm filing this as a single issue rather than a separate issue for each backend as I believe this is an issue with LLVM's default handling of `half` operations rather than a problem in any specific backend (for instance, AFAIK the only backend-specific code LoongArch has for handling `half` was added in #94456). By inspecting the assembly LLVM emits, I've discovered that at least the following backends appear to be affected (in brackets are a specific target triple that I've checked): - AVR (`avr-unknown-unknown`) - Hexagon (`hexagon-unknown-linux-musl`) - LoongArch (`loongarch64-unknown-linux-gnu`) - M68k (`m68k-unknown-linux-gnu`) - MIPS (`mips64el-unknown-linux-gnuabi64`) - MSP430 (`msp430-none-elf`) - PowerPC (`powerpc64le-unknown-linux-gnu`) - SPARC (`sparc64-unknown-linux-gnu`) - WASM (`wasm32-unknown-wasi`): Already reported in #96437 I also noticed that 32-bit ARM has this problem in LLVM 18 but not on the `main` branch. Looking through the issue tracker, it seems it's likely that this was fixed by #80440.
llvmbot commented 1 month ago

@llvm/issue-subscribers-backend-msp430

Author: None (beetrees)

Consider the following IR: ```llvm define half @f(half %a, half %b, half %c) { %d = fadd half %a, %b %e = fadd half %d, %c ret half %e } ``` On backends without native `half` support, LLVM generally lowers `half` by converting to `float`, performing the desired operation, and then converting back to `half`. For this to be a valid lowering, a conversion back to `f16` must occur after each operation, otherwise the excess precision of `float` will affect the result. For example `65504.0 + 65504.0 + -65504.0` equals infinity if each operation is done at `half` precision, but will result in `65504.0` if the intermediate value is not rounded to a `half` but instead kept as a `float`. This excess runtime precision can lead to miscompilations similar to #89885. However, it appears that on several backends LLVM will fail to round the intermediate result of consecutive `half` operations, leading to these kind of bugs. I'm filing this as a single issue rather than a separate issue for each backend as I believe this is an issue with LLVM's default handling of `half` operations rather than a problem in any specific backend (for instance, AFAIK the only backend-specific code LoongArch has for handling `half` was added in #94456). By inspecting the assembly LLVM emits, I've discovered that at least the following backends appear to be affected (in brackets are a specific target triple that I've checked): - AVR (`avr-unknown-unknown`) - Hexagon (`hexagon-unknown-linux-musl`) - LoongArch (`loongarch64-unknown-linux-gnu`) - M68k (`m68k-unknown-linux-gnu`) - MIPS (`mips64el-unknown-linux-gnuabi64`) - MSP430 (`msp430-none-elf`) - PowerPC (`powerpc64le-unknown-linux-gnu`) - SPARC (`sparc64-unknown-linux-gnu`) - WASM (`wasm32-unknown-wasi`): Already reported in #96437 I also noticed that 32-bit ARM has this problem in LLVM 18 but not on the `main` branch. Looking through the issue tracker, it seems it's likely that this was fixed by #80440.
llvmbot commented 1 month ago

@llvm/issue-subscribers-backend-webassembly

Author: None (beetrees)

Consider the following IR: ```llvm define half @f(half %a, half %b, half %c) { %d = fadd half %a, %b %e = fadd half %d, %c ret half %e } ``` On backends without native `half` support, LLVM generally lowers `half` by converting to `float`, performing the desired operation, and then converting back to `half`. For this to be a valid lowering, a conversion back to `f16` must occur after each operation, otherwise the excess precision of `float` will affect the result. For example `65504.0 + 65504.0 + -65504.0` equals infinity if each operation is done at `half` precision, but will result in `65504.0` if the intermediate value is not rounded to a `half` but instead kept as a `float`. This excess runtime precision can lead to miscompilations similar to #89885. However, it appears that on several backends LLVM will fail to round the intermediate result of consecutive `half` operations, leading to these kind of bugs. I'm filing this as a single issue rather than a separate issue for each backend as I believe this is an issue with LLVM's default handling of `half` operations rather than a problem in any specific backend (for instance, AFAIK the only backend-specific code LoongArch has for handling `half` was added in #94456). By inspecting the assembly LLVM emits, I've discovered that at least the following backends appear to be affected (in brackets are a specific target triple that I've checked): - AVR (`avr-unknown-unknown`) - Hexagon (`hexagon-unknown-linux-musl`) - LoongArch (`loongarch64-unknown-linux-gnu`) - M68k (`m68k-unknown-linux-gnu`) - MIPS (`mips64el-unknown-linux-gnuabi64`) - MSP430 (`msp430-none-elf`) - PowerPC (`powerpc64le-unknown-linux-gnu`) - SPARC (`sparc64-unknown-linux-gnu`) - WASM (`wasm32-unknown-wasi`): Already reported in #96437 I also noticed that 32-bit ARM has this problem in LLVM 18 but not on the `main` branch. Looking through the issue tracker, it seems it's likely that this was fixed by #80440.
beetrees commented 1 month ago

I'm not sure that the https://github.com/llvm/llvm-project/labels/missed-optimization label is correct: not truncating to half after each operation (the current behaviour) is presumably faster than doing the extra work; the issue here is that the current lowering is a miscompilation that violates the semantics of LLVM IR.

vchuravy commented 1 month ago

In Julia we handle this by inserting manual conversion operations: https://github.com/JuliaLang/julia/blob/master/src/llvm-demote-float16.cpp

While I am not a fan of this I think the current state is that this is a "frontend" problem, and not a backend issue.

GCC and Clang only recently introduced -fexcess-precision=16 I believe it was previously argued that the C standard allows for the widening of floating point operations with extended precision (the whole x87 debacle).

https://github.com/llvm/llvm-project/commit/85d049a089d4a6a4a67145429ea5d8e155651138

beetrees commented 1 month ago

Nice workaround! However I think it's fair to say that frontends shouldn't be required to insert custom passes late in the LLVM optimisation pipeline in order for LLVM to not miscompile basic floating point operations. As the comment at the top of the file mentions, it relies on some other optimisation passes (instcombine) not running after it that could undo all its hard work.

LLVM IR requires that the basic floating point operations (fadd etc.) give accurately rounded deteministic results. While the LangRef itself is a bit ambiguous in this regard (see #60942), this determinism is relied upon by LLVM IR optimisations, and backends not following this requirement can lead to miscompilations that segfault or worse (see #89885 for an example using the x87 floating point bugs). What the C standard allows is relevant to C frontends like clang (which will explicitly fpext to float if it wants to use excess precision for _Float16 operations), but LLVM IR has its own independently-specified semantics. In fact, -fexcess-precision=16 relies on these LLVM IR semantics to work.

Given that this appears to have been fixed for 32-bit ARM in #80440, it seems that most of the fix will be changing softPromoteHalfType() to return true for these targets. The documentation for the function even explicitly states that returning false will lead to the miscompilations.

beetrees commented 1 month ago

@vchuravy I also notice the Julia pass also handles bfloat. Out of interest, do you know if there any targets where this is known to be an issue? The LLVM source code (here) appears to indicate that soft promotion is always used for bfloat, and spot checking a bunch of targets in compiler explorer I haven't been able to find any targets where there are excess precision bugs.

vchuravy commented 1 month ago

Out of interest, do you know if there any targets where this is known to be an issue?

@maleadt might remember, I think our experience with Float16 made us not trust LLVM handling on platforms that don't support BFloat16 and so we just extended our support there.

jcranmer-intel commented 1 month ago

GCC and Clang only recently introduced -fexcess-precision=16 I believe it was previously argued that the C standard allows for the widening of floating point operations with extended precision (the whole x87 debacle).

LLVM IR semantics for the floating-point types are that, e.g., fadd half means "do IEEE-754-precise binary16 addition", effectively as if FLT_EVAL_METHOD were 0. Deviations from that model are bugs that should be fixed; for the types like half that can be emulated in larger types without double rounding issues, that's not a difficult aspect; the x87 issue (where x87_fp80 induces double-rounding if you try to emulate double with forced-conversion) is much more difficult to fix, although still acknowledged to be a bug (see https://github.com/llvm/llvm-project/issues/44218).

The emulation of half via float happens during CodeGen legalization, which isn't a topic I'm terribly familiar with. Off-hand, with a little spelunking, it looks like @topperc implemented the correct lowering of half-via-float emulation in https://reviews.llvm.org/D73749 with a per-target opt-in mechanism, and this has been opted into by x86, ARM, AArch64, and RISC-V. This means that the default handling of half emulation appears to be wrong, which is worrying to me (especially as it means it's likely to be wrong for any other necessary smaller-via-larger-float emulation targets). On the other hand, it does suggest that there's a relatively easy path to fixing this.

topperc commented 1 month ago

The emulation of half via float happens during CodeGen legalization, which isn't a topic I'm terribly familiar with. Off-hand, with a little spelunking, it looks like @topperc implemented the correct lowering of half-via-float emulation in https://reviews.llvm.org/D73749 with a per-target opt-in mechanism, and this has been opted into by x86, ARM, AArch64, and RISC-V.

That is indeed that patch. At the time, I couldn't afford to spend the time fixing every target so I didn't try. Thus the opt-in mechanism.

This means that the default handling of half emulation appears to be wrong, which is worrying to me (especially as it means it's likely to be wrong for any other necessary smaller-via-larger-float emulation targets)

I don't think we use the broken TypePromoteFloat handling for anything other than f16. bf16 defaults to TypeSoftPromoteHalf.

Maybe I'll try changing the default for f16 for all targets and see what happens.

maleadt commented 1 month ago

Out of interest, do you know if there any targets where this is known to be an issue?

@maleadt might remember, I think our experience with Float16 made us not trust LLVM handling on platforms that don't support BFloat16 and so we just extended our support there.

Yeah, because of the uncertainty how LLVM would treat these operations we inserted the conversions to make sure the results can always be trusted (see https://github.com/JuliaLang/julia/pull/51470#issuecomment-1739140410 and the comments below).

beetrees commented 2 weeks ago

I have confirmed that the experimental C-SKY and Xtensa backends appear to also have this issue.