llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
27.8k stars 11.45k forks source link

Regression: large integer division lowers to very slow code #96488

Open mlugg opened 2 months ago

mlugg commented 2 months ago

Downstream issue: https://github.com/ziglang/zig/issues/20340

In the Zig compiler, we have observed incredibly slow compilation and execution of some arithmetic operations on large integers when compiling via LLVM. Zig's self-hosted x86_64 code generation backend, which does not currently apply any optimization passes, is able to produce significantly faster code then our LLVM backend. This issue also reproduces when compiling equivalent code with Clang.

It appears that this regressed in LLVM 16 as a result of 6d13b80 (https://reviews.llvm.org/D130079). Zig 0.10.1, which uses LLVM 15, exhibits much faster compilation and code generation.

This likely contributes to #82196.

Minimal reproduction:

target triple = "x86_64-unknown-linux-gnu"

@large_integer = unnamed_addr global i16384 15132498712934891283498721348971234987123498712349871234987123498712349871234987123498712349999999999999999999999999999999999999999999999999999999999999999900000000000000000000000000000000000000000000000000000000000000000000000000000000012345987123498712349871234987123498712349871234987123498734987234598723459872345987456398734569873459872394859238470000000001151324987129348912834987213489712349871234987123498712349871234987123498712349871234987123499999999999999999999999999999999999999999999999999999999999999999000000000000000000000000000000000000000000000000000000000000000000000000000000000123459871234987123498712349871234987123498712349871234987349872345987234598723459874563987345698734598723948592384700000000011513249871293489128349872134897123498712349871234987123498712349871234987123498712349871234999999999999999999999999999999999999999999999999999999999999999990000000000000000000000000000000000000000000000000000000000000000000000000000000001234598712349871234987123498712349871234987123498712349873498723459872345987234598745639873456987345987239485923847000000000115132498712934891283498721348971234987123498712349871234987123498712349871234987123498712349999999999999999999999999999999999999999999999999999999999999999900000000000000000000000000000000000000000000000000000000000000000000000000000000012345987123498712349871234987123498712349871234987123498734987234598723459872345987456398734569873459872394859238470000000001151324987129348912834987213489712349871234987123498712349871234987123498712349871234987123499999999999999999999999999999999999999999999999999999999999999999000000000000000000000000000000000000000000000000000000000000000000000000000000000123459871234987123498712349871234987123498712349871234987349872345987234598723459874563987323489712349871234987123498712349871234987123498712349871234987234598723459872345987234598723459872549872345987234597234598723459872345987234598723459872345987234598723459872345987234598723459872345987234598723459182734987123498712349871234987123498712349871234987123498712349871234987123498712349871234987123498712349871234987123498712349871234987123498712349871234987123498712349871234987124987123498712349871234987234598723459873465987345987342598723459872345987234598723459875927234598723459872345987234598723459872345987234598723459872345987234598723459872345987234598723459872345987234598734569873469873456987345698734569873469873469873456987435698743569873456987345698374569867439873456934659873469837451513249871293489128349872134897123498712349871234987123498712349871234987123498712349871234999999999999999999999999999999999999999999999999999999999999999990000000000000000000000000000000000000000000000000000000000000000000000000000000001234598712349871234987123498712349871234987123498712349873498723459872345987234598745639873456987345987239485923847000000000115132498712934891283498721348971234987123498712349871234987123498712349871234987123498712349999999999999999999999999999999999999999999999993475989994985849876091380111183754975279343, align 16

define dso_local i32 @main() #0 {
  br label %1
1:
  %2 = phi i64 [ 0, %0 ], [ %5, %1 ]
  %3 = load i16384, ptr @large_integer, align 16
  %4 = udiv i16384 %3, 100
  store i16384 %4, ptr @large_integer, align 16
  %5 = add i64 %2, 1
  %6 = icmp eq i64 %5, 500
  br i1 %6, label %7, label %1
7:
  ret i32 0
}

attributes #0 = { nounwind }

Building this IR on an AMD Ryzen 7 7840HS takes around 1.2s, and executing it takes around 0.5s. Equivalent Zig code compiled using Zig's self-hosted x86_64 code generation backend executes in a negligible amount of time (~4ms), making this naive lowering at least 100x faster.

llvmbot commented 2 months ago

@llvm/issue-subscribers-backend-x86

Author: Matthew Lugg (mlugg)

Downstream issue: https://github.com/ziglang/zig/issues/20340 In the Zig compiler, we have observed incredibly slow compilation and execution of some arithmetic operations on large integers when compiling via LLVM. Zig's self-hosted x86_64 code generation backend, which does not currently apply any optimization passes, is able to produce significantly faster code then our LLVM backend. This issue also reproduces when compiling equivalent code with Clang. It appears that this regressed in LLVM 16 as a result of 6d13b80 (https://reviews.llvm.org/D130079). Zig 0.10.1, which uses LLVM 15, exhibits much faster compilation and code generation. This likely contributes to #82196. Minimal reproduction: ``` target triple = "x86_64-unknown-linux-gnu" @large_integer = unnamed_addr global i16384 15132498712934891283498721348971234987123498712349871234987123498712349871234987123498712349999999999999999999999999999999999999999999999999999999999999999900000000000000000000000000000000000000000000000000000000000000000000000000000000012345987123498712349871234987123498712349871234987123498734987234598723459872345987456398734569873459872394859238470000000001151324987129348912834987213489712349871234987123498712349871234987123498712349871234987123499999999999999999999999999999999999999999999999999999999999999999000000000000000000000000000000000000000000000000000000000000000000000000000000000123459871234987123498712349871234987123498712349871234987349872345987234598723459874563987345698734598723948592384700000000011513249871293489128349872134897123498712349871234987123498712349871234987123498712349871234999999999999999999999999999999999999999999999999999999999999999990000000000000000000000000000000000000000000000000000000000000000000000000000000001234598712349871234987123498712349871234987123498712349873498723459872345987234598745639873456987345987239485923847000000000115132498712934891283498721348971234987123498712349871234987123498712349871234987123498712349999999999999999999999999999999999999999999999999999999999999999900000000000000000000000000000000000000000000000000000000000000000000000000000000012345987123498712349871234987123498712349871234987123498734987234598723459872345987456398734569873459872394859238470000000001151324987129348912834987213489712349871234987123498712349871234987123498712349871234987123499999999999999999999999999999999999999999999999999999999999999999000000000000000000000000000000000000000000000000000000000000000000000000000000000123459871234987123498712349871234987123498712349871234987349872345987234598723459874563987323489712349871234987123498712349871234987123498712349871234987234598723459872345987234598723459872549872345987234597234598723459872345987234598723459872345987234598723459872345987234598723459872345987234598723459182734987123498712349871234987123498712349871234987123498712349871234987123498712349871234987123498712349871234987123498712349871234987123498712349871234987123498712349871234987124987123498712349871234987234598723459873465987345987342598723459872345987234598723459875927234598723459872345987234598723459872345987234598723459872345987234598723459872345987234598723459872345987234598734569873469873456987345698734569873469873469873456987435698743569873456987345698374569867439873456934659873469837451513249871293489128349872134897123498712349871234987123498712349871234987123498712349871234999999999999999999999999999999999999999999999999999999999999999990000000000000000000000000000000000000000000000000000000000000000000000000000000001234598712349871234987123498712349871234987123498712349873498723459872345987234598745639873456987345987239485923847000000000115132498712934891283498721348971234987123498712349871234987123498712349871234987123498712349999999999999999999999999999999999999999999999993475989994985849876091380111183754975279343, align 16 define dso_local i32 @main() #0 { br label %1 1: %2 = phi i64 [ 0, %0 ], [ %5, %1 ] %3 = load i16384, ptr @large_integer, align 16 %4 = udiv i16384 %3, 100 store i16384 %4, ptr @large_integer, align 16 %5 = add i64 %2, 1 %6 = icmp eq i64 %5, 500 br i1 %6, label %7, label %1 7: ret i32 0 } attributes #0 = { nounwind } ``` Building this IR on an AMD Ryzen 7 7840HS takes around 1.2s, and executing it takes around 0.5s. Equivalent Zig code compiled using Zig's self-hosted x86_64 code generation backend executes in a negligible amount of time (~4ms), making this naive lowering at least 100x faster.