haoxintu closed this issue 3 years ago.
Hey Sanjay. Thanks for your time and insightful comments here!
I think I got the answer from your detailed explanation. I am happy to find (potentially) important bugs in compilers to make them more reliable, and thank you so much again for spending time to fix them!
Best, Haoxin
Hi Haoxin -
Thank you for finding and reporting bugs!
For this example, it takes a rare sequence of unoptimized IR instructions to trigger the bug in instcombine. (It is possible that the regression test that I created for this could be reduced a bit more, but not too much.)
That IR sequence would usually be optimized away by other passes or by instcombine itself, which is why the bug has been hiding silently in LLVM for a very long time (maybe 10 years!).
I did not check exactly how https://reviews.llvm.org/rG7b0d59da9af4bf4eb made the bug invisible, but we know from Dawid's comment 2 that the bug must be in instcombine, so a patch in another pass could not have fixed or caused the root problem.
So I do not have a good answer to your question about frequency of bugs like this, but there are definitely many cases where a bug in some particular LLVM pass is invisible from C source (Clang) because other optimization passes prevent the problem pattern from being encountered.
Some other researchers/bots fuzz specific IR passes or sets of passes, and that does find bugs. The drawback of that approach is that such bugs may be given lower priority if people think they can never be triggered from Clang.
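A hang like the one in this report is a failure to reach a fixpoint: instcombine keeps revisiting instructions from its worklist until nothing changes, so one way it can loop forever is if two rewrites keep undoing each other. The toy model below is a hypothetical sketch of that failure mode, not LLVM's code: `Form`, `rule_narrow`, `rule_widen`, and `run_to_fixpoint` are made-up names, and the iteration budget is only there so the demo terminates.

```c
#include <assert.h>

/* Hypothetical toy model, not LLVM code: an "instruction" is reduced to
   one of two equivalent forms, and each rule rewrites one form into the other. */
typedef enum { FORM_WIDE_SHIFT, FORM_NARROW_SHIFT } Form;

/* Hypothetical rule A: narrow the shift, trunc(shl x, c) -> shl(trunc x, c). */
static int rule_narrow(Form *f) {
  if (*f == FORM_WIDE_SHIFT) { *f = FORM_NARROW_SHIFT; return 1; }
  return 0;
}

/* Hypothetical rule B: widen the shift again, exactly undoing rule A. */
static int rule_widen(Form *f) {
  if (*f == FORM_NARROW_SHIFT) { *f = FORM_WIDE_SHIFT; return 1; }
  return 0;
}

/* Apply the rules until an iteration makes no change (a fixpoint) or the
   budget runs out. Returns the iteration at which the fixpoint was
   detected, or -1 if every iteration changed something. */
static int run_to_fixpoint(Form *f, int enable_widen, int max_iters) {
  assert(max_iters > 0);
  for (int iter = 0; iter < max_iters; iter++) {
    int changed = rule_narrow(f);
    if (enable_widen)
      changed |= rule_widen(f);
    if (!changed)
      return iter;
  }
  return -1;
}
```

With only `rule_narrow` enabled, a `FORM_WIDE_SHIFT` input settles after one rewrite and `run_to_fixpoint` returns 1; with both rules enabled, every iteration reports a change and it returns -1, which is the shape of an endless compile.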
Thank you all for checking and fixing this!
And hey, Sanjay. Just out of curiosity, do you know why the source code example cannot reproduce the issue but the IR code can? In other words, is it common for some bugs to be triggerable only from IR? I don't know in what situations source code may lose information after being transformed to IR. I would really appreciate any hints. Thanks for your time!
Best wishes, Haoxin
I wasn't planning to backport a fix to the 13.0 release since the bug was already hidden on trunk with this example, so I did some cleanup and tried to fix another bug before this one:
https://reviews.llvm.org/rGa73973c9d461
https://reviews.llvm.org/rGfbb78668f2ee
https://reviews.llvm.org/rG982a15cb3fa0
https://reviews.llvm.org/rGc85f450619f7
https://reviews.llvm.org/rG0d83e7203479
So if we do want to backport a fix, I think we'd need to take all of those to patch cleanly.
I'll mark this as fixed for now.
If someone wants to fix it in 13.0 too, please re-open.
Created attachment 25232: Instcombine stuck reproducer
Thanks! So the bug is still present in trunk. It's just not visible with the source example in this report. I'll take a look at fixing it.
Can you paste the full IR for that function before it enters instcombine?
The exact commit is 7b0d59da9af4bf4eb8342cac579e42a818ac1ae7. After this commit, I can't reproduce this problem with the given code.
Apparently instcombine gets stuck visiting the trunc/shl/and instructions, reaching the 49623fa77a35de343e89ea2d8159ce719473ce71 code path every time:
IC: Visiting: %sext199 = shl i64 %shl, 24
IC: Visiting: %shl.tr100 = and i64 %sext199, 72057594021150720
IC: Visiting: %sext1 = trunc i64 %shl.tr100 to i32
IC: ADD DEFERRED: %shl.tr100 = and i64 %sext199, 72057594021150720
IC: Mod = %sext1 = trunc i64 %shl.tr100 to i32
New = %sext1 = trunc i64 %sext199 to i32
IC: ADD: %sext1 = trunc i64 %sext199 to i32
IC: ERASE %shl.tr100 = and i64 %sext199, 72057594021150720
IC: ADD DEFERRED: %sext199 = shl i64 %shl, 24
IC: ADD: %sext199 = shl i64 %shl, 24
IC: Visiting: %sext199 = shl i64 %shl, 24
IC: Visiting: %sext1 = trunc i64 %sext199 to i32
IC: ADD DEFERRED: %shl.tr = trunc i64 %shl to i32
IC: Old = %sext1 = trunc i64 %sext199 to i32
New =
I'm not sure how bad the badref occurrence is in this case. The problem can be reproduced on the release/12 branch but is no longer seen on trunk.
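In the log, instcombine first drops the `and` (72057594021150720 is 0x00FFFFFFFF000000, so after `shl ..., 24` the mask cannot change any bit that survives the `trunc` to i32) and then appears to start narrowing the `shl` (`%shl.tr = trunc i64 %shl to i32`). The C sketch below checks that these forms compute the same 32-bit value, assuming 64-bit unsigned arithmetic models the i64 operations and a cast to `uint32_t` models `trunc`. The function names are hypothetical; this only illustrates why each individual rewrite is valid, not why the combination loops.

```c
#include <assert.h>
#include <stdint.h>

/* Mask from the instcombine log: bits 24..55 set. */
#define LOG_MASK UINT64_C(72057594021150720)

/* trunc i64 (and i64 (shl i64 %shl, 24), LOG_MASK) to i32 */
static uint32_t trunc_of_masked_shl(uint64_t x) {
  return (uint32_t)((x << 24) & LOG_MASK);
}

/* trunc i64 (shl i64 %shl, 24) to i32: the same value without the `and` */
static uint32_t trunc_of_shl(uint64_t x) {
  return (uint32_t)(x << 24);
}

/* shl i32 (trunc i64 %shl to i32), 24: the narrowed form */
static uint32_t shl_of_trunc(uint64_t x) {
  return (uint32_t)x << 24;
}

/* All three forms agree: only bits 0..7 of x reach bits 24..31 of the result. */
static void check_identity(uint64_t x) {
  assert(trunc_of_masked_shl(x) == trunc_of_shl(x));
  assert(trunc_of_shl(x) == shl_of_trunc(x));
}
```

Because each step is value-preserving on its own, the hang has to come from the order in which instcombine applies and re-queues them, which matches the log revisiting the same instructions.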
I'm not sure whether this is helpful, but this bug is present starting from commit 49623fa77a35de343e89ea2d8159ce719473ce71.
Extended Description
Hi all.
The following test program makes clang, from 12.0.x through the current trunk, hang at -O2 and above.
$ cat small.c
#include <stdint.h>
int a, b, c;
void d(int e) {
  int8_t f;
  int16_t g;
  int32_t i = &a;
  uint16_t j;
  int8_t *k = &c;
  int16_t l = 246;
  uint64_t m;
  int8_t n = &k;
  int64_t o;
  int16_t p;
  for (; p;) {
    int64_t q = o;
    for (q = 5; q; q += 1)
      if (k = b)
        for (j = 3; j; j++) {
          int8_t r;
          o = r;
        }
    for (; p <= 2; p++)
    s:
      l = 1;
  }
  g = m = e;
  uint64_t v;
  int32_t u = &i;
  uint64_t t = &v;
  f = u;
  f = c = l;
  v = (g ?: (u = m << n)) == f;
  for (; i <= 8; f = t)
    ;
  goto s;
}
$ clang -w -O2 -m32 small.c   // compiles endlessly; same with -O3 and -Os
$ time clang -c -w -O1 -m32 small.c
real 0m0.059s
user 0m0.028s
sys 0m0.031s
The clang version I used:
clang version 14.0.0 (https://github.com/llvm/llvm-project 022538f2764a255bd2c0da3a247791e764933a93)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /home/haoxin/haoxin-data/compilers/llvm-project/build/bin
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/8
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/9
Selected GCC installation: /usr/lib/gcc/x86_64-linux-gnu/9
Candidate multilib: .;@m64
Candidate multilib: 32;@m32
Candidate multilib: x32;@mx32
Selected multilib: .;@m64
Reproduced in Godbolt: https://godbolt.org/z/ndxn1cT91
Thanks, Haoxin