Open aehyvari opened 2 years ago
I looked into this a little more, and found a simpler example which crashes on my apple with Apple clang version 14.0.0 (clang-1400.0.29.102)
.
Example [updated, because of a formatting error]:
__UINT_LEAST8_TYPE__ c = '\xc4';
int table[200] = {
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
int main(int argc, char** argv) {
(void)argc;
(void)argv;
return table[+c];
}
On Mac OS:
% gcc -Wall -Wextra -Werror -fsanitize=signed-integer-overflow,address,undefined foo.c
% ./a.out
foo.c:27:12: runtime error: index -60 out of bounds for type 'int[200]'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior foo.c:27:12 in
% gcc --version
Apple clang version 14.0.0 (clang-1400.0.29.102)
Target: arm64-apple-darwin21.6.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin
On Ubuntu:
$ gcc -Wall -Wextra -Werror -fsanitize=signed-integer-overflow,address,undefined foo.c
$ ./a.out
$ gcc --version
gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
The thing that breaks the code in clang-14 seems to be the +
in table[+c]
. If I change the code to be
int main(int argc, char** argv) {
(void)argc;
(void)argv;
return table[c];
}
the error disappears on my mac.
What is the c
variable in this example? Where does it come from?
What is the c variable in this example? Where does it come from?
My bad, I didn't know how to use markdown. The example is fixed now. Anyway, the problem persists.
Hi guys, and sorry to intrude to your discussion,
Is it possible, that expression '+c' casts the value to a signed value? The uint8_t value 0xc4 is a negative value, when cast into int8_t.
Yes that must be the case, our problem is the different behaviour across compilers. It seems that different compilers promote the expression +c
to different types.
I should add to this thread as well that according to what I've tested, this promotion happens in the unexpected way (i.e., +c
is a negative value) specifically the arm
backend of clang
. I haven't seen this happening on clang
with x86_64 backend.
I'm not convinced that this is integer promotion. Quick googling didn't give me an authoritative source of how integer promotion should work for, say, 8-bit unsigned word in ANSI C, but no matter how it works, I'd expect then table[0+c]
to be the same access as table[+c]
. However, on clang
for arm
, table[0+c]
is not an access with a negative index.
Unless this is a problem that will be fixed upstream, the most reasonable path forward seems to be to use a patched version of bison when building with clang for arm64.
https://github.com/aehyvari/bison/commit/6587b467fcb6f4733232e606ffa9d81453e9fdf1
Alternatively, we can write our own parser!
I asked from llvm clang's discord forum, and they suggested that this is a sanitizer bug. I filed the issue here: https://github.com/llvm/llvm-project/issues/58335
Alternatively, we can write our own parser!
That's an option I'd like to avoid for now because it looks like a lot of work to do the syntax checks that a parser generator gives for free.
Oh, so this problem is only seen when compiled with sanitizer? I missed that in the original post! Then it makes sense it is a sanitizer bug. Let's see what is the upstream reaction.
Oh, so this problem is only seen when compiled with sanitizer? I missed that in the original post!
Yes, now it looks like it is a sanitizer problem.
I thought that I had seen this changing the behavior of code compiled without the sanitizer, but after the discussion in discord I can't reproduce the miscompilation I thought I saw earlier. I must have done something wrong before.
The problem is not present in the systems used in CI.
When building (with undefined behaviour analysis) on the system mentioned in the title, syntax errors in the smtlib files trigger the following error:
For example regression test
regression/QF_BV/bar.smt2
triggers the error. This does not seem to be a false alarm, since the code produced without the analysis has a different output compared to the systems we use in CI.The problem seems to be related to how the integer sign is handled in the bison-generated parser code. This could be a compiler-dependent issue, but I have only tested it on MacOS 12.5. The failing code is generated with
bison (GNU Bison) 3.8.2
when compiled withclang++-15.0.0
from homebrew.