Open diggerlin opened 5 months ago
According to 6.2.6.1p5 in C18, the C standard says that character types do not have trap representations, so there should be no "Illegal instruction" here.
I am not sure that the backend change is wrong. This seems to be an issue with the front-end IR codegen.
Presumably, in C, loads of character types need to have the freeze IR instruction applied.
@AaronBallman, thoughts?
the original code reads an uninitialized stack variable?
yes.
Clang currently considers branching on uninitialized values to be undefined behavior. This doesn't quite match what the C standard says in some cases, but for now this is a won't-fix.
(There are some major changes to uninitialized value handling on the horizon for C++, as well as changes on the horizon for LLVM's undef handling, and that will likely have impact on how clang interprets uninitialized values.)
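For code that hits this today, one defensive option is to give every byte of the buffer a determinate value before branching on it. The following is a minimal sketch, not the test case from the issue; the function name `first_byte_is_zero` and the buffer size are hypothetical.

```c
#include <string.h>

/* Hypothetical helper, not taken from the issue's a.c or b.c. */
int first_byte_is_zero(void) {
    char buf[8];

    /* Give every byte a determinate value before any read. */
    memset(buf, 0, sizeof buf);

    /* The branch now depends only on an initialized value. */
    if (buf[0] == 0)
        return 1;
    return 0;
}
```

Writing `char buf[8] = {0};` would have the same effect; `memset` is used here only to make the initialization step explicit.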
@AaronBallman, is it agreeable with you to treat this case as erroneous behavior (in line with C++)? Should the C committee be consulted about this?
I think this is undefined behavior per spec despite the array being of character type. Array subscripting is defined as doing (*((E1)+(E2))). The value produced by * would have type char which then gets converted to int (via integer promotions or via picking a composite type for ==), that int has a non-value representation and reading that value is undefined.
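The two language rules cited here (subscripting as pointer arithmetic plus dereference, and promotion of a char operand to int) can be checked with a small sketch. It uses an initialized array so that it stays well defined, and it assumes a C11 compiler for `_Generic`; all function names are hypothetical.

```c
#include <stddef.h>

/* E1[E2] is defined as (*((E1)+(E2))), so both reads name the same element. */
int subscript_matches_pointer_form(const char *s, size_t i) {
    return s[i] == *(s + i);
}

/* The element itself has type char. */
int element_type_is_char(void) {
    char buf[2] = {'a', 'b'};
    return _Generic(buf[0], char: 1, default: 0);
}

/* Unary + applies the integer promotions, which turn the char into an int. */
int promoted_type_is_int(void) {
    char buf[2] = {'a', 'b'};
    return _Generic(+buf[0], int: 1, default: 0);
}
```

This only illustrates the types involved; it does not settle whether the promoted value can be a non-value representation, which is the point under discussion.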
that int has a non-value representation and reading that value is undefined
@AaronBallman, in the absence of padding bits in the object representation of a type (as is the case for unsigned char, signed char, and plain char), there are no non-value representations of that type. I don't see how a valid char value becomes an int with a non-value representation. Indeed, it is a bit of a category error because non-value representations only exist as object representations (i.e., "in memory" and not in the rvalue space).
Hmmm, I think in practice you are correct, but I'm not seeing wording that says a conversion to a wider type means it won't then get a non-value representation. But you're right about the category error, so I'm probably off-base with that explanation because it heads into malicious reading territory.
Taking another run at it, I see 6.3.2.1p2 says: If the lvalue designates an object of automatic storage duration that could have been declared with the register storage class (never had its address taken), and that object is uninitialized (not declared with an initializer and no assignment to it has been performed prior to use), the behavior is undefined.
And in this case, the object has automatic storage duration but it cannot be declared with the register storage class without inducing UB (p3 says "If the array object has register storage class, the behavior is undefined." so the array to pointer decay would have caused UB, which I take to mean that precludes its use for p2).
So it's possible this actually isn't UB despite being rather erroneous.
I can ask on the WG14 reflectors if you'd like, but in terms of moving forward with this issue, I agree with @nikic that the changes coming for undef in LLVM mean we probably should not touch this right now.
I can ask on the WG14 reflectors if you'd like
That would be great; thanks!
I asked on the reflectors and the answer is... the behavior is not undefined but it is also not particularly clear.
Notionally, array-to-pointer decay is the same as taking the address, but the standard doesn't actually say that explicitly. So the array object does have its address taken, which means the object could not have been declared with the register keyword, which means the behavior is not undefined.
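As a small illustration of the "decay is notionally the same as taking the address" point, the sketch below shows that the pointer produced by decay compares equal to the explicitly taken address of the first element. The function name is hypothetical, and the array is initialized so the sketch does not depend on the question being debated.

```c
/* Hypothetical helper: compares the decayed pointer with &buf[0]. */
int decay_yields_element_address(void) {
    char buf[4] = {0};
    char *p = buf;           /* array-to-pointer decay */
    return p == &buf[0];     /* explicit address of the first element */
}
```

This shows only that the two pointer values agree; whether decay formally counts as "having its address taken" for 6.3.2.1p2 is the wording gap described above.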
The following test case fails with "Illegal instruction (core dumped)", which is a regression caused by https://reviews.llvm.org/D126962.
bash> cat a.c
bash> clang-19 -O2 --target=powerpc64le-unknown-linux-gnu -o a.ll -emit-llvm -S a.c
bash> cat a.ll
bash> clang-19 -O2 -o a.o --target=powerpc64le-unknown-linux-gnu -c a.c
bash> objdump -D a.o
a.o: file format elf64-powerpcle
Disassembly of section .text:
Disassembly of section .comment:
The result is an empty .text section.
If I also have b.c:
bash> cat b.c
$ clang-19 -O2 -o b.o --target=powerpc64le-unknown-linux-gnu -c b.c
$ clang-19 -O2 -o test a.o b.o
$ ./test
Illegal instruction (core dumped)
The program crashes with an illegal instruction.
bash> objdump -D test
The text section contains the following code.