llvm / llvm-project

String concatenation in inline assembly not working for .asciz/.string #67233

medhefgo closed this issue 1 year ago

medhefgo commented 1 year ago

GNU as allows string concatenation for all three directives (.ascii, .asciz, and .string), while the LLVM integrated assembler rejects it for the latter two.

At least for .asciz, the GNU as manual is explicit:

Note that multiple string arguments not separated by commas will be concatenated together and only one final zero byte will be stored.

$ cat test.c
asm(".ascii \"a\" \"b\"");
asm(".asciz \"a\" \"b\"");
asm(".string \"a\" \"b\"");

$ clang -c -o /dev/null /tmp/test.c 
<inline asm>:2:12: error: unexpected token
.asciz "a" "b"
           ^
<inline asm>:3:13: error: unexpected token
.string "a" "b"
            ^
2 errors generated.

shafik commented 1 year ago

This seems to be a known issue; see:

https://github.com/llvm/llvm-project/blob/e01df8716a1b2401e8d7bac65d96e0a9ee76f6e4/llvm/lib/MC/MCParser/AsmParser.cpp#L3122-L3123

and also see discussion here: https://reviews.llvm.org/D91460

It looks like gcc and edg accept this: https://godbolt.org/z/TcMYb9Pz3

So we should fix this, but I am not sure what the result should be.

CC @nickdesaulniers @MaskRay to find out who the right person is to decide what the fix should be, and whether this might be a good first issue.

MaskRay commented 1 year ago

The LLVM integrated assembler intentionally rejects .asciz "a" "b" (juxtaposition) with "error: unexpected token".

The GNU assembler changed its behavior in November 2020: https://sourceware.org/pipermail/binutils/2020-November/114172.html

.data
.asciz "a", "b" "c", "d"

was previously assembled to 61006200 63006400 and is now assembled to 61006263 006400.
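
The two encodings above can be reproduced with a small model. The following C sketch is not binutils or LLVM code; `struct operand`, `asciz_model`, and `to_hex` are hypothetical names introduced only for illustration. It assumes the old behavior terminated every string operand with its own zero byte, while the new behavior joins operands that are not separated by a comma and stores a single zero byte after the joined run.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* One string operand of a .asciz line (hypothetical model type). */
struct operand {
    const char *text; /* string contents, without the quotes */
    int juxtaposed;   /* 1 if it follows the previous operand with no comma */
};

/*
 * Write the modelled bytes for the operands into out and return the byte
 * count, or 0 if out (capacity cap) is too small.
 *   concat_juxtaposed == 0: every operand gets its own zero byte (old).
 *   concat_juxtaposed == 1: operands without a separating comma are joined
 *                           and share one final zero byte (new).
 */
size_t asciz_model(const struct operand *ops, size_t n, int concat_juxtaposed,
                   unsigned char *out, size_t cap)
{
    assert(ops != NULL && out != NULL);
    size_t len = 0;
    for (size_t i = 0; i < n; i++) {
        size_t slen = strlen(ops[i].text);
        if (len + slen + 1 > cap)
            return 0; /* conservative check: room for text plus a zero byte */
        memcpy(out + len, ops[i].text, slen);
        len += slen;
        /* Skip the terminator only if the next operand is joined to this one. */
        int next_joined =
            concat_juxtaposed && i + 1 < n && ops[i + 1].juxtaposed;
        if (!next_joined)
            out[len++] = 0;
    }
    return len;
}

/* Format bytes as lowercase hex; hex must hold 2 * len + 1 characters. */
void to_hex(const unsigned char *bytes, size_t len, char *hex)
{
    for (size_t i = 0; i < len; i++)
        sprintf(hex + 2 * i, "%02x", (unsigned)bytes[i]);
    hex[2 * len] = '\0';
}
```

For the operands "a", "b" "c", "d" (with only "c" marked as juxtaposed), calling `asciz_model` with `concat_juxtaposed` set to 0 yields the 8 bytes 61 00 62 00 63 00 64 00, and with it set to 1 yields the 7 bytes 61 00 62 63 00 64 00, matching the two dumps quoted above.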

I think rejecting this syntax is reasonable, as nobody relies on it in the wild (and it would be dangerous if they did, since the GNU assembler changed its behavior not long ago).