llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.8k stars 11.9k forks source link

\@ in two inline assembly expressions are parsed in isolation; so may lead to the same value, different from GCC #60792

Open andyhhp opened 1 year ago

andyhhp commented 1 year ago

\@ is the macro expansion count, and is supposed to be unique in each expansion inside a translation unit. It is commonly used to generate unique local named labels.

However, Clang IAS doesn't get this quite right. https://godbolt.org/z/reedx7Gsr

Given:

asm ("toplevel:\n\t"
     ".macro M\n\t"
     ".L\\@: add $\\@, %eax\n\t"
     ".endm\n\t");

asm ("M"); // \@ = 0
asm ("M"); // \@ = 1

void a(void)
{
    asm ("M"); // \@ should be 2, actually 0
    asm ("M"); // \@ should be 3, actually 0
}

GCC/Binutils, and Clang without IAS (not surprising - it hands of to GAS) expand M with 4 different numbers. However, Clang IAS does something a bit more weird.

(warning - stdout and stderr are mixed in the following transcript, but it's reasonably easy to follow)

$ clang -S m.c -o /dev/stdout 
    .text
    .file   "m.c"
                                        # Start of file scope inline assembly
toplevel:

.L0:
    addl    $0, %eax

.L1:
    addl    $1, %eax

                                        # End of file scope inline assembly
    .globl  a                       # -- Begin function a
    .p2align    4, 0x90
    .type   a,@function
a:                                      # @a
    .cfi_startproc
# %bb.0:
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset %rbp, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register %rbp
    #APP
<instantiation>:1:1: error: invalid symbol redefinition
.L0: add $0, %eax
^
.L0:
    addl    $0, %eax

    #NO_APP
    #APP
<instantiation>:1:1: error: invalid symbol redefinition
.L0: add $0, %eax
^
.L0:
    addl    $0, %eax

    #NO_APP
    popq    %rbp
    .cfi_def_cfa %rsp, 8
    retq
.Lfunc_end0:
    .size   a, .Lfunc_end0-a
    .cfi_endproc
                                        # -- End function
    .ident  "clang version 10.0.0-4ubuntu1 "
    .section    ".note.GNU-stack","",@progbits
    .addrsig
2 errors generated.

At the top level, it maintains an incrementing expansion count across separate asm(), so the 0 and 1 case expand correctly. However, when we come to the first asm() in a(), the expansion count resets back to 0.

This causes what should be unique local symbols to not be unique, and then suffer error: invalid symbol redefinition.

For completeness, the second asm() in a() also resets back to 0 again, which is why we end up with two duplicate symbol redefinitions.

MaskRay commented 1 year ago

Clang's behavior is different from GCC's but I am unsure it is considered as incorrect. For the concatenated module-level inline asm and each TargetOpcode::INLINEASM, a new MCAsmParser instance is constructed without the previous state. This isolation is sometimes a nice property.

To achieve your goal, use extended asm (not usable outside a function) with https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#AssemblerTemplate

‘%=’
Outputs a number that is unique to each instance of the asm statement in the entire compilation. This option is useful when creating local labels and referring to them multiple times in a single template that generates multiple assembler instructions.

E.g.

#define M ".L%=: add $%=, %%eax\n\t" :::

void a(void)
{
    asm(M);
    asm(M);
    asm(M);
}
andyhhp commented 1 year ago

I'm aware of %= too, but this bug has come up unexpectedly with a speculative security fix for CVE-2022-27672 (went public on Tuesday).

https://github.com/xen-project/xen/commit/63305e5392ec2d17b85e7996a97462744425db80

In our case, it really is a macro coming in at the assembler level, and not a literal interpreted by the asm statement.

https://github.com/xen-project/xen/blob/master/xen/arch/x86/include/asm/spec_ctrl_asm.h#L120-L161

I know that I'm going to have to find some solution to this urgently, but in all seriousness, by resetting the expansion count, you're breaking the major thing that \@ is actually useful for.

jyknight commented 1 year ago

The best way to do this sort of thing is to never use named labels in a macro, but instead use numbered labels, which you refer to with the "b" (backward) or "f" (forward) suffixes. There can be any number of such labels -- references always find the closest one in the given direction. See https://sourceware.org/binutils/docs/as/Symbol-Names.html

That is, you'd have:

asm ("toplevel:\n\t" ".macro M\n\t" "0: add $0b %eax\n\t" ".endm\n\t");

andyhhp commented 1 year ago

The best way to do this sort of thing is to never use named labels in a macro, but instead use numbered labels

No - that prohibited by our coding style for a very good reason.

Numeric labels have no scoping, so if you have nested macros that use the same numbers, the eventual expansion ends up wrong. It creates very subtle bugs.

Some projects (including Linux) work around it by having people pick random numbers for labels and hoping they don't collide. We work around it by using \@ to construct unique named labels, because that's far less likely to collide.

andyhhp commented 1 year ago

Clang's behavior is different from GCC's but I am unsure it is considered as incorrect.

Ok. Can I pose a different question then. Would you consider it an error for __COUNTER__ to reset mid way through a translation unit?

nickdesaulniers commented 1 year ago

For the concatenated module-level inline asm and each TargetOpcode::INLINEASM, a new MCAsmParser instance is constructed without the previous state. This isolation is sometimes a nice property.

I think this came up in the past with a discussion with @jcai19 (I'd have to dig through previous bug reports). I kind of wonder if the current design is intentional or not.

andyhhp commented 1 year ago

Seems the Github robots are ahead of me, but yes - I've found a workaround by muxing \@ and %= together.

But I've found a different parsing bug too, which ultimately prevents the use of any macro parameter whose name begins with a u. Something is clearly wrong here, but so far it's stubbornly refusing to simplify to something I could open a bug report with.

jyknight commented 1 year ago

I think this came up in the past with a discussion with @jcai19 (I'd have to dig through previous bug reports). I kind of wonder if the current design is intentional or not.

The design of resetting most state is intentional. That this counter is not explicitly preserved is an oversight, I think, and could/should indeed be fixed.