Closed jiegec closed 5 months ago
I cannot reproduce it. Note that AOSC is applying a bunch of patches onto GCC 13 (and maybe Binutils, I've not checked their Binutils repo).
With vanilla GCC 13.2.0 and -O2
:
0000000000000910 <test>:
910: 02ffc063 addi.d $sp, $sp, -16
914: 29c02061 st.d $ra, $sp, 8
918: 44000880 bnez $a0, 8 # 920 <test+0x10>
91c: 60001c05 bgtz $a1, 28 # 938 <test+0x28>
920: 1a000024 pcalau12i $a0, 1
924: 02e5c084 addi.d $a0, $a0, -1680
928: 57fdabff bl -600 # 6d0 <puts@plt>
92c: 28c02061 ld.d $ra, $sp, 8
930: 02c04063 addi.d $sp, $sp, 16
934: 53ffd3ff b -48 # 904 <d>
938: 57ffc3ff bl -64 # 8f8 <c>
93c: 1a000024 pcalau12i $a0, 1
940: 02e5c084 addi.d $a0, $a0, -1680
944: 57fd8fff bl -628 # 6d0 <puts@plt>
948: 28c02061 ld.d $ra, $sp, 8
94c: 02c04063 addi.d $sp, $sp, 16
950: 53ffb7ff b -76 # 904 <d>
With vanilla GCC 13.2.0 and -O2 -mno-explicit-relocs
:
908: 02ffc063 addi.d $sp, $sp, -16
90c: 29c02061 st.d $ra, $sp, 8
910: 44000880 bnez $a0, 8 # 918 <test+0x10>
914: 60001805 bgtz $a1, 24 # 92c <test+0x24>
918: 18000244 pcaddi $a0, 18
91c: 57fdb7ff bl -588 # 6d0 <puts@plt>
920: 28c02061 ld.d $ra, $sp, 8
924: 02c04063 addi.d $sp, $sp, 16
928: 53ffdbff b -40 # 900 <d>
92c: 57ffcfff bl -52 # 8f8 <c>
930: 18000184 pcaddi $a0, 12
934: 57fd9fff bl -612 # 6d0 <puts@plt>
938: 28c02061 ld.d $ra, $sp, 8
93c: 02c04063 addi.d $sp, $sp, 16
940: 53ffc3ff b -64 # 900 <d>
With GCC 14 (r14-8422 + some patches I've mentioned at https://github.com/loongson-community/areweloongyet/issues/16#issuecomment-1916876990):
960: 02ffc063 addi.d $sp, $sp, -16
964: 29c02061 st.d $ra, $sp, 8
968: 44000880 bnez $a0, 8 # 970 <test+0x10>
96c: 60002405 bgtz $a1, 36 # 990 <test+0x30>
970: 18000284 pcaddi $a0, 20
974: 57fd5fff bl -676 # 6d0 <puts@plt>
978: 28c02061 ld.d $ra, $sp, 8
97c: 02c04063 addi.d $sp, $sp, 16
980: 53ffc3ff b -64 # 940 <d>
984: 03400000 nop
988: 03400000 nop
98c: 03400000 nop
990: 57ff93ff bl -112 # 920 <c>
994: 18000164 pcaddi $a0, 11
998: 57fd3bff bl -712 # 6d0 <puts@plt>
99c: 28c02061 ld.d $ra, $sp, 8
9a0: 02c04063 addi.d $sp, $sp, 16
9a4: 53ff9fff b -100 # 940 <d>
Is there an assembly output from AOSC GCC?
Cannot reproduce with almost-up-to-date snapshot of gcc and binutils-2.42 on Gentoo:
$ gcc --version
gcc (Gentoo 14.0.1_pre20240128 p17) 14.0.1 20240128 (experimental)
Copyright (C) 2024 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ ld --version
GNU ld (Gentoo 2.42 p1) 2.42.0
Copyright (C) 2024 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License version 3 or (at your option) a later version.
This program has absolutely no warranty.
$ gcc -O2 issue41.c -o issue41
issue41.c: In function ‘main’:
issue41.c:17:9: warning: ignoring return value of ‘scanf’ declared with attribute ‘warn_unused_result’ [-Wunused-result]
17 | scanf("%d%d", &a, &b);
| ^~~~~~~~~~~~~~~~~~~~~
$ echo 1 2 | ./issue41
hello
in d
$ echo $?
5
$ gcc -O2 issue41.c -c -o issue41.o
issue41.c: In function ‘main’:
issue41.c:17:9: warning: ignoring return value of ‘scanf’ declared with attribute ‘warn_unused_result’ [-Wunused-result]
17 | scanf("%d%d", &a, &b);
| ^~~~~~~~~~~~~~~~~~~~~
$ objdump -dwr issue41.o
issue41.o: file format elf64-loongarch
Disassembly of section .text:
0000000000000000 <.Lla-relax-align>:
0: 03400000 nop 0: R_LARCH_ALIGN .Lla-relax-align+0x5
4: 03400000 nop
8: 03400000 nop
c: 03400000 nop
10: 03400000 nop
14: 03400000 nop
18: 03400000 nop
000000000000001c <c>:
1c: 1a000004 pcalau12i $a0, 0 1c: R_LARCH_PCALA_HI20 .LC0
1c: R_LARCH_RELAX *ABS*
20: 02c00084 addi.d $a0, $a0, 0 20: R_LARCH_PCALA_LO12 .LC0
20: R_LARCH_RELAX *ABS*
24: 50000000 b 0 # 24 <c+0x8> 24: R_LARCH_B26 puts
0000000000000028 <L0^A>:
28: 03400000 nop 28: R_LARCH_ALIGN .Lla-relax-align+0x5
2c: 03400000 nop
30: 03400000 nop
34: 03400000 nop
38: 03400000 nop
3c: 03400000 nop
40: 03400000 nop
0000000000000044 <d>:
44: 1a000004 pcalau12i $a0, 0 44: R_LARCH_PCALA_HI20 .LC1
44: R_LARCH_RELAX *ABS*
48: 02c00084 addi.d $a0, $a0, 0 48: R_LARCH_PCALA_LO12 .LC1
48: R_LARCH_RELAX *ABS*
4c: 50000000 b 0 # 4c <d+0x8> 4c: R_LARCH_B26 puts
0000000000000050 <L0^A>:
50: 03400000 nop 50: R_LARCH_ALIGN .Lla-relax-align+0x5
54: 03400000 nop
58: 03400000 nop
5c: 03400000 nop
60: 03400000 nop
64: 03400000 nop
68: 03400000 nop
000000000000006c <test>:
6c: 02ffc063 addi.d $sp, $sp, -16
70: 29c02061 st.d $ra, $sp, 8
0000000000000074 <L0^A>:
74: 44001480 bnez $a0, 20 # 88 <.L5> 74: R_LARCH_B21 .L5
78: 60003405 bgtz $a1, 52 # ac <L0^A> 78: R_LARCH_B16 .L7
7c: 03400000 nop 7c: R_LARCH_ALIGN .Lla-relax-align+0x4
80: 03400000 nop
84: 03400000 nop
0000000000000088 <.L5>:
88: 1a000004 pcalau12i $a0, 0 88: R_LARCH_PCALA_HI20 .LC2
88: R_LARCH_RELAX *ABS*
8c: 02c00084 addi.d $a0, $a0, 0 8c: R_LARCH_PCALA_LO12 .LC2
8c: R_LARCH_RELAX *ABS*
90: 54000000 bl 0 # 90 <.L5+0x8> 90: R_LARCH_B26 puts
94: 28c02061 ld.d $ra, $sp, 8
0000000000000098 <L0^A>:
98: 02c04063 addi.d $sp, $sp, 16
000000000000009c <L0^A>:
9c: 50000000 b 0 # 9c <L0^A> 9c: R_LARCH_B26 d
a0: 03400000 nop a0: R_LARCH_ALIGN .Lla-relax-align+0x4
a4: 03400000 nop
a8: 03400000 nop
00000000000000ac <L0^A>:
ac: 54000000 bl 0 # ac <L0^A> ac: R_LARCH_B26 c
b0: 1a000004 pcalau12i $a0, 0 b0: R_LARCH_PCALA_HI20 .LC2
b0: R_LARCH_RELAX *ABS*
b4: 02c00084 addi.d $a0, $a0, 0 b4: R_LARCH_PCALA_LO12 .LC2
b4: R_LARCH_RELAX *ABS*
b8: 54000000 bl 0 # b8 <L0^A+0xc> b8: R_LARCH_B26 puts
bc: 28c02061 ld.d $ra, $sp, 8
00000000000000c0 <L0^A>:
c0: 02c04063 addi.d $sp, $sp, 16
c4: 50000000 b 0 # c4 <L0^A+0x4> c4: R_LARCH_B26 d
Disassembly of section .text.startup:
0000000000000000 <main-0x1c>:
0: 03400000 nop 0: R_LARCH_ALIGN .Lla-relax-align+0x5
4: 03400000 nop
8: 03400000 nop
c: 03400000 nop
10: 03400000 nop
14: 03400000 nop
18: 03400000 nop
000000000000001c <main>:
1c: 02ff8063 addi.d $sp, $sp, -32
20: 29c04077 st.d $s0, $sp, 16
0000000000000024 <L0^A>:
24: 1a000017 pcalau12i $s0, 0 24: R_LARCH_GOT_PC_HI20 __stack_chk_guard
24: R_LARCH_RELAX *ABS*
28: 28c002f7 ld.d $s0, $s0, 0 28: R_LARCH_GOT_PC_LO12 __stack_chk_guard
28: R_LARCH_RELAX *ABS*
2c: 260002ec ldptr.d $t0, $s0, 0
30: 02c01066 addi.d $a2, $sp, 4
34: 00150065 move $a1, $sp
38: 1a000004 pcalau12i $a0, 0 38: R_LARCH_PCALA_HI20 .LC3
38: R_LARCH_RELAX *ABS*
3c: 02c00084 addi.d $a0, $a0, 0 3c: R_LARCH_PCALA_LO12 .LC3
3c: R_LARCH_RELAX *ABS*
40: 29c0206c st.d $t0, $sp, 8
44: 29c06061 st.d $ra, $sp, 24
0000000000000048 <L0^A>:
48: 54000000 bl 0 # 48 <L0^A> 48: R_LARCH_B26 __isoc99_scanf
4c: 24000465 ldptr.w $a1, $sp, 4
50: 24000064 ldptr.w $a0, $sp, 0
54: 54000000 bl 0 # 54 <L0^A+0xc> 54: R_LARCH_B26 test
58: 28c0206d ld.d $t1, $sp, 8
5c: 260002ec ldptr.d $t0, $s0, 0
60: 5c0021ac bne $t1, $t0, 32 # 80 <L0^A> 60: R_LARCH_B16 .L11
64: 28c06061 ld.d $ra, $sp, 24
0000000000000068 <L0^A>:
68: 28c04077 ld.d $s0, $sp, 16
6c: 02c08063 addi.d $sp, $sp, 32
0000000000000070 <L0^A>:
70: 4c000020 ret
74: 03400000 nop 74: R_LARCH_ALIGN .Lla-relax-align+0x4
78: 03400000 nop
7c: 03400000 nop
0000000000000080 <L0^A>:
80: 54000000 bl 0 # 80 <L0^A> 80: R_LARCH_B26 __stack_chk_fail
Assembly and compiled object can be found at test.tar.gz
Assembly of test:
test:
.LFB13 = .
.cfi_startproc
addi.d $r3,$r3,-16
.cfi_def_cfa_offset 16
st.d $r1,$r3,8
.cfi_offset 1, -8
bnez $r4,.L5
bgt $r5,$r0,.L13
.align 4,54525952,4
.L5:
la.local $r4,.LC2
bl %plt(puts)
ld.d $r1,$r3,8
.cfi_remember_state
.cfi_restore 1
addi.d $r3,$r3,16
.cfi_def_cfa_offset 0
b d
.align 4,54525952,4
.L13:
.cfi_restore_state
bl c
la.local $r4,.LC2
bl %plt(puts)
ld.d $r1,$r3,8
.cfi_restore 1
addi.d $r3,$r3,16
.cfi_def_cfa_offset 0
b d
.cfi_endproc
.LFE13:
.size test, .-test
.section .rodata.str1.8
.align 3
.LC3:
.ascii "%d%d\000"
$ objdump -dwr test.o
000000000000006c <test>:
6c: 02ffc063 addi.d $sp, $sp, -16
70: 29c02061 st.d $ra, $sp, 8
0000000000000074 <L0^A>:
74: 44000c80 bnez $a0, 12 # 80 <.L5> 74: R_LARCH_B21 .L5
78: 60002005 bgtz $a1, 32 # 98 <L0^A> 78: R_LARCH_B16 .L13
7c: 00000000 .word 0x00000000
0000000000000080 <.L5>:
80: 1a000004 pcalau12i $a0, 0 80: R_LARCH_PCALA_HI20 .LC2
80: R_LARCH_RELAX *ABS*
84: 02c00084 addi.d $a0, $a0, 0 84: R_LARCH_PCALA_LO12 .LC2
84: R_LARCH_RELAX *ABS*
88: 54000000 bl 0 # 88 <.L5+0x8> 88: R_LARCH_B26 puts
8c: 28c02061 ld.d $ra, $sp, 8
0000000000000090 <L0^A>:
90: 02c04063 addi.d $sp, $sp, 16
0000000000000094 <L0^A>:
94: 50000000 b 0 # 94 <L0^A> 94: R_LARCH_B26 d
0000000000000098 <L0^A>:
98: 54000000 bl 0 # 98 <L0^A> 98: R_LARCH_B26 c
9c: 1a000004 pcalau12i $a0, 0 9c: R_LARCH_PCALA_HI20 .LC2
9c: R_LARCH_RELAX *ABS*
a0: 02c00084 addi.d $a0, $a0, 0 a0: R_LARCH_PCALA_LO12 .LC2
a0: R_LARCH_RELAX *ABS*
a4: 54000000 bl 0 # a4 <L0^A+0xc> a4: R_LARCH_B26 puts
a8: 28c02061 ld.d $ra, $sp, 8
00000000000000ac <L0^A>:
ac: 02c04063 addi.d $sp, $sp, 16
b0: 50000000 b 0 # b0 <L0^A+0x4> b0: R_LARCH_B26 d
The problem still persists using binutils master (commit 44c91f530fc9acc3a535953696d49633e5a7e24a)
Minimal reproducer:
.section .text,"ax",@progbits
.align 2
.align 5
.globl a
.type a, @function
a:
.align 4,54525952,4
Disassembly of section .text:
0000000000000000 <.Lla-relax-align>:
0: 03400000 nop 0: R_LARCH_ALIGN .Lla-relax-align+0x5
4: 03400000 nop
8: 03400000 nop
c: 03400000 nop
10: 03400000 nop
14: 03400000 nop
18: 03400000 nop
000000000000001c <a>:
1c: 00000000 .word 0x00000000
I see: in binutils 2.42, for .align 4,54525952,4
, binutils only takes the lowest byte (arg = 0, fill_len = 1) from 54525952
, which is zero:
if (arg >= 0)
fill_len = 1;
else
fill_len = -arg;
if (fill_len <= 1)
{
char fill_char = 0;
fill_char = fill;
do_align (align, &fill_char, fill_len, max);
}
However, in binutils 2.41, arg=-4
and fill_len=4
. The behavior was changed in https://github.com/bminor/binutils-gdb/commit/27a750dd896cfc13f4368e4c8df14e6ea5bb718f.
The number was written by gcc, see https://github.com/gcc-mirror/gcc/commit/b20c7ee066cb7d952fa193972e8bc6362c6e4063.
So the answer is: binutils 2.42 assumes gcc with https://github.com/gcc-mirror/gcc/commit/b20c7ee066cb7d952fa193972e8bc6362c6e4063 applied.
So it now seems making sense to backport b20c7ee066cb7d952fa193972e8bc6362c6e4063 to GCC 12 and 13.
And still Binutils should reject .align 4,54525952,4
if it cannot be supported with -mrelax
instead of silently producing wrong code.
So it now seems making sense to backport b20c7ee066cb7d952fa193972e8bc6362c6e4063 to GCC 12 and 13.
Seems fine to me.
And still Binutils should reject
.align 4,54525952,4
if it cannot be supported with-mrelax
instead of silently producing wrong code.
Given the form of the now-problematic .align
's is very distinguishable, and with unambiguous semantics, can we add detection of such usage to make it behave, while preserving new semantics for other usages? Or is such an approach too hacky in your mind?
So it now seems making sense to backport b20c7ee066cb7d952fa193972e8bc6362c6e4063 to GCC 12 and 13.
Seems fine to me.
And still Binutils should reject
.align 4,54525952,4
if it cannot be supported with-mrelax
instead of silently producing wrong code.Given the form of the now-problematic
.align
's is very distinguishable, and with unambiguous semantics, can we add detection of such usage to make it behave, while preserving new semantics for other usages? Or is such an approach too hacky in your mind?
We may special case the 2nd operand of .align and handle it properly if it's the NOP encoding.
But why not just support 3-operand .align unconditionally (with any 2nd operand)? I don't think it's always nonsense if the user wants to fill the alignment padding with a different instruction: at least a different "do nothing" instruction like or $r4,$r4,$r0
should be fine.
And for things like -falign-functions
and -falign-jumps
we can actually fill break
or non-exist instructions instead of NOP: the padding added by -falign-functions
and -falign-jumps
should be unreachable in any normal execution, so a break
or non-exist instruction in these cases can actually serve as a cheap hardening (reducing ROP gadgets) and maybe a hint to the uarch for branch prediction.
However doing that will need a revision of ABI. Current (2.30) wording indeed only allows NOP:
Alignment statement. If the symbol index is 0, the addend indicates the number of bytes occupied by nop instructions at the relocation offset. The alignment boundary is specified by the addend rounded up to the next power of two. If the symbol index is not 0, the addend indicates the first and third expressions of .align. The lowest 8 bits are used to represent the first expression, other bits are used to represent the third expression.
So it now seems making sense to backport b20c7ee066cb7d952fa193972e8bc6362c6e4063 to GCC 12 and 13.
Seems fine to me.
And still Binutils should reject
.align 4,54525952,4
if it cannot be supported with-mrelax
instead of silently producing wrong code.Given the form of the now-problematic
.align
's is very distinguishable, and with unambiguous semantics, can we add detection of such usage to make it behave, while preserving new semantics for other usages? Or is such an approach too hacky in your mind?We may special case the 2nd operand of .align and handle it properly if it's the NOP encoding.
But why not just support 3-operand .align unconditionally (with any 2nd operand)? I don't think it's always nonsense if the user wants to fill the alignment padding with a different instruction: at least a different "do nothing" instruction like
or $r4,$r4,$r0
should be fine.And for things like
-falign-functions
and-falign-jumps
we can actually fillbreak
or non-exist instructions instead of NOP: the padding added by-falign-functions
and-falign-jumps
should be unreachable in any normal execution, so abreak
or non-exist instruction in these cases can actually serve as a cheap hardening (reducing ROP gadgets) and maybe a hint to the uarch for branch prediction.However doing that will need a revision of ABI. Current (2.30) wording indeed only allows NOP:
Alignment statement. If the symbol index is 0, the addend indicates the number of bytes occupied by nop instructions at the relocation offset. The alignment boundary is specified by the addend rounded up to the next power of two. If the symbol index is not 0, the addend indicates the first and third expressions of .align. The lowest 8 bits are used to represent the first expression, other bits are used to represent the third expression.
Aww. Indeed this would need attention from the Loongson teams if such an improvement is to be made; but in order to additionally stuff a 4-byte second expression into the reloc addend, the expressible range of the third expression must be reduced. I don't think this would create any serious problem though, because I expect large "maximum skip"'s (larger than e.g. 16 bits) to be practically non-existent.
(See GAS doc on .balign
and .p2align
, the padding is at most 4 bytes long; .align
is meant to emulate the host and possibly non-GNU assembler's behavior, and doesn't have its 2nd expr's semantics fully specified, but as LoongArch's canonical toolchain is GNU, we are free to just adopt the GAS standard behavior for our .align
too.)
Given we're little-endian throughout, if we just assume that very large maximum-skip values do not exist, the following change would be backwards-compatible (read: no real-world breakage expected):
-The lowest 8 bits are used to represent the first expression, other bits are used to represent the third expression.
+The lowest 8 bits are used to represent the first expression, the bits 8 to 31 inclusive the third expression, and the higher 32 bits the second expression.
+ If the bitfield for the second expression reads as zero, it is to be treated as a NOP (0x03400000).
This is all assuming ELF64. Elf32_Rela
's addend field is only 32 bits wide, there's no way to stuff a max-32-bit value into the field while also having to make the field contain the other expressions as well. Which means we might have to introduce another reloc record like R_LARCH_ALIGN_PAD
for expressing the second expression after all, but things start looking really fragile, and I definitely need more sleep or I'll most probably design bad ABI that would make me regret for the rest of my life.
No, op2 is not needed to be encoded in the reloc. The instructions are already in the binary section and R_LARCH_ALIGN just delete some (or all) of them.
We just need to make the assembler stuff min((1 << op1) - 1, op3) / 4
instructions and an R_LARCH_ALIGN, and make the linker delete min((pc + (1 << op1) - 1) & ~((1 << op1) - 1), op3) / 4
instructions once we see an R_LARCH_ALIGN.
GCC change backporting proposal: https://gcc.gnu.org/pipermail/gcc-patches/2024-February/645067.html
Compile and link the following code:
Using binutils 2.42 and gcc 13.2.0 from AOSC OS:
Generates invalid
0x00000000
instruction intest
:Which can lead to SIGILL:
CC @xry111