llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

MCA crashing on armv7-m instruction #97020

Open mkannwischer opened 1 month ago

mkannwischer commented 1 month ago

I have the following piece of Arm Cortex-M4 assembly (the same applies to any Armv7-M CPU, really) in input.asm:

.syntax unified
str.w r1, [r0], #16

When I run llvm-mca --mcpu=cortex-m4 --march=arm < input.asm, I get:

$ llvm-mca --mcpu=cortex-m4 --march=arm < input.asm
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.      Program arguments: llvm-mca --mcpu=cortex-m4 --march=arm
malloc(): invalid size (unsorted)
Aborted (core dumped)

If I remove the .w, it works fine and I get the expected result:

.syntax unified
str r1, [r0], #16

$ llvm-mca --mcpu=cortex-m4 --march=arm < input.asm
Iterations:        100
Instructions:      100
Total Cycles:      101
Total uOps:        100

Dispatch Width:    1
uOps Per Cycle:    0.99
IPC:               0.99
Block RThroughput: 1.0

Instruction Info:
[1]: #uOps
[2]: Latency
[3]: RThroughput
[4]: MayLoad
[5]: MayStore
[6]: HasSideEffects (U)

[1]    [2]    [3]    [4]    [5]    [6]    Instructions:
 1      1     1.00           *            str   r1, [r0], #16

Resources:
[0]   - M4Unit

Resource pressure per iteration:
[0]    
1.00   

Resource pressure by instruction:
[0]    Instructions:
1.00   str      r1, [r0], #16

I do understand that the .w is redundant here, since the post-increment str results in the 32-bit T4 encoding anyway, but I would still not expect a crash like this.
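
To make the encoding point concrete, here is a minimal Python sketch of the Thumb-2 STR (immediate) T4 bit layout as I understand it from the Armv7-M reference manual. The function name and parameters are made up for illustration and are not part of LLVM; the sketch only shows that the post-increment form has a single 32-bit encoding, so `str` and `str.w` should assemble to the same bytes.

```python
def encode_str_imm_t4(rt, rn, imm8, *, index, add, writeback):
    """Return the two halfwords (hw1, hw2) of Thumb-2 STR (immediate), encoding T4.

    Hypothetical helper for illustration only.
    Layout: hw1 = 1111 1000 0100 Rn, hw2 = Rt 1 P U W imm8.
    """
    if not (0 <= rt <= 14 and 0 <= rn <= 14):
        raise ValueError("register out of range for this sketch")
    if not 0 <= imm8 <= 0xFF:
        raise ValueError("T4 only carries an 8-bit immediate")
    if not index and not writeback:
        raise ValueError("P=0, W=0 is not a valid T4 form")
    if index and add and not writeback:
        raise ValueError("P=1, U=1, W=0 decodes as STRT, not STR")

    hw1 = 0xF840 | rn
    hw2 = (
        (rt << 12)
        | (1 << 11)                # fixed bit
        | (int(index) << 10)       # P: 0 for post-indexed
        | (int(add) << 9)          # U: 1 to add the offset
        | (int(writeback) << 8)    # W: 1 to write the new address back
        | imm8
    )
    return hw1, hw2


# str r1, [r0], #16  ->  post-indexed, add, writeback
hw1, hw2 = encode_str_imm_t4(rt=1, rn=0, imm8=16, index=False, add=True, writeback=True)
print(f"{hw1:04x} {hw2:04x}")  # prints "f840 1b10"
```

The 16-bit STR encodings have no writeback form, which is why the assembler has to pick T4 here with or without the `.w` suffix.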

I'm running LLVM from the Arch Linux repo: https://archlinux.org/packages/extra/x86_64/llvm/

$ llvm-mca --version
LLVM (http://llvm.org/):
  LLVM version 17.0.6
  Optimized build.
  Default target: x86_64-pc-linux-gnu
  Host CPU: goldmont

  Registered Targets:
    aarch64     - AArch64 (little endian)
    aarch64_32  - AArch64 (little endian ILP32)
    aarch64_be  - AArch64 (big endian)
    amdgcn      - AMD GCN GPUs
    arm         - ARM
    arm64       - ARM64 (little endian)
    arm64_32    - ARM64 (little endian ILP32)
    armeb       - ARM (big endian)
    avr         - Atmel AVR Microcontroller
    bpf         - BPF (host endian)
    bpfeb       - BPF (big endian)
    bpfel       - BPF (little endian)
    hexagon     - Hexagon
    lanai       - Lanai
    loongarch32 - 32-bit LoongArch
    loongarch64 - 64-bit LoongArch
    mips        - MIPS (32-bit big endian)
    mips64      - MIPS (64-bit big endian)
    mips64el    - MIPS (64-bit little endian)
    mipsel      - MIPS (32-bit little endian)
    msp430      - MSP430 [experimental]
    nvptx       - NVIDIA PTX 32-bit
    nvptx64     - NVIDIA PTX 64-bit
    ppc32       - PowerPC 32
    ppc32le     - PowerPC 32 LE
    ppc64       - PowerPC 64
    ppc64le     - PowerPC 64 LE
    r600        - AMD GPUs HD2XXX-HD6XXX
    riscv32     - 32-bit RISC-V
    riscv64     - 64-bit RISC-V
    sparc       - Sparc
    sparcel     - Sparc LE
    sparcv9     - Sparc V9
    systemz     - SystemZ
    thumb       - Thumb
    thumbeb     - Thumb (big endian)
    ve          - VE
    wasm32      - WebAssembly 32-bit
    wasm64      - WebAssembly 64-bit
    x86         - 32-bit X86: Pentium-Pro and above
    x86-64      - 64-bit X86: EM64T and AMD64
    xcore       - XCore

I've also tried building LLVM from source at a recent commit (c791d86eab13634ec372196977eeac8f3e9f4805), but that also results in a crash:

$ llvm-mca --mcpu=cortex-m4 --march=arm < input.asm
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.      Program arguments: llvm-mca --mcpu=cortex-m4 --march=arm
 #0 0x00005af5a4096b3c (llvm-mca+0x698b3c)
 #1 0x00005af5a409428b (llvm-mca+0x69628b)
 #2 0x00007a79a5850ae0 (/usr/lib/libc.so.6+0x3cae0)
 #3 0x00005af5a3f7db76 (llvm-mca+0x57fb76)
 #4 0x00005af5a3f7fe66 (llvm-mca+0x581e66)
 #5 0x00005af5a3f81f3b (llvm-mca+0x583f3b)
 #6 0x00005af5a3f82f39 (llvm-mca+0x584f39)
 #7 0x00005af5a3ac55f3 (llvm-mca+0xc75f3)
 #8 0x00007a79a5839c88 (/usr/lib/libc.so.6+0x25c88)
 #9 0x00007a79a5839d4c __libc_start_main (/usr/lib/libc.so.6+0x25d4c)
#10 0x00005af5a3ad6225 (llvm-mca+0xd8225)
Segmentation fault (core dumped)

llvmbot commented 1 month ago

@llvm/issue-subscribers-backend-arm

Author: Matthias J. Kannwischer (mkannwischer)

davemgreen commented 1 month ago

There is a reproducer here: https://godbolt.org/z/1eb7E3Ecr

It looks like ARMAsmParser::processInstruction alters the instruction in a way that might not match what it does for the non-.w version.
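
One way to check this might be to dump the parsed MCInst for both spellings with llvm-mc and diff the output. The following is a rough Python sketch of that idea, not a verified diagnosis. The triple `thumbv7em-none-eabi` is my assumption for a Cortex-M4 target, and the helper names are hypothetical. The script only prints whatever llvm-mc reports, and does nothing useful if llvm-mc is not on PATH.

```python
import shutil
import subprocess


def make_input(mnemonic):
    """Build the two-line test case from the report for a given mnemonic."""
    return f".syntax unified\n{mnemonic} r1, [r0], #16\n"


def llvm_mc_command(triple="thumbv7em-none-eabi", cpu="cortex-m4"):
    """Command line asking llvm-mc to print the encoding and the MCInst.

    The triple is an assumption; adjust it to match your target.
    """
    return ["llvm-mc", f"--triple={triple}", f"--mcpu={cpu}",
            "--show-encoding", "--show-inst"]


def dump_inst(mnemonic):
    """Run llvm-mc on the test case; return its output, or None if not installed."""
    cmd = llvm_mc_command()
    if shutil.which(cmd[0]) is None:
        return None
    result = subprocess.run(cmd, input=make_input(mnemonic),
                            capture_output=True, text=True, check=False)
    return result.stdout + result.stderr


if __name__ == "__main__":
    for mnemonic in ("str", "str.w"):
        print(f"== {mnemonic} ==")
        print(dump_inst(mnemonic) or "llvm-mc not found on PATH")
```

If the two dumps show different opcodes or operand lists, that would point at the parser rewriting the `.w` form differently before llvm-mca looks up its scheduling information.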