llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

stack limit support #117799

Open mat-c opened 4 days ago

mat-c commented 4 days ago

Hi,

Today LLVM (and therefore clang, rustc, ...) does not offer an option to check the stack limit on hardware where stack probing is not possible, for example an MCU without an MMU or a hardware stack limit. An MPU allows some checks, but has limitations.

GCC has the -fstack-limit-register and -fstack-limit-symbol options. These allow the stack limit to be kept in a register or in a global variable.

The Arm documentation gives some examples of how to implement a software stack limit:

Arm's proposal is to set the stack limit value at least 256 bytes above the real end of the stack. This lets small stack allocations (less than 256 bytes) use a quick check: a direct comparison with no arithmetic instruction, which can use a high register. The 256 (STACK_GUARD) could be a configuration option, to limit how much stack is reserved.

For small stack allocations (less than STACK_GUARD):

if (stack_pointer < stack_limit)
    abort();

For bigger frames (more than STACK_GUARD) or VLAs, we need to compute the new stack position:

if (stack_pointer - Framesize < stack_limit)
    abort();
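To make the two cases concrete, here is a minimal C sketch of the checks as plain functions over integer addresses. It is only an illustration, not what a compiler would emit: the function names and the `stack_limit_from_end` helper are hypothetical, and the wraparound guard in the large-frame case is an added assumption that the pseudocode above does not spell out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define STACK_GUARD 256u /* reserve between the limit and the real stack end */

/* Hypothetical helper: place the limit STACK_GUARD bytes above the stack end. */
static uintptr_t stack_limit_from_end(uintptr_t stack_end) {
    assert(stack_end <= UINTPTR_MAX - STACK_GUARD); /* no address wraparound */
    return stack_end + STACK_GUARD;
}

/* Small frame (< STACK_GUARD): compare SP directly with the limit.
   The frame may dip below the limit, but never below the real stack end. */
static bool small_frame_overflows(uintptr_t sp, uintptr_t stack_limit) {
    return sp < stack_limit;
}

/* Large frame or VLA: compute the new stack position before comparing. */
static bool large_frame_overflows(uintptr_t sp, uintptr_t stack_limit,
                                  uintptr_t frame_size) {
    if (frame_size > sp) /* assumption: treat unsigned wraparound as overflow */
        return true;
    return sp - frame_size < stack_limit;
}

/* Pick the check by frame size, as a prologue emitter would at compile time. */
static bool frame_overflows(uintptr_t sp, uintptr_t stack_limit,
                            uintptr_t frame_size) {
    if (frame_size < STACK_GUARD)
        return small_frame_overflows(sp, stack_limit);
    return large_frame_overflows(sp, stack_limit, frame_size);
}
```

The small-frame check is cheaper because it needs no scratch register and no subtraction. The price is the STACK_GUARD bytes that can never be handed out to a large frame.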

See [1] for some arm assembly.

LLVM already has some support for emitting a prologue that depends on the stack allocation (split stack, stack probe, ...).

Would it be possible to implement a stack limit check the same way?

[1]

Register version in Thumb-1: 64 bits of instructions (2 * 16 + 32)

        CMP     sp, sl
        BHS     no_ovf
        BL      abort
no_ovf

Symbol version in Thumb-1: 128 bits of instructions (4 * 16 + 32 + 32)

        LDR     r7, =stack_limit_addr
        LDR     r7, [r7]
        CMP     sp, r7
        BHS     no_ovf
        BL      abort
no_ovf

In Thumb-2, with the sub.w instruction, we could have a small STACK_GUARD: we can do arithmetic operations and use high registers.

Register version, if the stack allocation fits in 12 bits:

  add.w   ip, sl, #Framesize
  CMP     sp, ip
...

Symbol version, if the stack allocation fits in 12 bits:

  movw ip, :lower16:stack_limit
  movt ip, :upper16:stack_limit
  ldr ip, [ip]
  add.w   ip, ip, #Framesize
  CMP     sp, ip
...
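The Thumb-2 sequences above raise the limit by Framesize in a scratch register instead of lowering SP. Here is a small C sketch of why that gives the same answer as the subtracting form. The function names are hypothetical, 32-bit addresses are assumed, and treating wraparound as an overflow is an added assumption rather than something the assembly does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors "add.w ip, sl, #Framesize ; cmp sp, ip": the limit is raised by
   the frame size in a scratch value, and SP itself is never modified. */
static bool overflows_raising_limit(uint32_t sp, uint32_t stack_limit,
                                    uint32_t frame_size) {
    uint32_t ip = stack_limit + frame_size; /* scratch, like ip (r12) */
    if (ip < stack_limit)
        return true; /* assumption: 32-bit wraparound counts as overflow */
    return sp < ip;  /* the BHS branch is taken when sp >= ip */
}

/* The subtracting form, "stack_pointer - Framesize < stack_limit". */
static bool overflows_lowering_sp(uint32_t sp, uint32_t stack_limit,
                                  uint32_t frame_size) {
    if (frame_size > sp)
        return true; /* assumption: 32-bit wraparound counts as overflow */
    return sp - frame_size < stack_limit;
}

/* Debug helper: evaluate the raised-limit form and assert both forms agree. */
static bool frame_overflows_checked(uint32_t sp, uint32_t stack_limit,
                                    uint32_t frame_size) {
    bool result = overflows_raising_limit(sp, stack_limit, frame_size);
    assert(result == overflows_lowering_sp(sp, stack_limit, frame_size));
    return result;
}
```

Raising the limit keeps SP untouched until the check has passed, so the abort path still runs on a valid stack pointer.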
s-barannikov commented 3 days ago

FWIW the options do not work for ARM: https://inbox.sourceware.org/gcc/m3fxdfudnc.fsf@google.com/T/#t

They do work for M68k: https://gcc.godbolt.org/z/Y3cnE3v4n

It would be amazing if the options were supported by clang. In my baremetal development I constantly overwrite code / heap with stack. Needless to say it is hard to debug such issues.

mat-c commented 2 days ago

@s-barannikov yes, the GCC one is buggy. It is only half working: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117795

I can do a POC by hacking the ARM split-stack code in LLVM. But for a clean version, help will be needed from the LLVM developers.

PS: example with

    clang -target thumb-v6a-linux-eabi toto_s.c -fsplit-stack -Os -S -mcpu=cortex-m0 -o totos_m0.s
    clang -target thumb-v7a-linux-eabi toto_s.c -fsplit-stack -Os -S -mcpu=cortex-m4 -o totos_m4.s

You can see the prologue inserted by LLVM. For a stack-only check it can be optimized and made to work for thumb-vxm targets.

toto_s.c.txt totos_m0.s.txt totos_m4.s.txt