japaric / cargo-call-stack

Whole program static stack analysis
Apache License 2.0

Unexpected difference in LLVM and cargo-call-stack size for untyped functions #65

Closed. lulf closed this issue 2 years ago

lulf commented 2 years ago

I have an application that triggers this assert in cargo-call-stack:

thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `0`,
 right: `8`: BUG: LLVM reported that `OUTLINED_FUNCTION_13` uses 0 bytes of stack but this doesn't match our analysis

It originates from this code:

// in all other cases our results should match
assert_eq!(
    *llvm_stack, stack,
    "BUG: LLVM reported that `{}` uses {} bytes of stack but \
     this doesn't match our analysis",
    canonical_name, llvm_stack
);

Running objdump reveals the instructions for OUTLINED_FUNCTION_13:

0003fe96 <OUTLINED_FUNCTION_13>:
   3fe96: 4d f8 08 ed   str lr, [sp, #-8]!
   3fe9a: 50 46         mov r0, r10
   3fe9c: f0 f7 1a f8   bl  0x2fed4 <core::cell::RefCell$LT$T$GT$::borrow_mut::hd9b9aa1e3adf77aa> @ imm = #-65484
   3fea0: 05 46         mov r5, r0
   3fea2: 04 30         adds    r0, #4
   3fea4: 0e 46         mov r6, r1
   3fea6: ef f7 33 fa   bl  0x2f310 <core::ptr::drop_in_place$LT$core..option..Option$LT$drogue_device..drivers..ble..mesh..config..network..Network$GT$$GT$::h7eb090b9876c8058> @ imm = #-68506
   3feaa: 28 46         mov r0, r5
   3feac: 59 46         mov r1, r11
   3feae: 4f f4 dc 72   mov.w   r2, #440
   3feb2: 5d f8 08 eb   ldr lr, [sp], #8
   3feb6: 01 f0 27 b8   b.w 0x40f08 <__aeabi_memcpy4> @ imm = #4174

And it looks like it is supposed to use 8 bytes of stack.
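As a rough illustration of where the 8 bytes come from, the first and second-to-last instructions in the listing are the 32-bit Thumb-2 `STR`/`LDR (immediate)` forms with writeback to `sp`. The sketch below decodes only that one encoding (T4) and returns the resulting change to `sp`. The function name `sp_writeback_delta` is hypothetical, and this is not how cargo-call-stack's thumb.rs is implemented; it is a minimal decoder under the assumption that the input is exactly the four bytes objdump prints.

```rust
/// Change applied to `sp` by a 32-bit Thumb-2 `STR`/`LDR (immediate)`
/// instruction in the T4 encoding, or `None` if the bytes are not of that
/// form or the instruction does not write back to `sp`.
///
/// `bytes` are the four instruction bytes in memory order, as objdump shows them.
/// Hypothetical helper for illustration only.
pub fn sp_writeback_delta(bytes: [u8; 4]) -> Option<i32> {
    // Thumb-2 stores a 32-bit instruction as two little-endian halfwords.
    let hw1 = u16::from_le_bytes([bytes[0], bytes[1]]);
    let hw2 = u16::from_le_bytes([bytes[2], bytes[3]]);

    // First halfword: 1111 1000 010L nnnn
    // (L = 0 for STR, 1 for LDR; nnnn = base register).
    if hw1 & 0xffe0 != 0xf840 {
        return None;
    }
    let base = hw1 & 0x000f;

    // Second halfword: tttt 1PUW iiii iiii
    let is_t4 = hw2 & 0x0800 != 0; // bit 11 must be set in this encoding
    let add = hw2 & 0x0200 != 0; // U: add (1) or subtract (0) the offset
    let writeback = hw2 & 0x0100 != 0; // W: base register is updated
    let imm8 = i32::from(hw2 & 0x00ff);

    // Only instructions that update sp (r13) affect the stack depth.
    if base != 13 || !is_t4 || !writeback {
        return None;
    }
    Some(if add { imm8 } else { -imm8 })
}
```

Feeding it the bytes of `str lr, [sp, #-8]!` (`4d f8 08 ed`) gives a delta of -8, and the bytes of `ldr lr, [sp], #8` (`5d f8 08 eb`) give +8, which matches the 8 bytes of stack seen in the disassembly.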

Is the correct way forward to modify the thumb.rs to catch this sp modification so that the calculated value is correct?

japaric commented 2 years ago

Is the correct way forward to modify the thumb.rs to catch this sp modification so that the calculated value is correct?

yes, I assume the machine code analysis is not handling the sp decrement in that first str instruction

lulf commented 2 years ago

I got this the wrong way around! The sp value is calculated correctly by cargo-call-stack, but the analysis from stack-sizes claims 0 stack usage for all the outlined functions.

Using stack-sizes on the ELF directly confirms this. It seems that every OUTLINED_FUNCTION symbol reports a stack size of 0. What's the correct course of action here? Is this information simply missing from the .stack_sizes section, or should there be a guard in cargo-call-stack to filter outlined function symbols?

japaric commented 2 years ago

stack_sizes::analyze_object produces a map from symbol (function) name to its stack usage. if the map does not contain the OUTLINED_FUNCTION symbols it could be that llvm does not produce stack usage information, or it could be a bug in the stack_sizes crate. if it's the former we could use the information from call-stack's machine code analysis.

if the map does contain the OUTLINED_FUNCTION symbols and reports a stack usage of 0 that would be a bug in llvm. in that case, we could work around the bug by using the heuristic: if it's OUTLINED_FUNCTION* and stack usage is zero then use the info from call-stack's machine code analysis.
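The two cases described above could be combined into one fallback rule. The sketch below is only an illustration of that heuristic: `resolve_stack_usage` and its parameters are hypothetical names, and the actual workaround in cargo-call-stack may be structured differently.

```rust
/// Picks which stack usage to trust for one function.
///
/// `llvm` is the value LLVM reported (if the symbol appears in the map at all),
/// `analyzed` is the value from the machine code analysis (if available).
/// Hypothetical helper, not cargo-call-stack's actual code.
pub fn resolve_stack_usage(name: &str, llvm: Option<u64>, analyzed: Option<u64>) -> Option<u64> {
    match (llvm, analyzed) {
        // LLVM produced no information: fall back to the machine code analysis.
        (None, a) => a,
        // LLVM claims 0 bytes for an outlined function: distrust it and
        // prefer the machine code analysis.
        (Some(0), Some(a)) if name.starts_with("OUTLINED_FUNCTION") => Some(a),
        // In all other cases keep LLVM's number.
        (Some(l), _) => Some(l),
    }
}
```

With the values from this issue, `resolve_stack_usage("OUTLINED_FUNCTION_13", Some(0), Some(8))` yields `Some(8)`, while a regular function that LLVM reports as using 0 bytes keeps that value.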

is the code that runs into the issue public? if not, could you run nm -CSn on the ELF file and report here the lines that contain the OUTLINED_FUNCTION* symbols? I'm wondering if these OUTLINED_FUNCTION* symbols are not proper functions but rather labels. In assembly you can write something like:

  .global my_fun
my_fun:
  nop
my_label:
  b my_label

my_fun is a proper function. my_label is a label, but both will be symbols reported by nm; each should have different flags in the ELF metadata (e.g. global vs local). I wonder if the OUTLINED_FUNCTION* symbols look like labels and stack_sizes or call-stack is doing something wrong
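For reference, the flags mentioned above live in the `st_info` byte of each ELF symbol table entry: the upper four bits hold the binding (local, global, weak) and the lower four bits hold the type (no type, object, function, and so on). The sketch below decodes that byte. The enum and function names are hypothetical, and which combination a given symbol ends up with depends on the directives the assembler or compiler emitted.

```rust
/// Symbol binding, from the upper four bits of `st_info`.
#[derive(Debug, PartialEq, Eq)]
pub enum Binding {
    Local,     // STB_LOCAL = 0
    Global,    // STB_GLOBAL = 1
    Weak,      // STB_WEAK = 2
    Other(u8), // anything else (OS- or processor-specific)
}

/// Symbol type, from the lower four bits of `st_info`.
#[derive(Debug, PartialEq, Eq)]
pub enum Kind {
    NoType,    // STT_NOTYPE = 0, e.g. a plain label
    Object,    // STT_OBJECT = 1
    Func,      // STT_FUNC = 2
    Other(u8), // section, file, and other types
}

/// Splits an ELF `st_info` byte into its binding and type.
/// Hypothetical helper for illustration only.
pub fn decode_st_info(st_info: u8) -> (Binding, Kind) {
    let binding = match st_info >> 4 {
        0 => Binding::Local,
        1 => Binding::Global,
        2 => Binding::Weak,
        other => Binding::Other(other),
    };
    let kind = match st_info & 0x0f {
        0 => Kind::NoType,
        1 => Kind::Object,
        2 => Kind::Func,
        other => Kind::Other(other),
    };
    (binding, kind)
}
```

A global symbol typed as a function has `st_info == 0x12`, whereas a local, untyped label has `st_info == 0x00`. A tool that only considers `Kind::Func` symbols would skip the latter.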

lulf commented 2 years ago

The only output with OUTLINED_FUNCTION is this:

0003fe8e 00000024 t OUTLINED_FUNCTION_13
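For readers unfamiliar with the `nm -S` format, the line above has four fields: address (hex), size (hex), a one-letter symbol class, and the name. A lowercase `t` means a local symbol in the text (code) section. The sketch below splits such a line into those fields; `NmSymbol` and `parse_nm_line` are hypothetical names, and the parser assumes the size column is present (lines without it are rejected).

```rust
/// One line of `nm -S` output, split into its fields.
#[derive(Debug, PartialEq, Eq)]
pub struct NmSymbol {
    pub address: u64,
    pub size: u64,
    pub kind: char,
    pub name: String,
}

/// Parses a line of the form `<address> <size> <class> <name>`.
/// Returns `None` for lines that do not have all four fields.
/// Hypothetical helper for illustration only.
pub fn parse_nm_line(line: &str) -> Option<NmSymbol> {
    // Demangled names (nm -C) may contain spaces, so split at most 4 ways
    // and keep the remainder as the name.
    let mut parts = line.trim().splitn(4, ' ');
    let address = u64::from_str_radix(parts.next()?, 16).ok()?;
    let size = u64::from_str_radix(parts.next()?, 16).ok()?;

    // The symbol class must be exactly one character.
    let mut class = parts.next()?.chars();
    let kind = class.next()?;
    if class.next().is_some() {
        return None;
    }

    let name = parts.next()?.to_string();
    Some(NmSymbol { address, size, kind, name })
}
```

Applied to the line above, this gives a size of 0x24 (36 bytes) and class `t` for `OUTLINED_FUNCTION_13`.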

The project is public, you can find it here:

https://github.com/drogue-iot/drogue-device/tree/main/examples/nrf52/microbit/bt-mesh

running cargo call-stack --bin microbit-bt-mesh in that folder should produce the error.

japaric commented 2 years ago

thanks for the link. I had a look and this seems to be an LLVM bug. all the OUTLINED_FUNCTION_* symbols (or rather their addresses) are included in the .stack_sizes section, but they all appear there with a stack usage of 0, even the ones that actually do use the stack.

I couldn't find an existing bug report, and I would have a hard time producing a small repro case to submit one, so I'll pass on that.

I'll add some workarounds to call-stack to deal with this.

japaric commented 2 years ago

v0.1.11 includes a workaround for this issue