Fix unwinding - Githubissues

Urhengulas commented 1 year ago

This PR "fixes" unwinding, by not relying on the sentinel stack frame anymore, but instead checking if unwinding ends in the Reset function. If unwinding ends in the Reset function, it is considered NOT corrupted; if it ends somewhere else, it is considered corrupted.

Fixes #382

Todo:

❌ ~~Is the Reset symbol always present?~~ -> No, "an unmangled symbol named Reset is even less common than main"
❌ ~~Does any thumb bit need to be cleared?~~ -> no
❌ ~~Test it with an corrupted stack~~ -> not trivially possible from Rust

japaric commented 1 year ago

an unmangled symbol named Reset is even less common than main (which at least is common in C programs). neither symbol is specified in the ABI of the cortex-m-rt crate so relying on either could lead to breakage in the future ("future" being any new patch version, i.e. 0.7.x).

on the other hand, _stack_start is part of the ABI of the cortex-m-rt crate so this idea may work at least for cortex-m-rt v0.7.x but the ABI could change in 0.8.0.

another way to figure out the address of the reset handler is to read the second entry in the vector table. this document states:

On system reset, the vector table is fixed at address 0x00000000.

thought I'm not sure that is specified in the Cortex-M ISA. I know STM32 devices use 0x2000_0000 as the start address of Flash memory so firmware put the vector table at that address, instead of at 0x0000_0000. It could be that the Flash memory at 0x2000_0000 is aliased at address 0x0000_0000 but I have not checked.

yet another way to get the address of the reset handler is to issue a reset-halt and read out the program counter (PC) register. I don't have a link to normative documentation but the Cortex-M ISA states that the program counter is set to the second entry of the vector table (the address of the reset handler) after a system reset. I think this runtime approach would be the sure way to get the address. after a reset-halt you could read the SP register and that would give you _stack_start as well.

Urhengulas commented 1 year ago

Thank you for the answer @japaric. A few clarification questions:

Are you suggesting doing both the _stack_start idea James and checking if we end up in the reset handler? Or are you suggesting that we do one or the other?
Obtaining the reset handler during runtime sounds good. But as far as I understand it only gives me the address of the reset handler, not the size. In order to check if we end up inside of it I need both (as far as I understand). How can I get the size?

🥂

Urhengulas commented 1 year ago

I've switched to James proposal of ending the unwinding depending on the _stack_start and that seems to work as well (see https://github.com/knurling-rs/probe-run/pull/383/commits/3e7fe2d7d693baff4f6597b4a245a164c6566289).

Atm I extract the stack start by reading the SP after a reset (as you've proposed). It seems to me that _stack_start is the same as the start of the active_ram_region we extract from the ELF. Is this just a coincidence, or always the case? If they are same we can just do one and not both.

japaric commented 1 year ago

Are you suggesting doing both the _stack_start idea James and checking if we end up in the reset handler? Or are you suggesting that we do one or the other?

I think it'd be good to check the stack pointer (SP) value against _stack_start during each step of the unwinding process. if the SP value ever goes higher than _stack_start then either the stack is corrupted or there's a bug in the unwinding logic; in either case we should stop the unwinding process.

Thinking more about it, I think it may be better not to use _stack_start == SP as the stop condition. reason being, the reset handler could adjust the stack pointer (e.g. to make it multiple of 4 or 8 bytes so that the next stack frame adheres to e.g. the AAPCS ABI) before jumping into the next function. the reset handler itself usually does not follow any ABI (as it's written in assembly) and could lack both a function prologue and enough CFI (call frame information) to let the unwinder compute the value of SP at the start of the reset handler so it may not be possible (for the unwinder) to compute that SP was equal to _stack_start at some point in time.

But as far as I understand it only gives me the address of the reset handler, not the size. In order to check if we end up inside of it I need both (as far as I understand)

correct.

. How can I get the size?

you should be able to iterate the symbol table to find the size (the API may be called symbols or symbol_table depending on whether you are using the xmas_elf or object library). each symbol entry has an associated start address and size (+) so you can compute the address range each symbol spans (start .. (start+size)). you can then use reset_handler_address_span.contains(current_pc) as the stop condition.

(+) should look bit like nm -CSn if you print the data

$ # first column is start address; second is size; numbers are in hexadecimal format
$ arm-none-eabi-nm -CSn some-elf
(..)
00000100 00000028 T Reset
(..)
0000013c 0000000a T main
00000148 00000028 t hello::__cortex_m_rt_main
00000170 0000000a T app::exit

you asked earlier:

Does any thumb bit need to be cleared?

you can pretty much always clear the thumb bit (bit 0) when comparing addresses. the start address of a function (as seen in the objdump output) must always be an even number; the value of PC will always be an even number.

values that eventually end up in an instructions like bl r0 ("branch into function whose address is stored in register R0") have to have the thumb bit set. as those values (effectively function pointer values) are stored in static variables, they are usually stored with the bit already set to avoid setting it at runtime. when you read the second entry of the vector table (either at runtime or from the rodata stored in the ELF) you'll get an odd number.

Urhengulas commented 1 year ago

@japaric said:

Thinking more about it, I think it may be better not to use _stack_start == SP as the stop condition.

What then? That unwinding ends in the reset handler? This is what I implemented in 9f40539e2d537c49ea22eb1ef7cae5f6d50b4fd6.

If this is what you mean we would have:

a successful stop if unwinding ends (!cfa_changed && !program_counter_changed) in the reset handler
unsuccessful stop if
- unwinding stops outside of reset handler
- SP > _stack_start

Correct?

@japaric said:

@Urhengulas said:

How can I get the size?

you should be able to iterate the symbol table to find the size

Didn't you say earlier that we cannot rely on the symbol table? I talking about following quote.

@japaric said:

an unmangled symbol named Reset is even less common than main (which at least is common in C programs). neither symbol is specified in the ABI of the cortex-m-rt crate

Or am I misunderstanding something? Can we figure out the mangled name of the symbol?

Urhengulas commented 1 year ago

@japaric The PR is ready for review again. I think it makes the code oven messier than it already is, and I am working on a follow-up PR to clean things up. But I wouldn't do this in here.

Urhengulas commented 1 year ago

bors r=japaric

bors[bot] commented 1 year ago

Build succeeded:

ci

knurling-rs / probe-run

Fix unwinding #383