Closed jonas-schievink closed 2 years ago
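the measurement consists of two steps:

1. fill the memory region the stack can occupy with a known bit pattern before the program runs
2. after the program has run, search that region for the point where the pattern was overwritten

note that the search has to start at the "end" of the stack. in the case of the ARM ISA, where the stack grows downwards, that would be the lowest address.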
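to illustrate why the search starts at the lowest address, here is a minimal host-side sketch that models the stack region as a byte slice. the `PATTERN` value and the `stack_usage` name are assumptions made for this example, not what `probe-run` actually uses.

```rust
/// Byte used to paint the stack region (an arbitrary choice for this sketch).
const PATTERN: u8 = 0xAA;

/// Returns how many bytes of `stack` count as used.
///
/// `stack[0]` models the lowest address of the stack region, i.e. the "end"
/// of a stack that grows downwards. The scan starts there and stops at the
/// first byte that no longer holds the paint pattern; everything from that
/// byte up to the highest address is counted as used.
fn stack_usage(stack: &[u8]) -> usize {
    let untouched = stack.iter().take_while(|&&b| b == PATTERN).count();
    stack.len() - untouched
}
```

scanning from the other side would stop too early whenever the program leaves painted bytes untouched in the middle of its stack (for example a buffer that was reserved but never written), so the deepest write has to be found from the lowest address.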
here's how to make those two steps (hopefully) faster:
first, we should measure how long that takes right now.
the operation is currently done using a `probe_rs` API that does a memcpy from the host to the target over USB.
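a baseline measurement could be taken with something like the sketch below. `timed` and `throughput_kib_per_s` are hypothetical helper names; the closure passed to `timed` stands in for whatever call currently paints or reads back the stack.

```rust
use std::time::{Duration, Instant};

/// Runs `op` once and returns its result together with the elapsed wall-clock time.
fn timed<T>(op: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let result = op();
    (result, start.elapsed())
}

/// Converts a transfer of `bytes` bytes that took `elapsed` into KiB per second,
/// which makes runs with different stack sizes comparable.
fn throughput_kib_per_s(bytes: usize, elapsed: Duration) -> f64 {
    (bytes as f64 / 1024.0) / elapsed.as_secs_f64()
}
```

reporting throughput rather than raw time should make it easier to compare the current approach against an on-target subroutine across chips with different RAM sizes.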
to make step (1) faster try this:

- load a `fill_stack` subroutine to the target and execute it there

to make step (2) faster try this:

- load a `search_stack` subroutine to the target and execute it there

these two operations can be prototyped outside `probe-run` using the `probe_rs` library.

these two alternative approaches should be timed before being integrated into `probe-run`. if it turns out they are slower then there's no point in integrating them.
more details on loading and executing the program on the target:
the `fill_stack` function can be written in Rust but must be cross compiled to the `thumbv6m-none-eabi` target so that it also works with Cortex-M0.
after that function is cross compiled it'll become machine code (a bunch of bytes); that's what needs to be loaded to the target.
the function should be written in a way that's self-contained and does not perform any other function call (otherwise executing it becomes tricky)
it's also OK to write the function in assembly -- actually it may be easier to avoid stack usage and function calls that way; as we'll only use the machine code it doesn't matter what the source code is
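as a rough sketch of what the Rust version of the two subroutines might look like (the signatures, the word-sized pattern and the use of `extern "C"` are assumptions for this example; a real build for `thumbv6m-none-eabi` would also need `#![no_std]` and a check of the generated assembly):

```rust
/// Paints every word in `[start, end)` with `pattern`.
///
/// # Safety
/// `start..end` must be a valid, writable, 4-byte-aligned memory range.
pub unsafe extern "C" fn fill_stack(mut start: *mut u32, end: *mut u32, pattern: u32) {
    while start < end {
        // SAFETY: the caller guarantees that `start..end` is writable.
        // A volatile write keeps the compiler from turning this loop into
        // a call to `memset`, which would be a function call.
        unsafe {
            start.write_volatile(pattern);
            start = start.add(1);
        }
    }
}

/// Scans upwards from `start` (the lowest address) and returns the address
/// of the first word that no longer holds `pattern`, or `end` if every word
/// is still painted.
///
/// # Safety
/// `start..end` must be a valid, readable, 4-byte-aligned memory range.
pub unsafe extern "C" fn search_stack(
    mut start: *const u32,
    end: *const u32,
    pattern: u32,
) -> *const u32 {
    // SAFETY: the caller guarantees that `start..end` is readable.
    while start < end && unsafe { start.read_volatile() } == pattern {
        start = unsafe { start.add(1) };
    }
    start
}
```

both loops are leaf functions with three word-sized arguments, so they should be able to run entirely out of registers, but that is exactly the kind of thing to confirm in the disassembly before relying on it.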
after that, the question is where to load the subroutine: I would suggest loading it to RAM because that's easier than writing to Flash and that way there's no risk it'll collide with the program we want to run on the target. careful here: the subroutine will write to RAM, so the subroutine itself must be placed somewhere it won't overwrite itself.
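the "won't overwrite itself" condition can be checked on the host before anything is loaded. a minimal sketch, assuming both the subroutine's location and the region to paint are known as half-open address ranges (`overlaps` is a hypothetical helper, not a `probe_rs` API):

```rust
use std::ops::Range;

/// Returns `true` if the two half-open address ranges share at least one byte.
/// An empty range never overlaps anything.
fn overlaps(a: &Range<u32>, b: &Range<u32>) -> bool {
    !a.is_empty() && !b.is_empty() && a.start < b.end && b.start < a.end
}
```

the idea would be to refuse to load the subroutine (or shrink the painted region) whenever `overlaps(&subroutine_range, &painted_range)` is true.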
to run the subroutine it should suffice to set the program counter (PC) register to the start of it and resume the target. that only works if the subroutine does not use any stack space; that should be the case for these simple functions, but double check the assembly (the Stack Pointer register should NOT be modified).
Reopening because only part of it is fixed so far.
With `--measure-stack`, added in https://github.com/knurling-rs/probe-run/pull/254, we paint the whole area the stack could occupy with a bit pattern, and then read it back to determine the program's stack usage. This can write and read hundreds of KBs of RAM, which takes several seconds, so it would be great to speed this up.

One idea for speeding this up was to essentially run `memset` on the MCU, but probe-rs does not seem to expose an API for this (if this is even possible at all, with the vendor-provided on-device algorithms).