Currently, external tools that want to inspect our stack while running wasmfx code, such as gdb, lldb and perf, need to rely on the DWARF information that wasmtime produces. In particular, we have some hand-crafted DWARF directives in crates/runtime/src/fibre/unix/x86_64.rs that encode the parent-child relationship between continuations' stacks.
Unfortunately, perf gets stuck when processing data recorded with perf record --call-graph dwarf, which suggests that it has trouble with the DWARF-based backtrace information that wasmtime produces.
Luckily, in generated code, wasmtime/cranelift use frame pointers to facilitate stack walking. However, these frame pointer chains are currently broken when crossing continuation stacks: inside wasmtime_fibre_start, the "launchpad" sitting at the bottom of every fiber stack, the RBP register does not contain an address from which we can load a frame pointer for the parent/caller.
I realized that we can construct a fully working frame pointer chain with only a few changes. The technical details are described in the comment at the beginning of unix.rs (featuring ASCII art!). With these changes in place, perf now shows perfect backtraces when invoked with perf record --call-graph fp. This method of recording should also have less overhead than the DWARF-based profiling approach.
While this PR adds a lot of comments and re-organizes some code, the actual changes are small. Let TOS be the top of a continuation's stack. Then:
At TOS - 0x10, we no longer store a stack pointer denoting the end of the stack frame where wasmtime_fibre_switch switched to us, but instead the frame pointer of that same stack frame. The difference between the two is always a constant offset, so either one can be computed from the other.
At TOS - 0x08, we now store a fake return address, which is the address of wasmtime_fibre_switch. Thus, any stack walking tool sees that the "caller" of wasmtime_fibre_start is the wasmtime_fibre_switch in the parent continuation's stack, whose parent is in turn the function that resume-d us.
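To illustrate why these two slots are enough for a frame pointer walker, here is a minimal simulation in Rust. It is only a sketch, not the code in this PR: memory is modelled as a map from addresses to 64-bit words, all addresses and the ADDR_* constants are made up, and it assumes that RBP inside wasmtime_fibre_start points at TOS - 0x10, so that the two slots look like an ordinary frame record (saved frame pointer, then return address).

```rust
use std::collections::HashMap;

// Hypothetical code addresses, standing in for real function addresses.
const ADDR_START: u64 = 0x3000; // somewhere inside wasmtime_fibre_start
const ADDR_SWITCH: u64 = 0x4000; // wasmtime_fibre_switch (the fake return address)
const ADDR_RESUMER: u64 = 0x5000; // inside the function that resumed the continuation
const ADDR_OUTER: u64 = 0x7000; // inside the caller of that function

/// Walks a frame pointer chain: `[fp]` holds the caller's frame pointer and
/// `[fp + 8]` holds the return address. Stops at a null or unmapped frame pointer.
fn walk(mem: &HashMap<u64, u64>, mut fp: u64) -> Vec<u64> {
    let mut return_addrs = Vec::new();
    while fp != 0 {
        let (Some(&next), Some(&ret)) = (mem.get(&fp), mem.get(&(fp + 8))) else {
            break;
        };
        return_addrs.push(ret);
        fp = next;
    }
    return_addrs
}

/// Builds a toy memory image with a parent stack and one continuation stack.
/// Returns the memory and the frame pointer of the innermost frame.
fn build_memory() -> (HashMap<u64, u64>, u64) {
    let mut mem = HashMap::new();

    // Parent stack: frame of the function that resumed the continuation.
    let resumer_fp = 0x1_0000;
    mem.insert(resumer_fp, 0); // end of the chain
    mem.insert(resumer_fp + 8, ADDR_OUTER);

    // Parent stack: frame of wasmtime_fibre_switch, called by the resumer.
    let switch_fp = 0xff00;
    mem.insert(switch_fp, resumer_fp);
    mem.insert(switch_fp + 8, ADDR_RESUMER);

    // Continuation stack: the two slots described above.
    let tos = 0x2_0000;
    mem.insert(tos - 0x10, switch_fp); // frame pointer of the parent's switch frame
    mem.insert(tos - 0x08, ADDR_SWITCH); // fake return address

    // A frame running on the continuation, called from wasmtime_fibre_start.
    let inner_fp = tos - 0x40;
    mem.insert(inner_fp, tos - 0x10); // assumed RBP of wasmtime_fibre_start
    mem.insert(inner_fp + 8, ADDR_START);

    (mem, inner_fp)
}
```

Calling walk on the memory returned by build_memory, starting at the innermost frame pointer, yields the return addresses ADDR_START, ADDR_SWITCH, ADDR_RESUMER and ADDR_OUTER in that order: the walker crosses from the continuation's stack into the parent's stack without any special casing.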
These changes are basically for free:
In wasmtime_fibre_switch, all we need to do is some arithmetic to translate between frame pointers and stack pointers.
I've slightly re-organized wasmtime_fibre_init for clarity, but the only extra work it does is storing a pointer to wasmtime_fibre_switch at TOS - 0x08. The only other change is one extra step of address arithmetic.
wasmtime_fibre_start remains logically unchanged, apart from a small change to the .cfi_ directives and the fact that we now read the value of TOS from a different register.
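The translation between frame pointers and stack pointers mentioned for wasmtime_fibre_switch amounts to adding or subtracting a constant. The following Rust sketch shows the idea only; the real code is assembly, and the constant ASSUMED_SWITCH_FRAME_OFFSET and its value are invented for illustration, since the actual offset depends on what wasmtime_fibre_switch pushes onto the stack.

```rust
/// Assumed distance in bytes between the stack pointer at the end of the
/// wasmtime_fibre_switch frame and that frame's frame pointer.
/// The value is made up for illustration.
const ASSUMED_SWITCH_FRAME_OFFSET: u64 = 0x30;

/// The stack grows downwards on x86-64, so the frame pointer lies above the
/// stack pointer that marks the end of the same frame.
fn fp_from_sp(sp: u64) -> u64 {
    sp + ASSUMED_SWITCH_FRAME_OFFSET
}

/// Inverse of `fp_from_sp`: recovers the stack pointer from the frame pointer.
fn sp_from_fp(fp: u64) -> u64 {
    fp - ASSUMED_SWITCH_FRAME_OFFSET
}
```

Because the offset is fixed, storing the frame pointer at TOS - 0x10 loses no information: the stack pointer needed to switch back can always be recomputed from it.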