wasmfx / wasmfxtime

A fork of wasmtime (a fast and secure runtime for WebAssembly) supporting the WasmFX instruction set
https://wasmfx.dev/
Apache License 2.0

Enable backtrace creation purely based on frame pointer walking #141

Closed frank-emrich closed 8 months ago

frank-emrich commented 8 months ago

Currently, external tools such as gdb, lldb and perf that want to inspect our stack while running wasmfx code need to rely on the DWARF information that wasmtime produces. In particular, we have some hand-crafted DWARF directives in crates/runtime/src/fibre/unix/x86_64.rs that encode the parent-child relationship between continuations' stacks.

Unfortunately, perf gets stuck when processing data recorded with perf record --call-graph dwarf, which indicates that it has trouble with the DWARF-based backtrace frame information that wasmtime offers.

Luckily, wasmtime/cranelift use frame pointers in generated code to facilitate stack walking. However, these frame pointer chains are currently broken when crossing continuation stacks: inside wasmtime_fibre_start, the "launchpad" sitting at the bottom of every fiber stack, the RBP register does not contain an address from which we could load a frame pointer for the parent/caller.

However, I realized that we can construct a fully working frame pointer chain by making only a few changes. The technical details are described in the comment at the beginning of unix.rs (featuring ASCII art!). With these changes in place, perf now shows perfect backtraces when invoked with perf record --call-graph fp. This method of recording should also have less overhead than the DWARF-based profiling approach.

While this PR adds a lot of comments and re-organizes some code, the actual changes are small. Let TOS be the top of stack of a continuation, then:

  1. At TOS - 0x10, we no longer store a stack pointer denoting the end of the stack frame where wasmtime_fibre_switch switched to us, but the frame pointer of that same stack frame. The difference between these two is always a constant offset, meaning that we can obtain one from the other.
  2. At TOS - 0x08, we now store a fake return address, which is the address of wasmtime_fibre_switch. Thus, any stack walking tool sees that the "caller" of wasmtime_fibre_start is the wasmtime_fibre_switch in the parent continuation's stack, whose parent is in turn the function that resume-d us.
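
To illustrate why these two slots are enough for a stack walker, here is a minimal sketch in Rust that simulates frame pointer walking over a toy memory model. It is not code from this PR: all addresses, the constant names, and the helpers `walk_frame_pointers` and `build_example` are hypothetical. It also assumes that, inside wasmtime_fibre_start, RBP points at TOS - 0x10, so that the usual x86-64 convention applies (saved frame pointer at [RBP], return address at [RBP + 8]).

```rust
use std::collections::HashMap;

/// Toy memory: maps addresses to 64-bit words (hypothetical model, not wasmtime code).
type Memory = HashMap<u64, u64>;

// Hypothetical code addresses, used only as labels for the walker output.
const FIBRE_START_ADDR: u64 = 0x3000; // inside wasmtime_fibre_start
const FIBRE_SWITCH_ADDR: u64 = 0x4000; // wasmtime_fibre_switch
const RESUMER_ADDR: u64 = 0x5000; // inside the function that resumed the continuation
const HOST_ENTRY_ADDR: u64 = 0x6000; // inside the caller of the resuming function

/// Walks a frame pointer chain the way a tool like perf would:
/// the word at `fp` is the caller's frame pointer, the word at `fp + 8`
/// is the return address. Stops at a null or unmapped frame pointer.
fn walk_frame_pointers(mem: &Memory, mut fp: u64) -> Vec<u64> {
    let mut return_addresses = Vec::new();
    while fp != 0 {
        let (parent_fp, ret) = match (mem.get(&fp), mem.get(&(fp + 8))) {
            (Some(&p), Some(&r)) => (p, r),
            _ => break,
        };
        return_addresses.push(ret);
        fp = parent_fp;
    }
    return_addresses
}

/// Builds a parent stack and one continuation stack, returning the memory
/// and the frame pointer of the innermost frame on the continuation.
fn build_example() -> (Memory, u64) {
    let mut mem = Memory::new();

    // Parent stack: the frame of wasmtime_fibre_switch, called by the resumer.
    let switch_fp: u64 = 0x1000;
    let resumer_fp: u64 = 0x1040;
    mem.insert(switch_fp, resumer_fp);
    mem.insert(switch_fp + 8, RESUMER_ADDR);
    mem.insert(resumer_fp, 0); // outermost frame: chain ends here
    mem.insert(resumer_fp + 8, HOST_ENTRY_ADDR);

    // Continuation stack with top of stack TOS.
    let tos: u64 = 0x9000;
    mem.insert(tos - 0x10, switch_fp); // change 1: parent's frame pointer
    mem.insert(tos - 0x08, FIBRE_SWITCH_ADDR); // change 2: fake return address

    // A function called from wasmtime_fibre_start (assumed RBP = TOS - 0x10 there).
    let inner_fp: u64 = tos - 0x40;
    mem.insert(inner_fp, tos - 0x10);
    mem.insert(inner_fp + 8, FIBRE_START_ADDR);

    (mem, inner_fp)
}
```

Calling `walk_frame_pointers` on the result of `build_example` visits the frame on the continuation stack first, then crosses into the parent stack through the slot at TOS - 0x10 without the walker needing any knowledge of continuations.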

These changes are basically for free: