tetratelabs / wazero

wazero: the zero dependency WebAssembly runtime for Go developers
https://wazero.io
Apache License 2.0

Proposal: add guest CPU profiling #1350

Closed Pryz closed 1 year ago

Pryz commented 1 year ago

Profiling is a popular tool that helps engineers understand their code, track down performance issues, and much more. CPU and memory profiling are two critical features that are required today to run things in production. This proposal focuses on CPU profiling, but some interesting work can be done on the memory side as well.

Implementation

We have multiple options to implement CPU profiling in Wazero, depending on whether we want to provide a profiling mechanism from within Wazero or are OK with just using an external system. In the second case, it’s all about providing the debugging symbols.

Option 1: perf map

When using perf to profile a program on Linux, perf maps are the easiest way to provide symbols for resolution. The format is quite simple: one line per symbol of the form START SIZE symbolname, with START and SIZE in hexadecimal. perf will automatically discover and use the map if the file is written to /tmp/perf-$pid.map.

I’ve created a quick draft, https://github.com/tetratelabs/wazero/pull/1349, with a potential implementation. This change hooks into the engine to record the different symbols at compile time. Two caveats with this change: I’m using a dependency to demangle the symbols, and it will need to be adapted once we use a single mmap to store the compiled code in memory instead of one per function.

Here is what the resolution of Rust symbols looks like:

[Image: Screenshot 2023-04-09 at 9 35 14 PM]

Option 2: jitdump+DWARF

Another approach is to create a jitdump file, which is similar to a perf map but contains more information and requires more steps to use (see wasmtime profiling). jitdump files are also a bit more annoying to use: during the perf inject phase, perf generates one *.so file per function, which might be a lot depending on the type of program you run in the guest.

Note that some profilers out there, such as Parca, already support reading symbols from jitdump files.

It is also possible to enhance the jitdump data with DWARF (if present in the WASM binary) which we already parse in Wazero to create stacktraces during panic events.

Option 3: Profiler within Wazero

Last but not least: add a profiler directly into Wazero. This one is obviously the most complex. The idea would be to add a custom pprof endpoint and leverage the DWARF data and stack unwinding to implement a custom profiler within Wazero. The runtime has all the information it needs to do this work, and such a feature could be really powerful. Wazero would become the only runtime out there that can profile any guest module and let you use go tool pprof or any compatible tool to consume the data.

I also tried to use the built-in Go pprof but this is what we currently see:

[Image: Screen Shot 2023-02-20 at 17 09 25]

The Go runtime doesn’t have enough awareness of the WASM module to include it in the profiles. There might be a way to trace back the WASM stack, similar to github.com/ianlancetaylor/cgosymbolizer, but my attempts have failed so far.

Thoughts? :)

mathetake commented 1 year ago

I didn't know that there was something like perf map. Thank you for letting me know about it. Option 1 seems like the way to go.

But instead of using def := module.FunctionDefinitionSection[funcIndex+importedFuncs], which requires demangling to get meaningful info from C/C++/Rust binaries as you've demonstrated in 1349, we should be able to retrieve the unmangled name from the DWARF info. Then we won't need to demangle it ourselves or introduce a dependency (which I definitely would not do).

Pryz commented 1 year ago

Ha yes didn't try to get the function name from the DWARF data. I will update the change! thanks :)

mathetake commented 1 year ago

oh yeah, but I once ran some experiments to get function symbols from the DWARF sections, and I couldn't find a proper way to do so. It might be impossible with debug/dwarf, and we might end up implementing a DWARF interpreter ourselves :D

Pryz commented 1 year ago

Looking at the DWARF data, it looks like we are going to have to either re-interpret the WASM DWARF data ourselves or somehow walk the debug/dwarf data to reconstruct the names:

Here we need to map those two entries (not sure how) to build fib::fibdo:

2873 Namespace true
field:  Name ClassString fib
field: attr=Name class=ClassString val=fib
...
2955 Subprogram true
field:  Lowpc ClassAddress (0x104401aa0,0x14000ed2e10)
field: attr=Lowpc class=ClassAddress val=2222
field:  Highpc ClassConstant (0x104401160,0x14000ed2e18)
field: attr=Highpc class=ClassConstant val=391
field:  FrameBase ClassExprLoc (0x1043fe5e0,0x14000eac450)
field: attr=FrameBase class=ClassExprLoc
field:  LinkageName ClassString _ZN3fib5fibdo17h1d154f85f7b42935E
field: attr=LinkageName class=ClassString val=_ZN3fib5fibdo17h1d154f85f7b42935E
field:  Name ClassString fibdo
field: attr=Name class=ClassString val=fibdo
field:  DeclFile ClassConstant (0x104401160,0x104656748)
field: attr=DeclFile class=ClassConstant val=1
field:  DeclLine ClassConstant (0x104401160,0x1046567b0)
field: attr=DeclLine class=ClassConstant val=14
field:  Type ClassReference (0x1043ff6e0,0x14000ed2e28)
field: attr=Type class=ClassReference

Anyway. Will try to dig into it a bit more this week.

mathetake commented 1 year ago

On second thought, Option 3: Profiler within Wazero would be the best option for us to take in order to provide a consistent dev experience with wazero, though I'm not sure whether it is feasible at all.

mathetake commented 1 year ago

correct me if I'm wrong, but we would have to take option 3 in order for GoLand's profiling UI to show the guest symbols, right?

Pryz commented 1 year ago

To answer this question I need to dig a bit more into how Go unwinds the stack for pprof. With https://github.com/ianlancetaylor/cgosymbolizer and https://pkg.go.dev/runtime#SetCgoTraceback it looks like we can teach Go how to trace back cgo calls. If that is possible for cgo, it should be possible for WASM as well, right? I need to explore this more.

If that's not possible, we could implement our own profiler from scratch, i.e. record samples of the guest stack and use the DWARF data or a perf map style technique for the symbols.

Pryz commented 1 year ago

I wanted to provide some updates here:

mathetake commented 1 year ago

sounds great!

Pryz commented 1 year ago

I've opened https://github.com/tetratelabs/wazero/pull/1349 for review. Even if we want to implement an internal profiler, I think there is still a benefit to providing the perf interface, since perf is a very common tool in Linux land for profiling and debugging. Let me know what you think :)

Pryz commented 1 year ago

Closing the issue. See https://github.com/stealthrocket/wzprof if you need a profiler.