microsoft / Microsoft-Performance-Tools-Linux-Android

Linux, Android and Chromium Performance Tools built using the Microsoft Performance Toolkit. Cross-platform .NET Core + WPA GUI
MIT License

[Feature Request] Add resolved callstacks to Execution Events #118

Open ddeadguyy opened 4 months ago

ddeadguyy commented 4 months ago

I'm hoping for the same New Thread Stack and Ready Thread Stack options that Windows CPU Usage (Precise) has.

ivberg commented 4 months ago

Agreed, that would be nice.

Is this for LTTng or for Perfetto events?

While a stackwalk is theoretically obtainable with the newest tracing, decoding the stack and its symbols remains a challenge. Linux lacks a good symbol-decoding method and format like the one on Windows, and that is necessary to decode a stackwalk, both on Linux/Android in general and in these trace formats.

I think we have experimental barely working support for LTTng for just KM (kernel-mode) stacks if a special file is provided similar to what Trace Compass does.

ddeadguyy commented 4 months ago

This would be for LTTng Execution Events.

Is "experimental barely working support for LTTng for just KM (kernel-mode) stacks" documented anywhere?

We're also evaluating VTune in our search for a WPA-like Linux profiler. It captures Transitions that involve callstacks with our resolved symbols in them. Not quite as comprehensive as Windows CPU Usage (Precise) would be, though.

ivberg commented 4 months ago

> We're also evaluating VTune in our search for a WPA-like Linux profiler.

Since you used precise language about CPU Usage (Precise) and scheduling events, I know those are what you are interested in. However, to be clear, and in case others can make use of this: we do support Linux profiling with stacks (KM/UM). This is equivalent to Windows CPU Usage (Sampled).

You probably used the term "Linux profiler" in the generic sense of the word, not in the specific sense of a profiler that samples the CPU at a fixed interval to determine which functions CPU time is spent in and the stacks that led there (profiling). Last I checked, LTTng did not support profiling in this sense. Instead, these tools rely on the Linux perf tool's cpu-clock events, where the stack is decoded by perf on the box before being read into our tool. All of this is documented here: https://github.com/microsoft/Microsoft-Performance-Tools-Linux-Android/blob/develop/LinuxTraceLogCapture.md#perf
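To make the "profiling" sense concrete, here is a minimal sketch of what a sampling profiler's data lets you compute. It is an illustration only, not how perf or this toolkit aggregates samples; the `SAMPLES` data and the helper names `fold_stacks` and `self_time_share` are invented for the example.

```python
from collections import Counter
from typing import Dict, List

# Each sample is one call stack captured at a timer tick, listed from the
# outermost caller to the function that was running when the sample fired.
# The function names are made up for illustration.
SAMPLES: List[List[str]] = [
    ["main", "parse", "read_block"],
    ["main", "parse", "read_block"],
    ["main", "render"],
    ["main", "parse", "tokenize"],
]


def fold_stacks(samples: List[List[str]]) -> Counter:
    """Count how many samples share each full call stack."""
    return Counter(";".join(stack) for stack in samples)


def self_time_share(samples: List[List[str]]) -> Dict[str, float]:
    """Estimate the share of CPU time spent in each leaf function."""
    leaf_counts = Counter(stack[-1] for stack in samples if stack)
    total = sum(leaf_counts.values())
    return {name: count / total for name, count in leaf_counts.items()}


if __name__ == "__main__":
    for stack, count in fold_stacks(SAMPLES).most_common():
        print(count, stack)
    print(self_time_share(SAMPLES))
```

Because samples arrive at a fixed interval, the fraction of samples that land in a function approximates the fraction of CPU time spent there. Scheduling events cannot give you that, since they only fire on context switches.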

> Is "experimental barely working support for LTTng for just KM (kernel-mode) stacks" documented anywhere?

With that bit on profiling support out of the way, let me move to your specific follow-up question. No, the experimental support we have has not been documented before, but you could probably get it to work with a few bugfixes if you want to look. AFAIK symbol info has not been added to Trace Compass traces, although it could technically be done. Therefore, there has to be some manual way to resolve symbols: LTTng grabs the undecoded callstack, but something still needs to resolve the symbols. I will explain and document here where we are:

  1. Context / Inspiration - Trace Compass (OSS) is a popular GUI for reading LTTng traces, although it has different features than this toolkit and WPA. Trace Compass supports providing the kallsyms file (KM only) and loading it in the GUI. See https://archive.eclipse.org/tracecompass.incubator/doc/org.eclipse.tracecompass.incubator.kernel.doc.user/User-Guide.html
  2. This is where we attempt to read kallsyms for Perf cpu-clock events converted to LTTng CTF format. The conversion is a bit of a PITA to do and not really recommended. Anyways, I think the currently checked-in algorithm is wrong or poor, but it gives you an idea of what to do and where to do it.
  3. Anyways, this could be fixed and ported to LTTng scheduling events so that at least the KM stacks could be decoded, similar to Trace Compass.
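For anyone who wants to pick this up, the general idea of resolving KM addresses against a kallsyms file can be sketched as below. This is only an illustration of the usual technique (sort the symbols by address, then binary-search for the nearest symbol at or below the address). It is not the algorithm checked into this repo, nor Trace Compass's. The class name `KernelSymbolTable` and the sample addresses are made up; the input layout assumed is the `/proc/kallsyms` one, `<hex address> <type> <name> [module]`.

```python
import bisect
from typing import List, Optional, Tuple

# Sample text in the /proc/kallsyms layout. Addresses are invented.
SAMPLE_KALLSYMS = """\
ffffffff81000000 T _text
ffffffff81001000 T do_one_initcall
ffffffff81002500 t schedule_tail
ffffffff81003000 T __schedule
ffffffffc0000000 t helper_fn\t[example_mod]
"""


def parse_kallsyms(text: str) -> List[Tuple[int, str]]:
    """Parse kallsyms-style lines into (address, name) pairs."""
    entries = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        name = parts[2]
        if len(parts) > 3:
            name = f"{name} {parts[3]}"  # keep the [module] suffix
        entries.append((int(parts[0], 16), name))
    return entries


class KernelSymbolTable:
    """Hypothetical helper: nearest-preceding-symbol lookup."""

    def __init__(self, text: str) -> None:
        entries = sorted(parse_kallsyms(text))
        self._addresses = [address for address, _ in entries]
        self._names = [name for _, name in entries]

    def resolve(self, address: int) -> Optional[str]:
        # Index of the last symbol whose start address is <= address.
        index = bisect.bisect_right(self._addresses, address) - 1
        if index < 0:
            return None  # address is below every known symbol
        # kallsyms has no symbol sizes, so an address past the end of the
        # last function still resolves to it with a large offset.
        offset = address - self._addresses[index]
        return f"{self._names[index]}+0x{offset:x}"


if __name__ == "__main__":
    table = KernelSymbolTable(SAMPLE_KALLSYMS)
    for frame in (0xFFFFFFFF81003010, 0xFFFFFFFF81001000):
        print(hex(frame), table.resolve(frame))
```

Note that the kallsyms file has to come from the same boot of the traced machine and be read with enough privilege, since the kernel can report all addresses as zero to unprivileged readers.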

Maybe try the kallsyms symbol support in Trace Compass if you can, and see if it works well enough for you to want to use it. Then we would be open to a contribution to these tools to add similar support.

Long term, here is what I would suggest to the LTTng and Linux folks to better support call stacks and symbols:

  1. LTTng needs to support profiling (cpu-clock) events
  2. Embed binary id / signature info into the LTTng trace necessary for symbol decode (similar to what Windows does)
  3. Have some sort of symbol server or on-demand method to pull down symbols to decode (similar to what Windows does)

ivberg commented 1 month ago

FYI, item (2) above, reading kallsyms for Perf cpu-clock events converted to LTTng CTF format, is now deprecated.