tokio-rs / tracing

Application level tracing for Rust.
https://tracing.rs
MIT License

tracing: Provide a GUI/Console-Based Visualization of Spans and Events #884

Open davidbarsky opened 4 years ago

davidbarsky commented 4 years ago

Feature Request

Motivation

A fairly common feature request that comes up in conversations is for tracing to provide a visualization of spans and events in some sort of GUI. This would help tracing users better understand and debug their applications.

Proposal

At a high level, I can think of a few options:

Alternatives

Don't do this.

hawkw commented 4 years ago

I'm strongly in favour of more options for visualization. I think that broadly, we have two categories of options:

  1. Write our own thing. This would take a bunch of work, but it has the advantage that a new tool could be designed specifically for the types of data that tracing outputs.
  2. Implement integrations with existing tools. This is probably much less work, and has the advantage that we don't have to implement any of the UI parts. That, in particular, might be good, since I personally have little to no experience with any kind of UI programming, whether it's GUI or console-based. However, the disadvantage of this is that no other system's data model will line up exactly with tracing's. Potentially, some fidelity is lost when the other system can't represent something that tracing emits, or when tracing doesn't record something the other system expects. Also, there are sometimes semantic differences: if we implement an integration with a system that's based on callstack sampling, it would make sense to represent spans to the other system as "stack frames". However, a tracing span is not exactly a stack frame — it might span multiple stack frames, be entered multiple times in different callstacks, et cetera, and represents a higher-level notion of context in the actual application logic, rather than being a detail of what actual functions are called.

I think we definitely want to provide integrations with as many other existing tools as possible. There are already several — tracing-coz, tracing-tracy, and tracing-flame, as well as the OpenTelemetry integration all come to mind, as well as tracing-wasm's support for browser perf analysis tools. However, one thing about these other tools is that they tend to be tuned for a particular use case. For example, tracy is intended for gamedev, and it has a first-class concept of "frames" (as in video, not as in stack frames), so it may not be suitable for debugging a microservice. In contrast, OpenTelemetry is a distributed tracing tool that makes request-response RPC models first class...which you probably don't want if you want to debug a game. And obviously, the browser performance profiling tools only make sense when you're running in the browser.

So, while integrations with other tools are very valuable, they don't really give us a solution that we can suggest to anyone who wants to be able to visualize trace data. For example, OpenTelemetry is a great option if you are implementing a microservice in a distributed application that already has all the infrastructure to use this data set up — if you're already running a Jaeger collector or something — but it's not something it would make sense to suggest to the rustc maintainers. This suggests that we might want to think about doing our own thing that's specific to tracing, in addition to supporting people who are implementing integrations.

I think that perhaps the best use-case to target with such a tool is interactive debugging. There are already several integrations that are more intended for performance profiling use cases, like tracing-flame; these tend to generate a static representation that records a single run of the software (e.g. a flamegraph SVG). We might want to think about a tool for interactive, on-line debugging of a running system, and/or for interactively exploring a saved capture from a running system. In particular, I might want to prioritize a syntax for interactively filtering/querying the data, and controlling how it is formatted.
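To make the filtering/querying idea a bit more concrete, here is a minimal sketch of querying a saved capture by level and by enclosing span. Everything in it (`Level`, `Record`, `Query`, `run_query`) is a hypothetical, std-only stand-in invented for illustration; none of it is tracing's actual data model or API, and the level ordering is simply by severity.

```rust
/// Severity levels, ordered from least to most severe.
/// Hypothetical type for this sketch; the names mirror tracing's level names.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Level {
    Trace,
    Debug,
    Info,
    Warn,
    Error,
}

/// One captured event. This shape is an assumption, not tracing's real types.
#[derive(Debug, Clone, PartialEq)]
pub struct Record {
    pub level: Level,
    /// Names of the enclosing spans, outermost first.
    pub spans: Vec<String>,
    pub message: String,
}

/// A tiny query: a minimum level plus an optional span that must be in scope.
pub struct Query {
    pub min_level: Level,
    pub in_span: Option<String>,
}

impl Query {
    /// True if the record is at least `min_level` and, when `in_span` is set,
    /// was recorded inside a span with that name.
    pub fn matches(&self, r: &Record) -> bool {
        r.level >= self.min_level
            && self
                .in_span
                .as_ref()
                .map_or(true, |name| r.spans.iter().any(|s| s == name))
    }
}

/// Return references to the records in `capture` that satisfy `q`,
/// preserving their original order.
pub fn run_query<'a>(capture: &'a [Record], q: &Query) -> Vec<&'a Record> {
    capture.iter().filter(|r| q.matches(r)).collect()
}
```

An interactive tool would presumably parse something like `level>=info span=server` into a `Query` and re-run it as the user types; that parsing step is left out here.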

In re: your specific suggestions:

  • Use pprof. This might be kinda easy!

I think pprof is quite nice, and we should definitely have a layer for outputting the pprof format. However, it's very strongly geared towards profiling in particular, and its data model seems to emphasize performance data. I'm not sure if pprof output would be the ideal solution for an interactive debugging tool.

najamelan commented 4 years ago

I have started working on a simple web GUI that can separate out the log from different async components side by side. This is what I have so far. It's far from finished and needs a lot of polish and features.

What you see is a log file from a program which has three components: a server and a client that send messages back and forth through a relay. It uses the tracing instrument method to give instrumented executors to each of these tasks so their log statements are annotated.

The first image shows the filter boxes with the names of the instrumented executors filled in. The second image shows a scroll down, where you can see how the program flow goes through the three components.
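For readers who want a feel for the side-by-side idea, here is a rough, std-only sketch of sorting log lines into columns by a filter string. It is not the code behind the GUI described above; the function name `split_into_columns` and the "first matching filter wins" rule are assumptions made for illustration.

```rust
/// Assign each log line to the first column whose filter text it contains.
/// Lines that match no filter go into one extra trailing column.
/// Each entry keeps the original row index so columns can be rendered
/// side by side while preserving the overall program flow.
pub fn split_into_columns<'a>(
    lines: &[&'a str],
    filters: &[&str],
) -> Vec<Vec<(usize, &'a str)>> {
    // One column per filter, plus one for unmatched lines.
    let mut columns: Vec<Vec<(usize, &'a str)>> = vec![Vec::new(); filters.len() + 1];
    for (row, line) in lines.iter().enumerate() {
        let idx = filters
            .iter()
            .position(|f| line.contains(*f))
            .unwrap_or(filters.len());
        columns[idx].push((row, *line));
    }
    columns
}
```

With filters such as `["server", "client", "relay"]`, each column then holds only the lines annotated with that component's name, and the row indices show where the flow hops from one component to the next.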

[Images: tracing-prism1, tracing-prism2]

Some next steps I am planning:

najamelan commented 4 years ago

This is still a work in progress, but I'm traveling. Next week I will polish some more. What's missing right now:

It does have collapsible columns, detects log levels and colors them as well as letting you filter on them.
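As an illustration of what level detection on a plain-text log line can look like, here is a small std-only sketch. It is a guess at one possible approach, not the tool's actual implementation: `detect_level` is a hypothetical name, and it assumes the level appears as an upper-case word somewhere in the line, possibly wrapped in punctuation.

```rust
/// Return the first log level keyword found in `line`, if any.
/// Assumes levels are written in upper case (e.g. "INFO" or "[WARN]").
pub fn detect_level(line: &str) -> Option<&'static str> {
    const LEVELS: [&str; 5] = ["ERROR", "WARN", "INFO", "DEBUG", "TRACE"];
    line.split_whitespace().find_map(|tok| {
        // Strip surrounding punctuation such as brackets or colons.
        let tok = tok.trim_matches(|c: char| !c.is_ascii_alphabetic());
        LEVELS.iter().copied().find(|lvl| *lvl == tok)
    })
}
```

The returned keyword could then drive both the row color and a per-level filter checkbox.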

So I quickly published it on GitHub Pages so others can have a look at it: https://najamelan.github.io/tracing_prism/ You can find the code for now on my GitHub profile.

It should be great if you have a wide screen, but I'm on a laptop right now so I haven't tested that yet myself. Give it a try. If you want to run it offline, you can download the repository and check out the gh-pages branch. Be sure to configure Firefox to allow loading scripts on a "file://" link.

If you want to compile it yourself, change the dependencies of the thespis crates to git links pointing to the dev branches.

[Image: Screenshot from 2020-09-02 15-54-29]

denismaxim0v commented 3 years ago

Should this still be open? I'd love to help work on this; it seems like an interesting issue.

najamelan commented 3 years ago

This reminded me that I forgot to update the post above. tracing_prism now does support JSON. I'm pretty happy with it personally. I looked into a more streamlined workflow than loading the log in the page with the browse button, but I felt that in the end it wasn't worth it, so I have put that on halt unless there is demand.

Of course people might prefer other ways of visualizing the logs, so I suppose @hawkw will get back to you explaining why she added the "help wanted" tag...

patrickelectric commented 10 months ago

It would be nice to have support for one of the formats that profiler.firefox supports, like: https://docs.google.com/document/d/1CvAClvFfyA5R-PhYUmn5OOQtYMH4h6I0nSsKchNAySU/preview#heading=h.yr4qxyxotyw