pjfitzgibbons opened this issue 1 year ago
@kavck @kgcreative @derek-ho Putting this on our backlog for discussion and consideration. I have time to collaborate on a UX mock of my expected design, if that is appropriate.
@pjfitzgibbons That'd be super helpful!
What is the bug? Currently, the Trace Analytics Dashboard, when viewed on a Jaeger datatype, presents the following graph and table:
Note that the "Error Rate" / "Throughput" radio buttons change both the displayed visualization and the table. "Latency" is not offered as a visualization option.
How can one reproduce the bug? Steps to reproduce the behavior:
What is the expected behavior? The aggregations of Error Rate, Latency, and Throughput are interrelated, yet somewhat orthogonal, measurements of an application's operational quality. Each measurement means something different, but together they can narrow the possible surface area of an operational issue.
Examples:
1. A spike in Error Rate combined with a coordinated spike in Throughput. Quick analysis: a traffic spike is taxing the system and causing availability problems.
2. A reduction in Throughput combined with an increase in Latency could indicate a dependency issue or a change in application logic that has affected per-request performance.
These measurements are very often analyzed in concert while monitoring and troubleshooting a system.
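The two examples above can be sketched as simple rules over the direction in which each measurement moved during the same time window. This is a minimal illustration under stated assumptions, not a proposed detection feature: the `Trend`, `SignalChange`, and `suggestHypotheses` names are hypothetical, and how a trend is derived from raw trace data (baseline window, thresholds) is left out.

```typescript
// Hypothetical: direction a measurement moved relative to a baseline window.
type Trend = "up" | "down" | "flat";

// Hypothetical: one trend per measurement, all for the same time window.
interface SignalChange {
  errorRate: Trend;
  throughput: Trend;
  latency: Trend;
}

// Encodes only the two example heuristics from this issue. Neither rule can be
// evaluated from a single measurement, which is why they are viewed together.
function suggestHypotheses(change: SignalChange): string[] {
  const hypotheses: string[] = [];
  if (change.errorRate === "up" && change.throughput === "up") {
    hypotheses.push("traffic spike is taxing the system and causing availability problems");
  }
  if (change.throughput === "down" && change.latency === "up") {
    hypotheses.push("dependency issue or application logic change affecting per-request performance");
  }
  return hypotheses;
}
```

For example, `suggestHypotheses({ errorRate: "up", throughput: "up", latency: "flat" })` returns only the traffic-spike hypothesis.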
Recommendations:
Extend the table to include columns for Error Rate, Throughput, AND Latency. Column sorting can then easily present the user with the "Top 5" for each measurement. A "Top 5 Worst Endpoints" view could be achieved by a weighted sort over the three measurements combined; this would act as a sort of heat map of trouble in the monitored system. Column sorting should be mirrored in the URL (e.g., in the query string) so that the display can be shared as configured when discussing measurements in a specific context.
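A minimal sketch of the proposed "Top 5 Worst Endpoints" ranking and of mirroring the sort state in the URL. Because Error Rate, Throughput, and Latency have different units, this sketch assumes min-max normalization of each column before the weighted sum is applied. The normalization choice, the weights, the `sort`/`dir` query parameter names, and all type and function names are hypothetical, not anything the Trace Analytics plugin defines.

```typescript
// Hypothetical row shape for the proposed three-column table.
interface EndpointStats {
  name: string;
  errorRate: number; // fraction of failed requests, 0..1
  throughput: number; // requests per minute
  latency: number; // e.g. p95 latency in ms
}

type Metric = "errorRate" | "throughput" | "latency";
type Weights = Record<Metric, number>;
const METRICS: Metric[] = ["errorRate", "throughput", "latency"];

// Min-max normalize one column to 0..1 so metrics with different units can be
// summed; a column where every value is equal normalizes to 0.
function normalize(rows: EndpointStats[], metric: Metric): number[] {
  const values = rows.map((r) => r[metric]);
  const min = Math.min(...values);
  const max = Math.max(...values);
  return values.map((v) => (max === min ? 0 : (v - min) / (max - min)));
}

// Score each endpoint by a weighted sum of its normalized metrics and return
// the n highest-scoring ("worst") endpoints.
function topWorstEndpoints(rows: EndpointStats[], weights: Weights, n: number): EndpointStats[] {
  const normalized = METRICS.map((m) => normalize(rows, m));
  const scored = rows.map((row, i) => ({
    row,
    score: METRICS.reduce((sum, m, j) => sum + weights[m] * normalized[j][i], 0),
  }));
  scored.sort((a, b) => b.score - a.score);
  return scored.slice(0, n).map((s) => s.row);
}

// Sort state that should survive a copy-pasted URL.
interface SortState {
  column: Metric | "score";
  direction: "asc" | "desc";
}

// Serialize, e.g. "?sort=latency&dir=desc".
function sortToQuery(state: SortState): string {
  return "?sort=" + encodeURIComponent(state.column) + "&dir=" + encodeURIComponent(state.direction);
}

// Parse a query string back, falling back to defaults for missing or unknown values.
function sortFromQuery(query: string, fallback: SortState): SortState {
  const params: Record<string, string> = {};
  for (const pair of query.replace(/^\?/, "").split("&")) {
    const eq = pair.indexOf("=");
    if (eq > 0) {
      params[decodeURIComponent(pair.slice(0, eq))] = decodeURIComponent(pair.slice(eq + 1));
    }
  }
  const column = params["sort"];
  const dir = params["dir"];
  const columnOk = column === "score" || METRICS.indexOf(column as Metric) !== -1;
  return {
    column: columnOk ? (column as SortState["column"]) : fallback.column,
    direction: dir === "asc" || dir === "desc" ? dir : fallback.direction,
  };
}
```

For instance, `topWorstEndpoints(rows, { errorRate: 0.5, latency: 0.3, throughput: 0.2 }, 5)` would yield a "Top 5" list under those (illustrative) weights; setting a weight to 0 drops that measurement from the score. Whether high Throughput should count toward "worse" at all is a design question this sketch leaves to the weights.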
Display the visualization with breakdown lines for each of Error Rate, Throughput, and Latency. One may be selected "by default" as a lone graph; if so, allow the user to configure which breakdown is displayed by default. Add checkboxes to the visualization legend to show or hide each breakdown line. The reasoning for this functionality follows from the background above: these measurements are often analyzed in concert, and the visualization is the quickest way for humans to correlate anomalous or proportional changes across them.
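A minimal sketch of the legend state this recommendation implies: every breakdown line is shown unless the user has configured a default series, in which case that series starts as a lone graph, and each legend checkbox flips one line. The `Series`, `initialVisibility`, `toggleSeries`, and `visibleSeries` names are hypothetical, and where the configured default is persisted (user settings, URL, or local storage) is left open.

```typescript
// Hypothetical identifiers for the three breakdown lines.
type Series = "errorRate" | "throughput" | "latency";
type Visibility = Record<Series, boolean>;
const ALL_SERIES: Series[] = ["errorRate", "throughput", "latency"];

// No configured default: show every breakdown line.
// Configured default: show only that series as a lone graph.
function initialVisibility(defaultSeries?: Series): Visibility {
  return {
    errorRate: defaultSeries === undefined || defaultSeries === "errorRate",
    throughput: defaultSeries === undefined || defaultSeries === "throughput",
    latency: defaultSeries === undefined || defaultSeries === "latency",
  };
}

// Legend checkbox handler: flip one series and return a new object,
// leaving the previous state untouched.
function toggleSeries(state: Visibility, series: Series): Visibility {
  const next: Visibility = { ...state };
  next[series] = !state[series];
  return next;
}

// Series to draw, in a stable legend order.
function visibleSeries(state: Visibility): Series[] {
  return ALL_SERIES.filter((s) => state[s]);
}
```

Returning a new object from `toggleSeries` keeps the update immutable, which suits component state in a dashboard UI; `initialVisibility("latency")` followed by toggling `"throughput"` would draw Throughput and Latency together.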
What is your host/environment?
Do you have any screenshots? If applicable, add screenshots to help explain your problem.
Do you have any additional context? Add any other context about the problem.