Thunderdome: High Signal Tracing

What Is It?

Provide high-signal tracing options to help diagnose performance edge cases.

Deliverables

Experiment option to enable long-tail tracing: record traces only for the slowest requests
Experiment option to record traces only for failed requests
Traces made available in Tempo

Why Are We Doing It?

Tracing is one of the most valuable tools we have for analysing running software, but the sheer volume of traces can make it extremely difficult to find useful ones. The usual way to reduce the volume is to randomly sample a fraction of requests, but this doesn't improve the signal-to-noise ratio: a random sample keeps roughly the same proportion of uninteresting traces as the full set. We want to boost the signal by selecting which traces to record outside of the system under test. The most interesting requests are the slow ones and the ones that fail, so we want to be able to report traces for just these requests, eliminating the noise of fast, successful ones.

protocol / prodeng