protocol / prodeng

Issues, discussions and documentation from the production engineering team

Thunderdome: High Signal Tracing #27

Open iand opened 1 year ago

iand commented 1 year ago

What Is It?

Provide high-signal tracing options to better diagnose performance edge cases.

Deliverables

Why Are We Doing It?

Tracing is one of the most valuable tools we have for analysing running software, but the sheer volume of traces can make it extremely difficult to find useful ones. The usual way to reduce the volume is to randomly sample a fraction of requests, but this does not improve the signal-to-noise ratio: interesting and uninteresting requests are discarded at the same rate. We want to boost the signal by deciding which traces to record from outside the system under test. The most interesting requests are the slow ones and those that fail, so we want to be able to report traces for just these requests, eliminating the noise of fast, successful ones.
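The idea resembles what is usually called tail-based sampling. Random sampling decides whether to keep a trace before the request runs. Here the decision waits until the request has finished, when its latency and outcome are known. The sketch below is a minimal, hypothetical illustration of that decision rule and is not Thunderdome's implementation. `Span`, `TailSampler` and `slow_threshold_ms` are made-up names, and the 500 ms threshold in the usage example is an arbitrary assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Span:
    # Hypothetical minimal span: which request it belongs to and how long it took.
    trace_id: str
    name: str
    duration_ms: float


@dataclass
class TailSampler:
    """Hypothetical sampler that keeps a request's spans only if it was slow or failed."""

    slow_threshold_ms: float
    # Spans from kept requests (stands in for sending them to a tracing backend).
    exported: List[Span] = field(default_factory=list, init=False)
    # Spans buffered per trace until the request's outcome is known.
    _pending: Dict[str, List[Span]] = field(default_factory=dict, init=False)

    def record(self, span: Span) -> None:
        # Buffer the span instead of exporting it immediately.
        self._pending.setdefault(span.trace_id, []).append(span)

    def finish(self, trace_id: str, duration_ms: float, error: Optional[str] = None) -> bool:
        """Decide the trace's fate once the whole request has completed."""
        spans = self._pending.pop(trace_id, [])
        keep = error is not None or duration_ms >= self.slow_threshold_ms
        if keep:
            self.exported.extend(spans)
        # Fast, successful requests are dropped here, so they add no noise.
        return keep
```

In use, the sampler receives a fast successful request, a slow one and a failed one. It keeps only the last two:

```python
sampler = TailSampler(slow_threshold_ms=500.0)
sampler.record(Span("a", "fetch", 20.0))
sampler.finish("a", 25.0)                     # False: fast and successful, dropped
sampler.record(Span("b", "fetch", 800.0))
sampler.finish("b", 900.0)                    # True: slow, kept
sampler.record(Span("c", "fetch", 10.0))
sampler.finish("c", 30.0, error="timeout")    # True: failed, kept
```

The trade-off is memory. Every in-flight request's spans have to be buffered until the request completes, which is the price of deciding with full knowledge of the outcome.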