sample entire trace which contains span with tag "error=true"

seanyinx commented 4 years ago

Feature: Provide a way to sample an entire trace that contains a span with the tag "error=true" or a significant delay

Rationale: Currently sampling is based on traceId. Normally we care much more about failed or slow requests than successful ones. With sampling on traceId, it's possible to miss important traces that carry error information, while not sampling at all puts a lot of pressure on the underlying storage.

Example Scenario

If a trace contains no error or slow span, sample the trace before saving according to a property such as xxx.ok-requests.sample=0.1
If a trace contains an error or slow span, sample the trace before saving according to a property such as xxx.ex-requests.sample=1.0
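
The scenario above could be sketched roughly as follows, assuming all spans of a trace are already available when the decision is made. The constant names, the one-second slow threshold, and the `keep_trace` helper are illustrative assumptions, not existing Zipkin settings. Spans are plain dicts shaped like Zipkin v2 JSON (a `tags` map and a `duration` in microseconds).

```python
import random
from typing import Callable, Iterable, Mapping

# Hypothetical values standing in for the xxx.*.sample properties above.
OK_SAMPLE_RATE = 0.1   # xxx.ok-requests.sample
EX_SAMPLE_RATE = 1.0   # xxx.ex-requests.sample
SLOW_THRESHOLD_MICROS = 1_000_000  # assumed definition of "slow": 1 second


def is_interesting(span: Mapping) -> bool:
    """True if the span carries an error tag or exceeds the slow threshold."""
    tags = span.get("tags") or {}
    return "error" in tags or span.get("duration", 0) >= SLOW_THRESHOLD_MICROS


def keep_trace(spans: Iterable[Mapping],
               rand: Callable[[], float] = random.random) -> bool:
    """Decide once per trace, using the higher rate if any span is interesting."""
    interesting = any(is_interesting(s) for s in spans)
    rate = EX_SAMPLE_RATE if interesting else OK_SAMPLE_RATE
    # rand() returns a value in [0, 1), so a rate of 1.0 always keeps the trace.
    return rand() < rate
```

The `rand` parameter is only there so the decision can be made deterministic in tests.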

codefromthecrypt commented 4 years ago

as this implies knowing when a trace is complete (stateful buffering on traceId), I'd recommend moving this to something that could implement it, such as https://github.com/openzipkin-contrib/zipkin-storage-kafka
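
To illustrate what stateful buffering on traceId involves, here is a minimal sketch, not how zipkin-storage-kafka does it. It assumes a trace is treated as complete once no span for it has arrived for a fixed idle period, which is only a heuristic. `TraceBuffer`, `idle_seconds`, and `flush_idle` are hypothetical names.

```python
import time
from collections import defaultdict
from typing import Callable, Dict, List, Mapping


class TraceBuffer:
    """Groups spans by traceId and releases traces that have gone idle."""

    def __init__(self, idle_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._idle = idle_seconds
        self._clock = clock
        self._spans: Dict[str, List[Mapping]] = defaultdict(list)
        self._last_seen: Dict[str, float] = {}

    def add(self, span: Mapping) -> None:
        """Buffer a span and record when its trace was last touched."""
        trace_id = span["traceId"]
        self._spans[trace_id].append(span)
        self._last_seen[trace_id] = self._clock()

    def flush_idle(self) -> List[List[Mapping]]:
        """Return and forget every trace with no new spans for idle_seconds."""
        now = self._clock()
        done = [t for t, seen in self._last_seen.items()
                if now - seen >= self._idle]
        flushed = []
        for trace_id in done:
            flushed.append(self._spans.pop(trace_id))
            del self._last_seen[trace_id]
        return flushed
```

Each flushed trace can then go through a per-trace sampling decision. Spans that arrive after their trace was flushed would be seen as a new trace, which is one reason this needs more care in a real deployment.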

codefromthecrypt commented 4 years ago

ps "error=true" isn't semantically meaningful in zipkin. presence of an "error" tag at all is. "error=true" is a thing opentracing did, which introduced bugs like sending "error=false"
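
A small sketch of the difference, assuming spans are dicts shaped like Zipkin v2 JSON with a `tags` map. `has_error` and `naive_has_error` are hypothetical helper names.

```python
from typing import Mapping


def has_error(span: Mapping) -> bool:
    """Zipkin-style check: any "error" tag marks the span, whatever its value."""
    return "error" in (span.get("tags") or {})


def naive_has_error(span: Mapping) -> bool:
    """Value-based check that only matches the literal string "true"."""
    return (span.get("tags") or {}).get("error") == "true"


# A span tagged error=false still counts as an error under the presence
# check, which is why sending that tag is a bug.
span = {"traceId": "a", "id": "1", "tags": {"error": "false"}}
print(has_error(span), naive_has_error(span))
```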

seanyinx commented 4 years ago

> as this implies knowing when a trace is complete (stateful buffering on traceId), I'd recommend moving this to something that could implement it, such as https://github.com/openzipkin-contrib/zipkin-storage-kafka

Thank you, Adrian. I'll take a look.

seanyinx commented 4 years ago

@adriancole https://github.com/openzipkin-contrib/zipkin-storage-kafka does not seem to be actively maintained. Would it be viable to add a stateful component that samples entire traces before sending them to the zipkin server?

codefromthecrypt commented 4 years ago

we don't support it, but if you need an immediate answer you can try this https://github.com/open-telemetry/opentelemetry-collector

fyi zipkin is a completely volunteer project, so it isn't always the case people can get a new feature analyzed and actioned in 2 days

openzipkin / zipkin-support

sample entire trace which contains span with tag "error=true" #37