opentracing / opentracing-java

OpenTracing API for Java. 🛑 This library is DEPRECATED! https://github.com/opentracing/specification/issues/163
http://opentracing.io
Apache License 2.0

Correct way how to not report fast spans #363

Open rugpanov opened 4 years ago

rugpanov commented 4 years ago

I would like to trace slow operations only. A manually chosen threshold determines whether an operation is slow. I haven't found any description of the correct way to implement this in your documentation, but this is what came to mind:

    final long startTime = System.nanoTime();
    Scope scope = tracer.buildSpan(operationName).startActive(true);
    try {
      //do some work
    } finally {
      long durationInMillis = (System.nanoTime() - startTime) / 1000000;
      if (durationInMillis < THE_THRESHOLD_IN_MILLS) {
        scope = null;
      } else {
        scope.close();
      }
    }

Is it the correct way to drop the spans? If not, what is the correct one?

sjoerdtalsma commented 4 years ago

I don't know how to do that (I hope someone else can answer how to not report fast spans).

The scope should always be closed, however; otherwise you risk having the wrong parent span if the thread happens to be reused from a thread pool.

Closing the scope and finishing the span are separate concerns. It is generally a bad idea to mix them. Probably what you want is to close the scope but not finish the span (don't know if this will accomplish your goal though) like so:

    final long startTime = System.nanoTime();
    final Span span = tracer.buildSpan(operationName).start();
    try (Scope scope = tracer.scopeManager().activate(span)) {
        // ...
    } finally {
        long durationInMillis = (System.nanoTime() - startTime) / 1000000;
        if (durationInMillis >= THE_THRESHOLD_IN_MILLIS) {
            span.finish();
        } else {
            // I doubt whether this prevents a report of the span ...
        }
    }

rugpanov commented 4 years ago

My issue is still unresolved.

@sjoerdtalsma 's solution has two problems/questions to discuss:

  1. I don't know for sure what happens to an unfinished span: does it pile up in my sampler, or cause any other memory leak?
  2. If the parent span is not finished but its children are, I will see them in the UI as <trace-without-root-span>

tylerbenson commented 4 years ago

I don't think most tracing systems will allow this since there is no way to know if a distributed request was made. My suggestion would be to look for a way to discard the span after it's finished.

whiskeysierra commented 4 years ago

The semantic conventions define sampling.priority as

If greater than 0, a hint to the Tracer to do its best to capture the trace. If 0, a hint to the trace to not-capture the trace. If absent, the Tracer should use its default sampling mechanism.

If I read this correctly you could add sampling.priority: 0 as a tag, if it's too short/fast. It does sound like it applies to the whole trace, not sure if that would be an issue in your case.

I also have no clue which implementations actually make use of that.
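For reference, a minimal sketch of the tagging idea. `TagSink` is a hypothetical stand-in that mirrors `Span.setTag(String, Number)` so the example compiles without the tracing library; with opentracing-java the equivalent call would be `Tags.SAMPLING_PRIORITY.set(span, 0)` from `io.opentracing.tag.Tags`. Whether the tracer honours the hint for an already-started span depends on the implementation.

```java
import java.util.HashMap;
import java.util.Map;

public class SamplingPriorityHint {
    /** Hypothetical stand-in for the tag-setting part of a span. */
    public interface TagSink {
        void setTag(String key, Number value);
    }

    public static final String SAMPLING_PRIORITY = "sampling.priority";

    /**
     * Sets sampling.priority to 0 when the operation finished under the threshold.
     * Returns true if the hint was set, false if the span was left untouched.
     */
    public static boolean hintDropIfFast(TagSink span, long durationMillis, long thresholdMillis) {
        if (durationMillis < thresholdMillis) {
            span.setTag(SAMPLING_PRIORITY, 0);
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // A plain map plays the role of the span's tags in this sketch.
        Map<String, Number> tags = new HashMap<>();
        boolean hinted = hintDropIfFast(tags::put, 12, 50);
        System.out.println(hinted + " " + tags);
    }
}
```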

yurishkuro commented 4 years ago

Jaeger clients will respect sampling.priority as described. The downside, as you mentioned, is that you can only make a local decision, which applies only to future spans or, at best, to not-yet-finished spans (cf. https://github.com/jaegertracing/jaeger/issues/1861) within the same process. That means the fast request may still have been sampled downstream (although downstream services could use similar logic across the stack, which would help).

rugpanov commented 4 years ago

As a workaround, I am collecting all the operations under the trace by myself. After the operation is finished, I have all its children with hierarchy and their durations and start times. With all these data I am making a decision whether I should report the trace or not.

My solution requires O(n) additional space and O(n) complexity to collect data and parse it in case we're reporting the trace, but it gives me much more flexibility.

It still looks like there should be a more convenient way to filter out some traces as they finish, to avoid redundant traffic and database storage consumption. This could probably be a feature request from me.

@yurishkuro, my request duplicates https://github.com/jaegertracing/jaeger/issues/1861, right?

whiskeysierra commented 4 years ago

I am collecting all the operations under the trace by myself.

Where do you do that?

rugpanov commented 4 years ago

To clarify: I do not use the tracing library API until all the data is collected and the decision is made. I am collecting the data in my code. I made several interfaces to represent the trace and its operations, and when the decision is made, I use the data to transform it into calls to the tracing library.
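A rough sketch of what such a buffer could look like. The names `Op` and `Reporter` are hypothetical and are not taken from the actual code described above; a real `Reporter` would translate each operation into calls to the tracing library. The decision here looks only at the root duration, which is an assumption; any rule over the collected tree would fit.

```java
import java.util.ArrayList;
import java.util.List;

public class BufferedTrace {
    /** One recorded operation: name, start time, duration, and nested children. */
    public static final class Op {
        final String name;
        final long startMicros;
        final long durationMicros;
        final List<Op> children = new ArrayList<>();

        public Op(String name, long startMicros, long durationMicros) {
            this.name = name;
            this.startMicros = startMicros;
            this.durationMicros = durationMicros;
        }

        /** Adds a child operation and returns this operation for chaining. */
        public Op child(Op c) {
            children.add(c);
            return this;
        }
    }

    /** Hypothetical sink that receives each operation together with its parent (null for the root). */
    public interface Reporter {
        void report(Op op, Op parent);
    }

    /**
     * Replays the whole tree to the reporter only if the root took at least
     * thresholdMicros. Returns the number of operations reported.
     */
    public static int reportIfSlow(Op root, long thresholdMicros, Reporter reporter) {
        if (root.durationMicros < thresholdMicros) {
            return 0; // fast trace: nothing ever reaches the tracing library
        }
        return replay(root, null, reporter);
    }

    // Depth-first walk, parent before children, so a reporter can create parent spans first.
    private static int replay(Op op, Op parent, Reporter reporter) {
        reporter.report(op, parent);
        int count = 1;
        for (Op c : op.children) {
            count += replay(c, op, reporter);
        }
        return count;
    }

    public static void main(String[] args) {
        Op root = new Op("handleRequest", 0, 60_000)
                .child(new Op("loadUser", 1_000, 20_000))
                .child(new Op("render", 25_000, 30_000));
        int reported = reportIfSlow(root, 50_000,
                (op, parent) -> System.out.println(op.name + " <- " + (parent == null ? "root" : parent.name)));
        System.out.println("reported " + reported + " operations");
    }
}
```

The replay costs O(n) time and the buffer O(n) space, matching the trade-off mentioned earlier in the thread.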

whiskeysierra commented 4 years ago

How do you deal with the distributed nature of tracing? I mean your trace might span multiple services before it ends.

rugpanov commented 4 years ago

Currently, I do not have that problem. The infrastructure of my application is built in such a way that all long operations are reported in one place (not in real time, but that doesn't matter for my case), so I can catch them there and do not have to solve the distributed-services problem.

slto commented 2 years ago

Is the proposed way to handle this via the sampling.priority tag?

My use case is that we have a library that wraps JDBC calls to the database. We create a span around each JDBC call. But when the system is running normally, these calls are fast enough that we don't need to see these spans in the trace. We want to see if we can enhance the code so that it creates a span only if the time is above a threshold. As far as I can tell, there is no API to abort or cancel a span. Setting sampling.priority would affect all subsequent spans, and in this use case I just want this one span not to be emitted. I am not sure whether constantly flipping sampling.priority is the right thing to do.
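One possible workaround, sketched under assumptions: time the call first and create the span only afterwards, with explicit timestamps, if it turned out to be slow. `SpanSink` and `ThrowingSupplier` are hypothetical names. With opentracing-java the sink could presumably be written as `tracer.buildSpan(op).withStartTimestamp(start).start().finish(finish)`, since both take microsecond timestamps. Note that any spans started during the call would not have this late span as their parent.

```java
import java.util.function.LongSupplier;

public class SlowCallRecorder {
    /** Hypothetical sink for a completed span with explicit timestamps in microseconds. */
    public interface SpanSink {
        void record(String operation, long startMicros, long finishMicros);
    }

    /** Like Supplier, but lets the wrapped call throw a checked exception such as SQLException. */
    public interface ThrowingSupplier<T, E extends Exception> {
        T get() throws E;
    }

    private final LongSupplier clockMicros;
    private final long thresholdMicros;
    private final SpanSink sink;

    public SlowCallRecorder(LongSupplier clockMicros, long thresholdMicros, SpanSink sink) {
        this.clockMicros = clockMicros;
        this.thresholdMicros = thresholdMicros;
        this.sink = sink;
    }

    /** Runs the work and records a span only if it took at least thresholdMicros. */
    public <T, E extends Exception> T call(String operation, ThrowingSupplier<T, E> work) throws E {
        long start = clockMicros.getAsLong();
        try {
            return work.get();
        } finally {
            long finish = clockMicros.getAsLong();
            if (finish - start >= thresholdMicros) {
                sink.record(operation, start, finish);
            }
        }
    }

    public static void main(String[] args) {
        // A fake clock keeps the demo deterministic; real code would supply wall-clock microseconds.
        long[] now = {0L};
        SlowCallRecorder recorder = new SlowCallRecorder(() -> now[0], 50_000L,
                (op, start, finish) -> System.out.println(op + " took " + (finish - start) + " us"));
        recorder.call("fastQuery", () -> { now[0] += 1_000L; return 1; });   // below threshold, not recorded
        recorder.call("slowQuery", () -> { now[0] += 80_000L; return 2; });  // recorded
    }
}
```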

aiguofer commented 1 year ago

Well.. I've been googling for a while now trying to figure this out. We have a similar use case with JDBC. In our case, we want to write a Proxy-based wrapper for JDBC that records spans for any JDBC calls that make external requests. Since the interface API does not specify which methods fetch data from the underlying data source and which read some form of cached state, it's very hard to do this manually. I was hoping to start a trace for all method calls but drop any traces that took less than 50 ms.
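A minimal sketch of the proxy part, using only `java.lang.reflect.Proxy` from the JDK. It times every call through an interface and keeps the method names of calls at or above the threshold; what to do with a slow call (for example, emitting a span) is left open. The class name `SlowCallProxy`, the injected clock, and the list of slow calls are all hypothetical. For JDBC this would presumably be applied as `wrap(java.sql.Connection.class, connection, ...)`, and the returned `Statement` and `ResultSet` objects would need wrapping too.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

public class SlowCallProxy {
    /**
     * Returns a proxy for target that times every call made through iface.
     * Calls taking at least thresholdNanos have their method name appended to slowCalls;
     * faster calls leave no record at all.
     */
    public static <T> T wrap(Class<T> iface, T target, long thresholdNanos,
                             LongSupplier nanoClock, List<String> slowCalls) {
        InvocationHandler handler = (proxy, method, args) -> {
            long start = nanoClock.getAsLong();
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the exception thrown by the target itself
            } finally {
                long elapsed = nanoClock.getAsLong() - start;
                if (elapsed >= thresholdNanos) {
                    slowCalls.add(method.getName());
                }
            }
        };
        Object proxy = Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] {iface}, handler);
        return iface.cast(proxy);
    }

    public static void main(String[] args) {
        // Fake clock for a deterministic demo; real code would pass System::nanoTime.
        long[] now = {0L};
        List<String> slowCalls = new ArrayList<>();
        long fiftyMillis = 50_000_000L;

        Runnable slow = wrap(Runnable.class, () -> now[0] += 80_000_000L, fiftyMillis, () -> now[0], slowCalls);
        Runnable fast = wrap(Runnable.class, () -> now[0] += 1_000_000L, fiftyMillis, () -> now[0], slowCalls);
        slow.run();
        fast.run();
        System.out.println(slowCalls);
    }
}
```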