jaegertracing / jaeger

CNCF Jaeger, a Distributed Tracing Platform
https://www.jaegertracing.io/
Apache License 2.0

invalid parent span XXXXX; skipping clock skew adjustment #3084

Closed nikitaChernyshev closed 4 months ago

nikitaChernyshev commented 3 years ago

Describe the bug

I use Spring Cloud Sleuth and propagate the tracing headers. My spans pick up the parentSpanId, but I can't see the full tree.

To Reproduce Steps to reproduce the behavior:

  1. I deploy my app in kubernetes/istio

  2. I open the Jaeger dashboard and find all metrics, but all of them have the warning: invalid parent span IDs=71726c048cad372a; skipping clock skew adjustment. I waited 2 days and refreshed the dashboard, but I have the same problem.

  3. If I use io.opentracing.contrib and set up the env, I can see the full tree. But I want to get the full trace tree with Spring Cloud Sleuth, without io.opentracing.contrib.

Expected behavior I want to see the full trace tree.

Screenshots My Jaeger dashboard: skipping clock skew adjustment

Version (please complete the following information):

What troubleshooting steps did you try? I think this is a bug on the Jaeger side, or I can't tune the env correctly :( Maybe I should tune a bean or configure Jaeger?

My code is on GitHub: https://github.com/nikita111100/gRPC-vs-REST

Ashmita152 commented 3 years ago

Hi @nikita111100

I think you are hitting this bug: https://github.com/jaegertracing/jaeger/issues/2719 which is resolved in the Jaeger v1.22.0 release.

nikitaChernyshev commented 3 years ago

I have now changed the version to 1.22.0, but I have the same problem :(

leRobbe commented 3 years ago

I'm also facing this issue. It seems to have to do with parent spans being finished (or whatever) before execution of the follow-up methods that the spans get passed to for the .childOf reference.

I'm facing this issue in two different parts of my code. In the first, I don't know why it runs into this problem, since this exact code snippet (span creation) works just fine in a different place. Where I do face the issue, it's always the same span getting dropped, so child spans "hang in the air" and span dependencies aren't shown correctly.

The 2nd place is where I'm trying to repeat a methodA (at the beginning of which I create a new span as childOf the span passed to methodA) to simulate error and retry. I tried recursively calling methodA again with the current span as parameter and, when this didn't work, creating a new object and calling its methodA. Either way, it's always the exact same span getting dropped, and the children don't show correctly in the reference tree, with "warning: invalid parent span...". When I call methodA only once, its child methodB is referenced correctly in the dependency tree, with no warnings. Here is a simple illustration:

correct tree when calling methodA only once:

method1
  - method2
    - methodA
      - methodB (called within methodA)

wrong references (missing parent span) when trying to recursively call methodA (or creating new object and calling its methodA):

method1
  - method2
methodB, 1st invocation (-> seems at this point span of methodA called the first time gets dropped)
methodA, 2nd invocation
  - methodB, 2nd invocation (-> for some reason the parent is referenced correctly here, but it is obviously missing its own parent: methodA, 1st (= initial) invocation)

what I want for the wrong scenario is:

method1
  - method2
     - methodA
           - methodB
           - methodA
                  - methodB

In the OT guides here https://opentracing.io/guides/java/spans/ they talk about scope and activating/deactivating spans, but half of the methods mentioned on this page are not available (as in, they don't exist in the OT lib I'm using).

Note: This happens both for all-in-one strategy with in-memory storage as well as production strategy with ES backend.

In terms of code, what I'm doing is:

//this method gets called by method2 (see above)
public String methodA(String someString, Span parent) {
  Span current = GlobalTracer.get().buildSpan("someSpan").asChildOf(parent).start();
  current.setTag("component", "some component");
  //doing something with given string
  String abc = methodB(someString, current);
  if (shallBeCalledAgain) {
      //note: this branch returns before current.finish() below is reached
      return methodA(someString, current);
  }
  current.finish();
  return abc;
}

private String methodB(String someString, Span parent) {
   Span span = GlobalTracer.get().buildSpan("methodB")
                            .asChildOf(parent)
                            .withTag("component", "same component as methodA")
                            .start();
   //doing something with given string
   return "abcd";
}

As mentioned above, if !shallBeCalledAgain everything works fine, methodA span doesn't get dropped and methodB span references it correctly. But if shallBeCalledAgain, then initial span of methodA gets dropped, 1st invocation of methodB hangs in the air, 2nd invocation of methodA hangs in the air, 2nd invocation of methodB correctly references 2nd methodA.
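One possible explanation, offered as an assumption rather than a confirmed diagnosis: OpenTracing-style tracers generally hand a span to the reporter only when finish() is called. In the methodA snippet as posted, the recursive branch returns before current.finish() is reached, so the first methodA span would never be reported while its children are, which matches the symptoms described. The sketch below is a toy model using only the JDK (ToySpan, reported, and orphans are hypothetical names, not the OpenTracing or Jaeger API); it shows the try/finally pattern that makes every span finish on every return path.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model (NOT the OpenTracing or Jaeger API): a span is "reported"
// only when finish() is called.
class FinishSketch {
    static final List<ToySpan> reported = new ArrayList<>();

    static class ToySpan {
        final String name;
        final ToySpan parent;
        ToySpan(String name, ToySpan parent) { this.name = name; this.parent = parent; }
        void finish() { reported.add(this); }
    }

    // Same shape as the posted methodA, but the span is finished in a
    // finally block, so the recursive return can no longer skip it.
    static String methodA(String input, ToySpan parent, int retriesLeft) {
        ToySpan current = new ToySpan("methodA", parent);
        try {
            String result = methodB(input, current);
            if (retriesLeft > 0) {
                return methodA(input, current, retriesLeft - 1);
            }
            return result;
        } finally {
            current.finish();
        }
    }

    static String methodB(String input, ToySpan parent) {
        ToySpan span = new ToySpan("methodB", parent);
        try {
            return "abcd";
        } finally {
            span.finish();
        }
    }

    // Names of reported spans whose parent was never reported.
    static List<String> orphans(ToySpan root) {
        Set<ToySpan> known = new HashSet<>(reported);
        known.add(root); // treat the caller's span as already reported
        List<String> result = new ArrayList<>();
        for (ToySpan s : reported) {
            if (s.parent != null && !known.contains(s.parent)) {
                result.add(s.name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ToySpan method2 = new ToySpan("method2", null);
        methodA("x", method2, 1); // one retry, as in the recursive scenario
        System.out.println("reported=" + reported.size() + " orphans=" + orphans(method2));
    }
}
```

With the real OpenTracing API the same idea would apply to the Span returned by start(): wrap the method body in try/finally and call finish() in the finally block.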

Exact same thing happens with 2 different code snippets that presumably do the same: When methodB gets a span as parameter and I create a new span with

Span span = GlobalTracer.get().buildSpan("methodB")
                             .asChildOf(parent)
                             .withTag("component", "same component as methodA")
                             .start();

it works perfectly fine. However, using the exact same code snippet (except other method name in buildSpan()) in methodA, same issue as described above. Somehow I was able to "trick" my way around it by instead using

Span span = GlobalTracer.get().buildSpan("methodA").asChildOf(parent).start();
span.setTag("component", "some component");

However, I'm not able to fix this issue for the required multiple (recursive) calling of the same method.

As per #2719 I've also tried refreshing traces after some time (minutes, hours, days), to no avail. The Jaeger version I'm using is 1.24.0, so it seems it's not the same issue that was supposedly fixed in v1.22.0.

Since I'm fairly new to Jaeger it would be great if you could point out what exactly I'm doing wrong here, since I can't figure out why exactly this very same span is dropped every time.

Sherlock-Holo commented 2 years ago

I got this problem too, but I am using Rust with the tracing, tracing_opentelemetry, opentelemetry and opentelemetry_jaeger crates.

MrCroxx commented 2 years ago

Sorry to bother, but I face the same problem, too, with the Rust crates tracing, tracing_opentelemetry, opentelemetry, and opentelemetry_jaeger. My Jaeger version is 1.33.0. I'm not sure if it's a bug in Jaeger itself or in the Rust crates.

hugocortes commented 2 years ago

I'm running into this issue as well, in TypeScript using HTTPInstrumentation. I believe this issue is due to Jaeger matching based on trace ID, not span ID, as seen here: https://github.com/jaegertracing/jaeger/blob/7872d1b07439c3f2d316065b1fd53e885b26a66f/model/span.go#L104

In my case, I have 3 spans which should appear as:

> /live
  > HTTP GET
    > /ready

Here is /live info:

{
  "spanID": "82a3dd7705c4527d",
  "traceID": "7ca373f7c7e0f17df9965485593768ce",
...
}

HTTP GET:

{
  "spanID": "46dfdbccdb1ddb89",
  "traceID": "7ca373f7c7e0f17df9965485593768ce",
  "references": {
    "spanID": [
      "82a3dd7705c4527d"
    ],
    "traceID": [
      "7ca373f7c7e0f17df9965485593768ce"
    ],
    "refType": [
      "CHILD_OF"
    ]
  }
...
}

references.spanID and references.traceID equal the parent's spanID and traceID.

And here's /ready:

{
  "spanID": "657d01f13e7ed900",
  "traceID": "f9965485593768ce",
  "references": {
    "spanID": [
      "46dfdbccdb1ddb89"
    ],
    "traceID": [
      "f9965485593768ce"
    ],
    "refType": [
      "CHILD_OF"
    ]
  }
..
}

In this case, the traceID was truncated to 16 characters. If I update the document to have traceID: 7ca373f7c7e0f17df9965485593768ce, then I no longer get invalid parent span XXXX. The error message appears to be a bit misleading.

I did open an issue in the JS repository: https://github.com/open-telemetry/opentelemetry-js-contrib/issues/1023 as OTTracePropagator is truncating the traceID to 16 chars. However, this appears to be done across various languages, so I'm not quite sure whether this is a propagator bug or a Jaeger bug due to relying on TraceID rather than SpanID.
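A quick way to check exported spans for this kind of mismatch is to compare trace IDs after normalizing them to 32 hex characters. The sketch below is a minimal illustration using only the JDK; TraceIdCheck and its methods are hypothetical helpers, and the left-padding with zeros reflects the assumption that a 16-character ID is read as a 128-bit ID whose upper 64 bits are zero.

```java
import java.util.Locale;

// Hypothetical helper for comparing hex trace IDs of different lengths.
class TraceIdCheck {
    // Left-pads a hex trace ID with zeros to 32 characters (128 bits).
    static String pad(String traceId) {
        String id = traceId.toLowerCase(Locale.ROOT);
        StringBuilder sb = new StringBuilder();
        for (int i = id.length(); i < 32; i++) {
            sb.append('0');
        }
        return sb.append(id).toString();
    }

    // Two IDs name the same trace only if they match after padding.
    static boolean sameTrace(String a, String b) {
        return pad(a).equals(pad(b));
    }

    // True when shortId is exactly the low 64 bits (last 16 hex chars)
    // of a 32-character fullId whose high 64 bits are non-zero.
    static boolean isTruncationOf(String fullId, String shortId) {
        String low = shortId.toLowerCase(Locale.ROOT);
        return fullId.length() == 32
                && low.length() == 16
                && pad(fullId).endsWith(low)
                && !sameTrace(fullId, shortId);
    }

    public static void main(String[] args) {
        String full = "7ca373f7c7e0f17df9965485593768ce";  // /live and HTTP GET
        String truncated = "f9965485593768ce";              // /ready
        System.out.println("same trace: " + sameTrace(full, truncated));
        System.out.println("truncated:  " + isTruncationOf(full, truncated));
    }
}
```

For the IDs quoted in this comment, the two values do not name the same trace, and the shorter one is the tail of the longer one, which points at a propagator or SDK dropping the upper half of the ID.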

yurishkuro commented 2 years ago

@hugocortes the data you showed is clearly malformed, likely due to misconfiguration of the SDKs using different id length. Jaeger wouldn't even consider the 3rd span as part of the same trace - how is it supposed to know if the trace ids are different?
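To make the point concrete, here is a simplified model of resolving parent references within a trace. It is a sketch under assumptions, not Jaeger's actual code: SpanRec and spansWithMissingParent are hypothetical names, and the model simply groups spans by trace ID and flags any span whose parent span ID is not found in the same group. The sample data is the three spans quoted earlier in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified model: parents are looked up only among spans that share
// the child's trace ID.
class ParentCheck {
    static class SpanRec {
        final String traceId;
        final String spanId;
        final String parentSpanId; // null for a root span
        SpanRec(String traceId, String spanId, String parentSpanId) {
            this.traceId = traceId;
            this.spanId = spanId;
            this.parentSpanId = parentSpanId;
        }
    }

    // Returns the IDs of spans whose parent is absent from their own trace.
    static List<String> spansWithMissingParent(List<SpanRec> spans) {
        Map<String, Set<String>> idsByTrace = new HashMap<>();
        for (SpanRec s : spans) {
            idsByTrace.computeIfAbsent(s.traceId, k -> new HashSet<>()).add(s.spanId);
        }
        List<String> missing = new ArrayList<>();
        for (SpanRec s : spans) {
            if (s.parentSpanId != null
                    && !idsByTrace.get(s.traceId).contains(s.parentSpanId)) {
                missing.add(s.spanId);
            }
        }
        return missing;
    }

    // The three spans from the comment above: /live, HTTP GET, /ready.
    static List<SpanRec> threadExample() {
        return Arrays.asList(
            new SpanRec("7ca373f7c7e0f17df9965485593768ce", "82a3dd7705c4527d", null),
            new SpanRec("7ca373f7c7e0f17df9965485593768ce", "46dfdbccdb1ddb89", "82a3dd7705c4527d"),
            new SpanRec("f9965485593768ce", "657d01f13e7ed900", "46dfdbccdb1ddb89"));
    }

    public static void main(String[] args) {
        System.out.println(spansWithMissingParent(threadExample()));
    }
}
```

In this model the /ready span is flagged because its parent lives under a different trace ID, even though the parent span ID itself is valid.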

goodosoft commented 11 months ago

I have the same problem; waiting and refreshing the dashboard does not help.

jkowall commented 4 months ago

This is an OTel question since there seem to be issues with the instrumentation, as @yurishkuro pointed out. If the data is formatted correctly by the instrumentation, this should not occur.