jaegertracing / jaeger

CNCF Jaeger, a Distributed Tracing Platform
https://www.jaegertracing.io/
Apache License 2.0

Get error: invalid parent span IDs=30421a83228981a8; skipping clock skew adjustment #2121

Closed: JingangLi closed this issue 4 years ago

JingangLi commented 4 years ago

Hi,

I am writing my service with Spring Boot. Version information:

Jaeger version 1.17.0, deployed with Helm:

    helm install jaeger jaegertracing/jaeger -n stab \
      --set cassandra.config.max_heap_size=1024M \
      --set cassandra.config.heap_new_size=256M \
      --set cassandra.resources.requests.memory=2048Mi \
      --set cassandra.resources.requests.cpu=0.4 \
      --set cassandra.resources.limits.memory=2048Mi \
      --set cassandra.resources.limits.cpu=0.4 \
      --set query.ingress.enabled=true \
      --set query.ingress.hosts[0]=jaeger.fractal.caas.npee.gic.ericsson.se

I get the following warning when a trace has 166 spans, which isn't that many: invalid parent span IDs=30421a83228981a8; skipping clock skew adjustment

With the same setup, the issue disappears when a trace has 46 spans. I guess the problem is that the child span is sent to the collector before the parent span.

I tried updating parameters such as --collector.queue-size=9000 --processor.jaeger-binary.server-queue-size=9000 --processor.jaeger-compact.server-queue-size=9000, but it didn't help.

I also increased the reporter queue size in the Spring Boot application properties file with opentracing.jaeger.remote-reporter.max-queue-size=9999. That didn't help either.

Have you seen the same issue?

Many thanks. Jingang Li
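For context on the queue-size settings: as far as we understand the Jaeger clients, finished spans are buffered in a bounded in-memory queue, and when that queue is full new spans are dropped rather than blocking the application. A larger opentracing.jaeger.remote-reporter.max-queue-size therefore only helps if drops like that are actually happening. The sketch below shows that behaviour in plain Java; `BoundedSpanQueue` and its methods are hypothetical names, not the Jaeger client API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical client-side span buffer with a fixed capacity.
// Illustrative only: this is not the Jaeger Java client's reporter.
class BoundedSpanQueue {
    private final BlockingQueue<String> queue;
    private final AtomicLong dropped = new AtomicLong();

    BoundedSpanQueue(int maxQueueSize) {
        this.queue = new ArrayBlockingQueue<>(maxQueueSize);
    }

    // Called when a span finishes. offer() never blocks the application thread:
    // it returns false when the queue is full, and the span is counted as dropped.
    void report(String spanId) {
        if (!queue.offer(spanId)) {
            dropped.incrementAndGet();
        }
    }

    // Called by a background sender: takes everything currently buffered.
    List<String> drain() {
        List<String> batch = new ArrayList<>();
        queue.drainTo(batch);
        return batch;
    }

    long droppedCount() {
        return dropped.get();
    }
}
```

With a capacity of 3 and no sender draining the queue, reporting five spans keeps the first three and counts two as dropped.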

pavolloffay commented 4 years ago

Try refreshing the trace after some time; the warning can appear if the parent span hasn't been reported yet.

> I use the same setup, when there are 46 spans in one trace, the issue disappeared. I guess the problem is because the child span is sent to collector before the parent span.

That should not be the problem.

> I try to update the parameter like --collector.queue-size=9000 --processor.jaeger-binary.server-queue-size=9000 --processor.jaeger-compact.server-queue-size=9000. But it doesn't work.

These settings do not have any effect on the issue you are facing.
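@pavolloffay's first point can be illustrated with a simplified model. Clients report a span when it finishes, and a child usually finishes before its parent, so the child often arrives first; that alone is harmless. The warning only shows up if the trace is read while a referenced parent is not stored yet, or never arrives at all. This is a sketch, not Jaeger's adjuster code; `StoredSpan` and `missingParents` are hypothetical names.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified view of a stored span: only the IDs matter here.
final class StoredSpan {
    final String spanId;
    final String parentSpanId; // null for the root span

    StoredSpan(String spanId, String parentSpanId) {
        this.spanId = spanId;
        this.parentSpanId = parentSpanId;
    }
}

final class ParentCheck {
    // Returns parent IDs that some span in the trace points to but that
    // belong to no span stored for that trace (yet).
    static Set<String> missingParents(List<StoredSpan> trace) {
        Set<String> known = new HashSet<>();
        for (StoredSpan s : trace) {
            known.add(s.spanId);
        }
        Set<String> missing = new LinkedHashSet<>();
        for (StoredSpan s : trace) {
            if (s.parentSpanId != null && !known.contains(s.parentSpanId)) {
                missing.add(s.parentSpanId);
            }
        }
        return missing;
    }
}
```

If only the child `c1` of parent `p1` has been stored, the check flags `p1`; once `p1` and the root arrive, the same trace is complete, which is why refreshing the trace later can make the warning disappear.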

JingangLi commented 4 years ago

Hi @pavolloffay ,

Thanks a lot for your help. You have answered many of my questions.

I compared the Spring Boot log output with the Jaeger UI's JSON output for the trace and found that one span is missing, which causes the problem. I put the span counts per operation name into tables. You can see that the operation collectMbmsStatsuByClusterNodeAsync is missing in the Jaeger UI.

The mbmsStatusCollectT operation invokes the async function collectMbmsStatsuByClusterNode, and collectMbmsStatsuByClusterNode invokes the functions of the other operations.

[image: tables of span counts per operation name]

In my application properties, I have enabled Spring Boot async tracing: opentracing.spring.cloud.async.enabled=true

This is how I start the async function:

new AbstractMap.SimpleEntry<String, Future>(collectType, mbmsInfo.collectMbmsStatsuByClusterNodeAsync(collectType, nodeName));

And this is my async function:

@Async("promCollectE")
public CompletableFuture collectMbmsStatsuByClusterNodeAsync(String clusterNm, String nodeNm) {
    // ...
}

I don't know where the problem is. Could you give me some pointers?

Many thanks.

Rgds, Jingang Li
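Since the children of collectMbmsStatsuByClusterNodeAsync reached Jaeger but that span did not, one possibility worth ruling out is a span that was started but never finished: clients send a span only when it is finished, so a parent that is never finished leaves its already-reported children pointing at an ID that never arrives. The plain-Java sketch below models this with a hypothetical `MiniTracer` whose `close()` stands in for `finish()`; it is not the OpenTracing or Jaeger API, and it does not claim this is what happens inside the Spring async instrumentation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mini tracer: a span reaches the "reporter" only when finished.
final class MiniTracer {
    final List<String> reported = new ArrayList<>();

    final class MiniSpan implements AutoCloseable {
        final String id;
        final String parentId;

        MiniSpan(String id, String parentId) {
            this.id = id;
            this.parentId = parentId;
        }

        // Plays the role of span.finish(): only now is the span reported.
        @Override
        public void close() {
            reported.add(id);
        }
    }

    MiniSpan start(String id, String parentId) {
        return new MiniSpan(id, parentId);
    }
}

final class FinishPatterns {
    // Risky: the parent is finished only on the happy path.
    static void finishOnlyOnSuccess(MiniTracer tracer, boolean fail) {
        MiniTracer.MiniSpan parent = tracer.start("parent", null);
        try (MiniTracer.MiniSpan child = tracer.start("child", parent.id)) {
            // child work would go here
        }
        if (fail) {
            throw new IllegalStateException("work failed");
        }
        parent.close();
    }

    // Safer: try-with-resources finishes the parent on every path.
    static void finishAlways(MiniTracer tracer, boolean fail) {
        try (MiniTracer.MiniSpan parent = tracer.start("parent", null)) {
            try (MiniTracer.MiniSpan child = tracer.start("child", parent.id)) {
                // child work would go here
            }
            if (fail) {
                throw new IllegalStateException("work failed");
            }
        }
    }
}
```

In the first pattern only `child` is reported when the work fails, the same shape as the "invalid parent span IDs" warning; with try-with-resources both spans are reported on every path.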

JingangLi commented 4 years ago

> Try to refresh the trace after some time, the warning message can appear if the parent span hasn't been reported yet.

Hi @pavolloffay,

Could you tell me which function can be used to refresh the tracer? I looked through the docs and code of OpenTracing Spring Cloud and the Jaeger Java client, but I can't find a refresh function.

Many thanks.

Rgds, Jingang Li

pavolloffay commented 4 years ago

I am not sure what you mean by "refresh the tracer". If your application keeps running for a couple of seconds after the transactions, all spans should be reported.

This looks like an instrumentation issue. Please open the issue in the corresponding instrumentation library.
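The "couple of seconds" matters because, as we understand it, client reporters buffer finished spans and send them in the background on a flush interval, so spans still in the buffer when the process exits are lost unless the tracer or reporter is closed first. A minimal sketch under that assumption; `BufferedReporter` is a hypothetical class, not the Jaeger client's reporter, and its in-memory list stands in for the collector.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical buffered reporter: finished spans wait in memory and are sent
// in batches, on a timer or when close() is called.
final class BufferedReporter implements AutoCloseable {
    private final ConcurrentLinkedQueue<String> buffer = new ConcurrentLinkedQueue<>();
    private final List<String> sent = new ArrayList<>(); // stands in for the collector
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();

    BufferedReporter(long flushIntervalMillis) {
        timer.scheduleAtFixedRate(this::flush, flushIntervalMillis, flushIntervalMillis,
                TimeUnit.MILLISECONDS);
    }

    void report(String spanId) {
        buffer.add(spanId);
    }

    // Moves everything buffered so far to the "collector".
    synchronized void flush() {
        String span;
        while ((span = buffer.poll()) != null) {
            sent.add(span);
        }
    }

    synchronized List<String> sentSpans() {
        return new ArrayList<>(sent);
    }

    // Stops the timer and sends what is still buffered. A process that exits
    // without this final flush loses every span reported since the last tick.
    @Override
    public void close() {
        timer.shutdown();
        flush();
    }
}
```

With a long flush interval, nothing is sent until `close()` performs the final flush.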

liguangcheng commented 4 years ago

I also encounter this issue. Can anybody help me?

erohini commented 4 years ago

I have the same issue. Can someone tell me what needs to be done to resolve it?

joe-elliott commented 4 years ago

Generally this error simply means a span did not make it to the backend. Spans could be dropped at numerous locations between your client and the actual backend. These articles may help:

https://medium.com/jaegertracing/where-did-all-my-spans-go-a-guide-to-diagnosing-dropped-spans-in-jaeger-10d9697f8182
https://www.jaegertracing.io/docs/1.20/performance-tuning/

naseemkullah commented 3 years ago

The message should probably distinguish between an invalid and a missing span.

jpkrohling commented 3 years ago

What would be an invalid span?

naseemkullah commented 3 years ago

> What would be an invalid span?

I did not think it through tbh.

If you have to ask, I guess there are only missing spans and no invalid ones, in which case "missing parent span" is probably a better message in all cases.

Though at the time of writing I was thinking of, e.g., a span with any of these invalid properties:

    "spanID": "👽",
    "startTime": NaN,
    "duration": "horse",

jpkrohling commented 3 years ago

Spans with invalid data wouldn't be persisted, which would match the "missing" case. In my opinion, the message could be changed to "missing parent span (xyz)" instead.

naseemkullah commented 3 years ago

Ah OK, I think this goes back to my original suggestion (if feasible here): if the span has invalid data, the error message would say that the span was invalid, whereas if it was simply never sent to Jaeger, it would say missing.
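The proposed wording could look roughly like this: say "invalid" when the parent ID itself is malformed, and "missing" when it is well-formed but no span with that ID is stored for the trace. This is a hypothetical sketch of the proposal, not Jaeger's code, and the well-formedness rule (1 to 16 hex digits, not all zeros, since Jaeger span IDs are 64-bit values) is an assumption for illustration.

```java
import java.util.Set;

// Hypothetical wording check for the "invalid vs missing" proposal; not Jaeger's code.
final class ParentRefMessage {
    // Returns a warning for a span's parent reference, or null if it is fine.
    // Callers are expected to skip root spans, which have no parent.
    static String describe(String parentId, Set<String> spanIdsInTrace) {
        if (!isWellFormedSpanId(parentId)) {
            return "invalid parent span ID=" + parentId;
        }
        if (!spanIdsInTrace.contains(parentId)) {
            return "missing parent span ID=" + parentId;
        }
        return null;
    }

    // Assumption for illustration: a span ID is 1 to 16 hex digits, not all zeros.
    static boolean isWellFormedSpanId(String id) {
        if (id == null || id.isEmpty() || id.length() > 16) {
            return false;
        }
        boolean nonZero = false;
        for (char c : id.toCharArray()) {
            boolean hex = (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
            if (!hex) {
                return false;
            }
            if (c != '0') {
                nonZero = true;
            }
        }
        return nonZero;
    }
}
```

For the ID from this issue, 30421a83228981a8, the check would report "missing" when no such span is stored, while a value like "horse" would be reported as "invalid".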

bkahlert commented 3 years ago

> Try to refresh the trace after some time

....

> I am not sure what you mean by refresh the tracer?

@pavolloffay, this is what @JingangLi was referring to.