open-telemetry / community

OpenTelemetry community content
https://opentelemetry.io
Apache License 2.0

Time difference when tracing between microservice #1145

Closed anngdinh closed 1 year ago

anngdinh commented 2 years ago

I have some microservices written in Java (Spring Boot) and Go. My architecture is as follows:

OpenTelemetry SDK ---- sends directly -----> Jaeger Collector ------> Elasticsearch -----> Jaeger UI.

I tested on localhost and everything worked correctly, but after deploying to k8s, some microservices' spans are shifted later in the trace while still keeping the correct duration. Compared with the localhost results, the timing between microservices differs by about 800-1000 microseconds. What could cause this?

In localhost: [image]
In k8s: [image]

dmathieu commented 2 years ago

Could that be clock skew?

anngdinh commented 2 years ago

Could that be clock skew?

I checked the time in each pod in k8s, and they are all in the same time zone.
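A note on this check: matching time zones does not rule out clock skew. Span timestamps are recorded as absolute instants (OpenTelemetry uses nanoseconds since the Unix epoch), so the zone does not enter into them, while a small disagreement between two hosts' clocks does. A minimal Python sketch of the distinction follows; the dates and the 300 ms offset are made-up values for illustration only.

```python
from datetime import datetime, timedelta, timezone

# The same instant rendered in two different zones: the epoch value is identical,
# so a time zone difference alone cannot shift spans in a trace.
utc = datetime(2023, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
utc_plus_7 = utc.astimezone(timezone(timedelta(hours=7)))
same_instant = utc.timestamp() == utc_plus_7.timestamp()

# Two hosts in the same zone whose clocks disagree by 300 ms (hypothetical value).
# This is clock skew: same zone, different reading of "now".
host_a_now = datetime(2023, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
host_b_now = host_a_now + timedelta(milliseconds=300)
skew = host_b_now - host_a_now

print("same instant across zones:", same_instant)
print("skew between hosts:", skew)
```

Comparing wall-clock readings by eye in each pod is unlikely to reveal an offset of a few hundred milliseconds or less, which is the range discussed in this thread.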

trask commented 1 year ago

hi @dinhan2411! if you are still facing this issue, i'd recommend opening an issue or discussion in either the Java or Go repository

anngdinh commented 1 year ago

Thanks to @dmathieu and @trask for the help! I now know what the problem is. Each span takes its timestamps from the clock of the server it runs on, and as a request passes through the microservices, those servers' clocks can be several hundred milliseconds apart. So when the trace is drawn, the spans appear shifted.
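The effect described here can be reproduced with a small simulation. The sketch below is not taken from any SDK: the `Span` class, the `record` helper, the service names, and all the numbers are hypothetical. It only illustrates why a constant clock offset moves a span's position in the trace while leaving its duration intact.

```python
from dataclasses import dataclass


@dataclass
class Span:
    service: str
    start_us: int     # start timestamp read from the local clock, in microseconds
    duration_us: int  # measured on one host, so a constant offset cancels out

    @property
    def end_us(self) -> int:
        return self.start_us + self.duration_us


def record(service: str, true_start_us: int, true_duration_us: int,
           clock_offset_us: int) -> Span:
    """Record a span on a host whose clock is off by clock_offset_us."""
    return Span(service, true_start_us + clock_offset_us, true_duration_us)


# Hypothetical trace: service-a starts at t=0 and calls service-b 100 us later.
# The clock on service-b's host runs 900 us ahead of the clock on service-a's host.
parent = record("service-a", true_start_us=0, true_duration_us=1_000, clock_offset_us=0)
child = record("service-b", true_start_us=100, true_duration_us=500, clock_offset_us=900)

child_fits_in_parent = (parent.start_us <= child.start_us
                        and child.end_us <= parent.end_us)

print("child start:", child.start_us, "duration:", child.duration_us)
print("child drawn inside parent:", child_fits_in_parent)
```

In this made-up case the child span keeps its 500 us duration but is drawn starting at 1000 us, after the parent has already ended, even though the call really happened 100 us into the parent.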

If you are using Jaeger, you can enable its built-in clock skew adjustment so that the Jaeger UI automatically aligns the spans to the correct position (relatively: based on the span hierarchy, it centers the child spans within the parent span). It's not the best way to measure it, though. Is there a way to add a per-microservice time offset parameter that the user enters for processing? (The skew is fixed.)
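To make the two options in this comment concrete, here is a rough Python sketch. It is an illustration only, not Jaeger's actual code or API: `Span`, `center_in_parent`, `apply_fixed_offsets`, the service names, and the offset values are all hypothetical. The first function mimics the centering behavior described above; the second applies a fixed, user-supplied offset per service, which is the feature being asked for.

```python
from dataclasses import dataclass, replace
from typing import Dict, List


@dataclass(frozen=True)
class Span:
    service: str
    start_us: int     # recorded start timestamp, in microseconds
    duration_us: int


def center_in_parent(parent: Span, child: Span) -> Span:
    """Move the child so it sits in the middle of the parent; durations are unchanged."""
    slack = parent.duration_us - child.duration_us
    return replace(child, start_us=parent.start_us + slack // 2)


def apply_fixed_offsets(spans: List[Span], offsets_us: Dict[str, int]) -> List[Span]:
    """Subtract a known, per-service clock offset from each span's start time."""
    return [replace(s, start_us=s.start_us - offsets_us.get(s.service, 0))
            for s in spans]


# Hypothetical data: the child really started 100 us into the parent, but its
# host clock runs 900 us ahead, so it was recorded as starting at 1000 us.
parent = Span("service-a", start_us=0, duration_us=1_000)
child = Span("service-b", start_us=1_000, duration_us=500)

centered = center_in_parent(parent, child)
corrected = apply_fixed_offsets([parent, child], {"service-b": 900})

print("centered child start:", centered.start_us)
print("offset-corrected child start:", corrected[1].start_us)
```

With these numbers, centering puts the child at 250 us, which looks plausible but is not where the call actually happened, while the fixed offset recovers the true 100 us start. That only holds as long as the skew really is constant and known, which is the assumption stated in the comment.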

trask commented 1 year ago

hey @anngdinh

Is there a way to add a per-microservice time offset parameter that the user enters for processing?

you could try asking this in the Jaeger community

also, you may find this discussion interesting: https://github.com/open-telemetry/oteps/issues/154