open-telemetry / opentelemetry-java-instrumentation

OpenTelemetry auto-instrumentation and instrumentation libraries for Java
https://opentelemetry.io
Apache License 2.0

OkHTTP3 Instrumentation Doesn't Trace HTTP Client Call End to End #12015

Closed. serkan-ozal closed this issue 1 month ago

serkan-ozal commented 1 month ago

Describe the bug

As far as I have tested, OkHTTP3 tracing doesn't measure timing end to end, because the TracingInterceptor only traces the actual request execution; as a result, for example, DNS resolution and SSL handshake timings are not taken into account.

So, from the perspective of the user/developer who calls the OkHTTP3 client, the observed latency might differ from the latency reported by the OTel SDK in some cases (for example, when the DNS cache is not reused, a new connection is established, or an SSL handshake takes place).

My question is: is this intentional?

Steps to reproduce

 import okhttp3.Call;
 import okhttp3.OkHttpClient;
 import okhttp3.Request;

 final String URL = "https://reqres.in/api/users";

 OkHttpClient client = new OkHttpClient.Builder().build();
 Request request = new Request.Builder().url(URL).build();
 Call call = client.newCall(request);

 long start = System.nanoTime();
 call.execute().close(); // synchronous call; execute() throws IOException, close() releases the response
 long end = System.nanoTime();
 System.out.println("Completed in " + (end - start) + " ns");

On my local machine, on the first run (DNS cache not reused and a new secure connection established), it prints

Completed in 428868251 ns (428 ms)

And the HTTP span duration is reported as 69683200 ns (69 ms).

As you can see, there is a big difference between the actual delay of the HTTP client call and the reported HTTP span duration.

Expected behavior

The observed HTTP client call delay and the reported HTTP span duration should be close to each other.

Actual behavior

There is a big difference between the actual delay of the HTTP client call and the reported HTTP span duration.

Javaagent or library instrumentation version

2.7.0-SNAPSHOT

Environment

JDK:
OS:

Additional context

A proper solution might be implemented on top of the EventListener API.
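For illustration, here is a rough sketch (not part of the instrumentation; the class name and printed messages below are just placeholders) of how an OkHttp EventListener sees the whole call lifecycle, including DNS resolution and the TLS handshake, so a span started in callStart() and ended in callEnd() would cover the latency the caller actually observes:

 import java.io.IOException;
 import java.net.InetAddress;
 import java.util.List;
 import okhttp3.Call;
 import okhttp3.EventListener;
 import okhttp3.Handshake;

 // Placeholder listener that only prints timings; a tracing implementation
 // would start/end spans (or add span events) in these callbacks instead.
 class TimingEventListener extends EventListener {
   private long callStartNanos;
   private long dnsStartNanos;
   private long tlsStartNanos;

   @Override public void callStart(Call call) {
     callStartNanos = System.nanoTime();
   }

   @Override public void dnsStart(Call call, String domainName) {
     dnsStartNanos = System.nanoTime();
   }

   @Override public void dnsEnd(Call call, String domainName, List<InetAddress> addresses) {
     System.out.println("DNS resolve took " + (System.nanoTime() - dnsStartNanos) + " ns");
   }

   @Override public void secureConnectStart(Call call) {
     tlsStartNanos = System.nanoTime();
   }

   @Override public void secureConnectEnd(Call call, Handshake handshake) {
     System.out.println("TLS handshake took " + (System.nanoTime() - tlsStartNanos) + " ns");
   }

   @Override public void callEnd(Call call) {
     System.out.println("Whole call took " + (System.nanoTime() - callStartNanos) + " ns");
   }

   @Override public void callFailed(Call call, IOException ioe) {
     System.out.println("Call failed after " + (System.nanoTime() - callStartNanos) + " ns");
   }
 }

 // Registration:
 // OkHttpClient client = new OkHttpClient.Builder()
 //     .eventListener(new TimingEventListener())
 //     .build();

Note that a listener registered via eventListener(...) is shared by all calls; a real implementation would register an EventListener.Factory so that each call gets its own listener instance.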


laurit commented 1 month ago

My question is: is this intentional?

It is not. Instrumentations often use the APIs provided by the underlying framework to do the tracing, so what gets traced depends on those APIs. For example, we have HTTP client instrumentations that end the span when the response headers are received (OkHttp uses this) and others that end the span when the response has been read completely.

Have a look at https://opentelemetry.io/docs/specs/semconv/http/http-spans/#http-client The spec proposes two ways of tracing HTTP clients. Firstly, the preferred way is to create a span for each request that goes over the wire; this creates separate spans for each retry or redirect. We call this low-level instrumentation, and this is what is implemented for OkHttp. Secondly, there is what we call high-level instrumentation, which creates only a single span for the topmost operation. This is what we used to implement for OkHttp.

I guess we could try to fix this by creating the span earlier for the topmost operation, like we used to when we had the high-level instrumentation for OkHttp, and for subsequent operations create the spans where we currently do.
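For illustration only, the gap can be approximated today by wrapping the call in a parent span manually; this is just a sketch against the OpenTelemetry API (the tracer and span names are placeholders), not the proposed fix:

 import io.opentelemetry.api.GlobalOpenTelemetry;
 import io.opentelemetry.api.trace.Span;
 import io.opentelemetry.api.trace.SpanKind;
 import io.opentelemetry.api.trace.Tracer;
 import io.opentelemetry.context.Scope;
 import okhttp3.OkHttpClient;
 import okhttp3.Request;
 import okhttp3.Response;

 Tracer tracer = GlobalOpenTelemetry.getTracer("manual-okhttp-demo");
 OkHttpClient client = new OkHttpClient.Builder().build();
 Request request = new Request.Builder().url("https://reqres.in/api/users").build();

 // The parent span starts before newCall()/execute(), so DNS resolution,
 // connection setup and the TLS handshake all fall inside its duration.
 Span parent = tracer.spanBuilder("GET").setSpanKind(SpanKind.CLIENT).startSpan();
 try (Scope ignored = parent.makeCurrent()) {
   try (Response response = client.newCall(request).execute()) {
     // The agent's low-level span(s) for the wire request(s) become children of "parent",
     // which plays the role of the "topmost operation" span described above.
   }
 } finally {
   parent.end();
 }

Whether such a wrapper span should have CLIENT or INTERNAL kind, and how it fits the semconv recommendations, is part of the open question here.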

serkan-ozal commented 1 month ago

@laurit If I understood what you are saying correctly, we agree on having a high-level span to represent the topmost operation. I believe we could achieve this with the EventListener API. With this API, we could even produce network timing spans like DNS resolution, connect, and SSL handshake, together with the actual request, under the topmost operation span.

If that sounds reasonable, I can contribute along these lines.

laurit commented 1 month ago

With this API, we could even produce network timing spans like DNS resolution, connect, and SSL handshake

These spans are not defined in the semantic conventions. You will either have to work towards getting them added to the semconv, or alternatively only emit them behind a flag; this is how it is done in the Netty instrumentation, which can optionally emit similar spans.

under the topmost operation span.

The OkHttp instrumentation is a low-level instrumentation, as described above; it does not have a top-level span. Requests that go over the wire produce sibling spans.

If that sounds reasonable, I can contribute along these lines.

You are certainly welcome to give it a try.

serkan-ozal commented 1 month ago

@laurit

These spans are not defined in the semantic conventions. You will either have to work towards getting them added to the semconv, or alternatively only emit them behind a flag; this is how it is done in the Netty instrumentation, which can optionally emit similar spans.

This is not part of my initial plan, but a future plan for after we resolve the original problem here. I am aware of how they are handled in the Netty instrumentation. And they might not need to be spans, but span events.
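As a rough sketch of the span-event variant (the event and attribute names below are made up, not from semconv, and which span happens to be current in these callbacks depends on where the instrumentation runs them):

 import io.opentelemetry.api.common.AttributeKey;
 import io.opentelemetry.api.common.Attributes;
 import io.opentelemetry.api.trace.Span;
 import okhttp3.Call;
 import okhttp3.EventListener;
 import okhttp3.Handshake;

 class SpanEventTimingListener extends EventListener {
   private long tlsStartNanos;

   @Override public void secureConnectStart(Call call) {
     tlsStartNanos = System.nanoTime();
     // Attach an event to whatever span is current instead of starting a child span.
     Span.current().addEvent("tls.handshake.start");
   }

   @Override public void secureConnectEnd(Call call, Handshake handshake) {
     Span.current().addEvent("tls.handshake.end",
         Attributes.of(AttributeKey.longKey("duration_ns"), System.nanoTime() - tlsStartNanos));
   }
 }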

The OkHttp instrumentation is a low-level instrumentation, as described above; it does not have a top-level span. Requests that go over the wire produce sibling spans.

Again, I already know the OkHttp instrumentation currently doesn't have a top-level span; that is the reason I opened this issue. Adding a top-level span to the OkHTTP instrumentation is what I am planning to work on.

laurit commented 1 month ago

Again, I already know the OkHttp instrumentation currently doesn't have a top-level span; that is the reason I opened this issue. Adding a top-level span to the OkHTTP instrumentation is what I am planning to work on.

The current instrumentation implements what is recommended in the specification. Keep in mind that changing the implementation so that it no longer follows what the spec recommends is unlikely to get accepted.

serkan-ozal commented 1 month ago

The current instrumentation implements what is recommended in the specification. Keep in mind that changing the implementation so that it no longer follows what the spec recommends is unlikely to get accepted.

My intention is not to come up with a solution specific to my use case (if it were, I would contribute it to our own fork). I haven't checked whether the current implementation follows the specification properly or not, but what I am trying to do is highlight the big gap in some HTTP client traces.