Define span modelling conventions for HTTP client spans

jkwatson commented 3 years ago

For consistency across languages and HTTP client libraries, we should provide suggested conventions for modeling details of HTTP client calls. Here is a short list of items to include in the modeling effort:

How do we represent HTTP retries?
How do we represent HTTP redirects?
How do we represent DNS queries that take place during the HTTP client call?
How do we represent TLS handshakes?
If more than one instrumentation library considers themselves the logical HTTP client, how should that be handled?

tedsuo commented 3 years ago

@jkwatson the simplest (and cheapest) model would be a single span for a logical HTTP request with all of the data listed above added as events. Would that model work, or do you have examples of how would be painful?

jkwatson commented 3 years ago

@jkwatson the simplest (and cheapest) model would be a single span for a logical HTTP request with all of the data listed above added as events. Would that model work, or do you have examples of how would be painful?

I have no opinions on the matter...I just want to make sure we capture the recommendations of the experts.

The last question is a stickier one, though, that wouldn't be captured via events, I don't think.

tedsuo commented 3 years ago

If more than one instrumentation library considers themselves the logical HTTP client, how should that be handled?

Ideally, the higher level client should presume that instrumentation for the lower level client is present, and simply decorate the span that the lower level client produces with additional information. If no lower level instrumentation is available, the author of the higher-level instrumentation should create a separate instrumentation package for the lower lever client, rather than duplicate the work.

For situations where there is a categorical difference between the two client levels (DB client span vs HTTP client span) I'm less certain. There will definitely be a nested client spans when there is not a 1:1 relationship between DB and HTTP semantics. For example, some DB clients may make more than one logical HTTP request per DB transaction. In that case there would be a client span for the DB transaction, containing multiple HTTP client spans.

For databases/queues/etc where there is always 1:1 relationship between a DB transaction and an HTTP request; I'm tempted to say that the DB transaction should decorate the HTTP span rather than wrap it. Operation name would be the only potential conflict of interest in that case.

jkwatson commented 3 years ago

If more than one instrumentation library considers themselves the logical HTTP client, how should that be handled?

Ideally, the higher level client should presume that instrumentation for the lower level client is present, and simply decorate the span that the lower level client produces with additional information. If no lower level instrumentation is available, the author of the higher-level instrumentation should create a separate instrumentation package for the lower lever client, rather than duplicate the work.

This assumes a lot of coordination between library authors to make this possible. I don't think we should assume that coordination will exist or even be possible when we write down these modeling decisions. Also, I'm not sure how a higher level instrumentation could decorate spans generated by something that it calls into. How would that work?

tedsuo commented 3 years ago

Good points. The model I was describing works better on the server side; on the client side it is difficult. It would be good to see some examples.

lmolkova commented 3 years ago

How do we represent HTTP retries?

I agree events would make more sence, there are feasibility concerns. The HTTP instrumentation layer (Java byte-code instumentation, .NET native tracing support, python/node.js monkey patching) usually do'not know about retries which are handled on the higher level by user app/library.
Coordination between higher-level and auto-instumentation is possible (parent context may contain hints to ask lower level to emit events instead of spans), but then caller should create HTTP span and it sounds like error prone approach with possible inconsistencies as every lib/user would need to properly attribute http spans.

If caller creates an Internal span for logical call, all 'physical' HTTP spans could all be nicely grouped under it and there is no need in coordination - this is how we've done it in Azure SDKs. It might be too verbose (2 spans even if there is 1 try) which potentially increases costs. We've also entertained the idea of verbosity configuration, depending on user feedback.

How do we represent HTTP redirects?

In practice, it's not always possible to track redirects since they could be handled on the lower level than instrumentation (depends on user-provided configuration). Ideally, the instrumentation should be consistent with http client configuration followRedirects / allowAutoRedirect /etc. Same logic as for retries applies here - auto-instrumentation is not capable of knowing it's a retry and will create a new span.

How do we represent DNS queries that take place during the HTTP client call? How do we represent TLS handshakes?

I don't have an opinion, except it's rarely feasible, so it has to be optional.

If more than one instrumentation library considers themselves the logical HTTP client, how should that be handled?

The coordination may happen through something like terminal span/context - old proposal, ignoring nested client spans or backing-off if traceparent is set (as a last resort).

Oberon00 commented 3 years ago

If caller creates an Internal span for logical call, all 'physical' HTTP spans could all be nicely grouped under it and there is no need in coordination - this is how we've done it in Azure SDKs.

Using INTERNAL here seems bad to me. The logical call is still logically a client. In fact, If you have client spans that are direct children of each other, it seems easy to implement a backend-side rule to collapse them together somehow, whereas it is impossible to know generically that the client spans belong to the internal parent.

lmolkova commented 3 years ago

In fact, If you have client spans that are direct children of each other, it seems easy to implement a backend-side rule to collapse them together somehow, whereas it is impossible to know generically that the client spans belong to the internal parent.

We've made this distinction exaclty for the backend and users: it should be possible to separate 'real' rpc calls from logical ones. It should help users understand when their client library makes external calls vs internal stuff. Imagine the same call may be internal if something is cached locally and external if not (my point is, there is no way to enforce your logic here).

Suppressing anything on the backed would require buffering (you never know what finishes first) and then you can suppress based on some rules. It does not sound like a good generic solution to the problem (which seems to be double instrumentation? or verbosity?) Let's find generic solutions to these problems.

Oberon00 commented 3 years ago

it should be possible to separate 'real' rpc calls from logical ones.

If using the rule "logical plus potentially real if the root client, otherwise physical" is not enough, I would prefer introducing a span attribute like "is_logical" instead of marking the logical client as internal.

svrnm commented 3 years ago

How do we represent HTTP retries? How do we represent HTTP redirects? How do we represent DNS queries that take place during the HTTP client call? How do we represent TLS handshakes?

There is a working example with the web instrumentation in javascript. It leverages the semantics of the W3C Navigation Timing, so maybe this could be a starting point?

Two more remarks:

Some of the questions here are not exclusive to HTTP (dns queries, TLS handshakes) and could also be part of any other network connection, so it might make sense to look at them independently?
A while back I looked into TLS instrumentation for nodejs and also opened the following issue, maybe it helps with the discussion/adds another dimension: https://github.com/open-telemetry/opentelemetry-specification/issues/1652

lmolkova commented 3 years ago

a few more things the spec needs to clarify:

attributes that may affect head-based sampling should be supplied to the sampler
async response stream reading - this is not possible for auto-instrumentation as http client API call may finish before stream reading starts/ends, but worth clarifying in the spec.

Oberon00 commented 3 years ago

attributes that may affect head-based sampling should be supplied to the sampler

There is still an currently unused attribute for that in the semantic convention generator: https://github.com/open-telemetry/build-tools/blob/c7067fa1f0a5c61756dcb93cbc56af566992ab4f/semantic-conventions/src/opentelemetry/semconv/model/semantic_attribute.py#L115 (https://github.com/open-telemetry/opentelemetry-specification/pull/571#discussion_r412260671)

lmolkova commented 3 years ago

Here are my thoughts on HTTP instrumentation approach, hope we can use some of them as a starting point.

Oberon00 commented 3 years ago

each try or redirect is an HTTP span, it has a unique context that must be propagated downstream - HTTP clients don’t control how their APIs are used and spans created need to have the same meaning regardless of use-case. Unambiguously mapping client spans to service span also requires each try/redirect to have/propagate individual context.

I think this may not always be the case. For example, AFAIK in Python if you only instrument the requests library you will get one span per logical HTTP request (as coded in the app). Redirects and retries are handled at a lower level, so you will get only the original URL as http.url (even though the final URL would in theory be available to the instrumentation).

But the User Experience graphics (one client span mysteriously triggers multiple server spans) actually sounds like a good argument that this is really suboptimal. @open-telemetry/python-approvers PTAL, that blog post is quite insightful IMHO.

DNS, TSL, TCP, and any lower levels can be instrumented, but should not be enabled by default (due to low demand, costs, and perf implications)

👍

current attributes have proven themselves over years and are great!

I'm not so sure about the definition of http.flavor and while in theory I quite like http.target, it seems often a bit awkward to implement in practice if you try to do it exactly as defined. But these are OTel-original attributes, compared to e.g. the classic http.method, http.url and http.status_code.

Having a unique client span per physical HTTP request is essential to get support from downstream services and investigate issues.

I think you convinced me. Having this 1:1 relationship between client and server spans is really preferable. I think we want to write that in the HTTP semantic conventions. That being said, there should still be a recommendation for a single span per "logical client request" that covers multiple redirects and retries, but that logical span SHOULD NOT be injected into HTTP headers, only the "physical" spans should. I think that span is especially valuable because you know the original client-requested URL (relevant in case of redirects) and the total duration from the app's perspective.

yurishkuro commented 3 years ago

Having this 1:1 relationship between client and server spans is really preferable.

+1 to have outermost client span per message exchange and different spans for redirects/retries (whenever possible).

there should still be a recommendation for a single span per "logical client request" that covers multiple redirects and retries

The challenge with this is that it could result in a min of 2 client spans per RPC even for the majority of requests that do not do redirects and retries, which seems wasteful.

lmolkova commented 3 years ago

Thanks for the feedback!

I'm not so sure about the definition of http.flavor and while in theory I quite like http.target, it seems often a bit awkward to implement in practice if you try to do it exactly as defined. But these are OTel-original attributes, compared to e.g. the classic http.method, http.url and http.status_code.

yeah, I agree that some attributes are controversial. What I don't fully understand is 'optional' attributes. Do we understand why would instrumentation set them? As a library owner, I want to stay lean and efficient and would avoid optional things (and redundancy). As a user, I would have inconsistent experience (my python app sets one set of attributes and golang app sets different ones, I get broken experience. My metrics are also off. So I think we should avoid any variations for HTTP and have a set of required attributes. Exporters have to deal with breaking up URL into pieces if backend needs it.

lmolkova commented 3 years ago

recommendation for a single span per "logical client request" that covers multiple redirects and retries

Beyond redundancy, as @yurishkuro mentioned (~99% requests don't have retries), it's also misleading. We never know if this logical client span captured retries because on the auto or native instrumentation layer, we don't know how user called into HTTP client. So I'd rather leave it for 1) users who want to have it 2) client libraries that run on top of HTTP.

And I see a huge benefit in knowing about retries, and maybe we'll find ways to approach it, but it seems controversial now. We don't have to agree on this today, but we can make progress on RPC call level regardless of logical ones.

Oberon00 commented 3 years ago

So I think we should avoid any variations for HTTP and have a set of required attributes. Exporters have to deal with breaking up URL into pieces if backend needs it.

That's a different philosophy than what was used when designing these. The problem is that on the client you typically only have the full URL and would need to parse it, while on the server side you always get a "split" URL, because that's how HTTP works. So the idea was that every instrumentation can report directly what it has available without doing data processing on the agent side. The backend (or an in-between processor like the collector) is responsible for normalizing the data. Exactly because data-processing on the agent can be expensive (and lossy!), and a lean agent is more important than a lean backend because the agent runs in more processes.

Oberon00 commented 3 years ago

we don't know how user called into HTTP client

Why? I'm sure there are cases where we would be wrong, but in general instrumenting the most high-level API will work very well. For example, in Java if you use HttpUrlConnection for example, it will follow redirects for you by default. So it would be easy to create a single span covering the "logical" HTTP call, probably easier than creating one span per redirect even (because for the former you can just instrument a public API, while the latter needs to instrument internals -- but that's just my assumption I haven't written an instrumentation for that).

2) client libraries that run on top of HTTP.

That's many HTTP libraries, for example Python requests runs on top of Python urllib3 which runs on top of Python's http.client standard library module.

Oberon00 commented 3 years ago

Pinging @open-telemetry/python-approvers (and @mariojonke): The suggestion (to which I agree) that the child-most client spans are the most important ones contradicts the current implementation in Python where inner (urllib3) spans are suppressed if there is an outer (requests) span. Even with disabling this suppression (or instrumenting just urllib3), I'm not sure if a span would be created per request/redirect. Please take a look at this discussion / join it if you have an opinion here (if not, be prepared to change the HTTP client instrumentation 😄).

owais commented 3 years ago

@Oberon00 I haven't gone through the entire discussion but I also agree that the innermost span is the most important one and should be recorded (and labelled as CLIENT). The API doesn't allow it but when I first thought about this problem, I wanted to flip what Python instrumentation do today and instead of suppressing child spans, I wanted every instrumentation that generates the CLIENT spans to set its parent span's kind to INTERNAL. For example, if both requests and urllib were instrumented, urllib instrumentation would fetch the current span (generated by requests instrumentation) and set its kind to INTERNAL before generating it's own CLIENT span. If every client instrumentation did this by convention then we'd always have a single CLIENT span for one outgoing call. This is probably not doable though because of streaming exporter implementations among other possible limitations.

Oberon00 commented 3 years ago

I think having nested client spans that are marked as client is preferable to having nested client spans where only the innermost one is marked as client. If we want to get rid of additional outer spans in Python we could e.g. just drop the requests instrumentation because the inner spans are covered by urllib3.

lmolkova commented 3 years ago

Why? I'm sure there are cases where we would be wrong, but in general instrumenting the most high-level API will work very well.

Here's java example https://gist.github.com/lmolkova/f34d2dfcb23522b2dc115462f2f76d20 - you can use high-level API however you want - there are no rules about it.

You can do retries with something on higher-level (e.g. with project reactor or rxjava. We in Azure SDK, for example, have abstraction over httpclient to make it pluggable and always do retries with high-level API.

I, unfortunately, have 0 data on how people usually do retries and can only assume things. I assume it's a wild west and high-level logical span is misleading for at least a significant number of users.

lmolkova commented 3 years ago

I think having nested client spans that are marked as client is preferable to having nested client spans where only the innermost one is marked as client.

This worth another discussion. I choose INTERNAL span for logical call because of topology map visualizations. They are presumably interested in client-level calls to show what service calls into. If there are multiple nested spans, they have no heuristics to tell how many outgoing connections to show. We need heuristics to tell them which client span represents the best destination (if they don't want to show all of them). It could be internal span or something else but in the spec.

I recall @anuraaga had somewhat similar concerns for AWS SDK.

E.g. 2 client spans - two destination

internal + client span: 1 destination (in Azure Monitor we decided that's preferrable for our users)

Oberon00 commented 3 years ago

I, unfortunately, have 0 data on how people usually do retries and can only assume things. I assume it's a wild west and high-level logical span is misleading for at least a significant number of users.

At least for Python, I assume the opposite, as retries are baked into most libraries. But even if that is not the case, at least we would add the information that some spans belong together, even if we have a disconnect at an even higher level.

That information cannot be recovered on the backend. I would be open to use links instead of a parent span though: The high-level instrumentation sets some special context field like "retry_of"/"redirect_of" to the SpanContext of the previous try/request, and the sensor picks it up.

EDIT: Of course there is also the practical argument that some users would complain if we change this now in Python, so you need to persuade at least the Python maintainers 😃

Oberon00 commented 3 years ago

We need heuristics to tell them which client span represents the best destination

You could use resources to determine if these spans come from the same entity. We could also have a flag on the protocol to say if the parent was remote or not (the exporter knows this). If that is not enough/possible, we could discuss having a new span type "LOGICAL_CLIENT" for spans that expect to have more LOGICAL_CLIENT or CLIENT spans locally under them.

But I think an hasRemoteParent on the protocol is what you want for your heuristic.

lmolkova commented 3 years ago

At least for Python, I assume the opposite, as retries are baked into most libraries.

Retry policies are usually app-specific - it's hard to tell what needs to be retried and how in general case. I assume users either give retry policies to Python clients or call into high-level api multiple times. Is there a good read/doc/example on this?

lmolkova commented 3 years ago

Briefly searched: python requests indeed have a way to pass retry policy - https://findwork.dev/blog/advanced-usage-python-requests-timeouts-retries-hooks/

But google tells it's not the only way https://julien.danjou.info/python-retrying/ and look at the comment here https://www.peterbe.com/plog/best-practice-with-retries-with-requests

Your first ( ostensibly "horrible") solution works the best for me, the rest is too verbose.

I'm not saying there are a lot of users who do a simple loop over high-level API, but they're out there and would come to ask questions and request support from OTel maintainers and vendors. High-level API call does not seem to bring a huge value to ~99% of requests, increases customer bill significantly along with potential costs of supporting this solution.

Oberon00 commented 3 years ago

I'm not saying there are a lot of users who do a simple loop over high-level API, but they're out there and would come to ask questions and request support from OTel maintainers and vendors.

They will certainly complain if the current behavior changes, which is that only the first (most logical) client span is instrumented.

So far I haven't seen any complaints about the current behavior on the other hand, but I'm not monitoring otel-python very closely, so @open-telemetry/python-approvers might know if some users where already suprised by the current behavior and expected the "bottom-most client" one suggested here (to which I agree that it is in theory superior).

iNikem commented 3 years ago

I would like again to remind about one specific use-case that any proposal has to support: application developer manually wrapping high-level http call into CLIENT span with HTTP semantic convention AND, possibly, manually injecting the SpanContext of that span into outgoing request.

I think whatever auto-instrumentations do they at least have to cope with this scenario and don't offer too much of a surprise to the app developer. E.g. always clearing out injected context and force-injecting lower-level context may be an unpleasant surprise for them.

austinlparker commented 6 months ago

This work has been completed as part of HTTP semantic convention stabilization.

open-telemetry / opentelemetry-specification

Define span modelling conventions for HTTP client spans #1747