Merge follow-up span that carry additional attributes with its parent

pavolloffay commented 3 years ago

HTTP client response instrumentation in Java agent captures the body after the span has been finished. This instrumentation instruments input stream and the body is usually read after the client request is closed. We have chosen to do it this way to minimize interactions with the client app.

The body attribute is added to the newly created and immediately finished span that is a child of the HTTP client span - https://github.com/hypertrace/javaagent/blob/main/instrumentation/apache-httpclient-4.0/src/main/java/io/opentelemetry/instrumentation/hypertrace/apachehttpclient/v4_0/InputStreamUtils.java#L53.

The platform will have to deal with this either by merging the follow-up span with its parent or by dealing with the fact that some attributes are captured in the child span.

EDIT1: another example from JAX-RS client https://github.com/hypertrace/javaagent/pull/159/files#diff-bc3f663e3c15fe35183bc490b01d9680a1a0b28820b99b1473bcf50b6b301ae9R65

mohit-a21 commented 3 years ago

Is this going to be a framework-specific behavior?

TRACER
          .spanBuilder("additional-data")
          .setParent(Context.root().with(span))
          .setAttribute(attributeKey, value)
          .startSpan()
          .end();

Here, the name of the span seems to be additional-data, right? Will that be an identifier for such spans?

jcchavezs commented 3 years ago

I have some reluctance about this. First of all, I'd avoid creating synthetic spans as the logic for merging can be problematic (even if spans are create one after the other, they can be reported in different batches). Second, this looks like a workaround just because of the nature of java instrumentation so I am not sure we should do these kind of specifics because that can end up in lots of custom logic for different agents. Last, I would point instead into the direction of update spans after reporting rather than creating a new span. This is, reporting two versions of the same span (same ID) with complimentary attributes. This sounds to me something that makes more sense not only for this case but also for async stuff.

On Wed, Dec 2, 2020 at 9:29 PM Mohit Agarwal notifications@github.com wrote:

Is this going to be a framework-specific behavior?

TRACER .spanBuilder("additional-data") .setParent(Context.root().with(span)) .setAttribute(attributeKey, value) .startSpan() .end();

Here, the name of the span seems to be additional-data, right? Will that be an identifier for such spans?

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/hypertrace/hypertrace-ingester/issues/81#issuecomment-737477039, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAXOYAWZFNBELIYCEZ4GIOTSS2PRNANCNFSM4UK33BIA .

pavolloffay commented 3 years ago

Here, the name of the span seems to be additional-data, right? Will that be an identifier for such spans?

We can agree on different names.

Second, this looks like a workaround just because of the nature of java instrumentation so I am not sure we should do these kind of specifics because that can end up in lots of custom logic for different agents.

A workaround for what? When the response body is read the client span (span created in the HTTP client for the outbound request) is finished. This depends on the HTTP client framework, but there are such frameworks.

would point instead into the direction of update spans after reporting rather than creating a new span. This is, reporting two versions of the same span (same ID) with complimentary attributes. This sounds to me something that makes more sense not only for this case but also for async stuff.

There are a couple of issues with your proposal:

this complicates SDK, none of the SDKs we use supports such functionality.
the platform has to deal with the multiple flushes as well - probably in a similar way as reporting additional span with additional attributes.

mohit-a21 commented 3 years ago

This is, reporting two versions of the same span (same ID) with complimentary attributes.

Is versioning of span id a thing? Does any of the existing tracing platform support that? (not suggesting that if it's not there, we should not do it.)

pavolloffay commented 3 years ago

Is versioning of span id a thing? Does any of the existing tracing platform support that? (not suggesting that if it's not there, we should not do it.)

No, there is no versioning of span id. I think zipkin brave or any other zipkin SDK supported multiple flushes of the same spa. I don't 'know how it is implemented but it might be done in two ways - 1. allow multiple flushes before and span finish and 2. allow flushes at any time. If the 1. applies it means that the span has to be finished once the body is read - this means wrong timing information.

Alternatively I could create a span for reading the body. That would start when the read begins and finishes when the stream is closed. it can provide some valuable timing info, but additional work on the platform will be still required.

jcchavezs commented 3 years ago

Alternatively I could create a span for reading the body. That would start when the read begins and finishes when the stream is closed. it can provide some valuable timing info, but additional work on the platform will be still required.

I think this is a good idea. Can we do that?

No, there is no versioning of span id. I think zipkin brave or any other zipkin SDK supported multiple flushes of the same spa. I don't 'know how it is implemented but it might be done in two ways - 1. allow multiple flushes before and span finish and 2. allow flushes at any time. If the 1. applies it means that the span has to be finished once the body is read - this means wrong timing information.

Zipkin keeps both span object in storages and do the merge on UI for representation. Search does not need to know about this merge as it will just look into the two objects.

jcchavezs commented 3 years ago

I am not sure if this is late yet or not but I have some suggestion to the proposal. I we are determined to do the merge then what if we create such span named it additional_data with attributes _hypertrace_additional_data_for=${spanID} and _hypertrace_additional_data_annotation=${operation} then we know deterministically what is the span we aim for and what is the operation about. Also we can turn the duration of that span as annotation in the target span with ${operation}_start and ${operation}_end.

hypertrace / hypertrace-ingester

Merge follow-up span that carry additional attributes with its parent #81