open-telemetry / opentelemetry-java-instrumentation

OpenTelemetry auto-instrumentation and instrumentation libraries for Java
https://opentelemetry.io
Apache License 2.0

Add attribute identifiers to the root span to make trace analysis simpler #10380

Closed. zhangjiabin1010 closed this 9 months ago

zhangjiabin1010 commented 9 months ago

Is your feature request related to a problem? Please describe.

We need to perform aggregate analysis over a large number of spans, for example counting traces and computing their average duration. Aggregating over a large number of trace IDs hurts query efficiency. If an identifier were added to the root span, these statistics would be simpler to compute and queries would be faster.

Describe the solution you'd like

Add a field to the attributes of the root span to identify it, such as root_span='1' or entry='1'.

Describe alternatives you've considered

I have customized the java-agent and added the entry = '1' flag, but I'm not familiar with other languages. If you think this requirement is useful, you can add this function to other language SDKs.

Additional context

No response

laurit commented 9 months ago

If you think this requirement is useful, you can add this function to other language SDKs.

Such features should be defined in the specification (https://github.com/open-telemetry/semantic-conventions). I personally find it unlikely that such a feature would be accepted into the specification, because tracing backends can already figure this out from other information, e.g. a span with kind SERVER should be a local root span. We suggest implementing features that are not of interest to a wider audience as agent extensions (https://github.com/open-telemetry/opentelemetry-java-instrumentation/tree/main/examples/extension).

zhangjiabin1010 commented 9 months ago

Judging from the data collected in our production environment, span kind = 'server' cannot be used as the condition for identifying the root span in many cases.

For example:

  1. The root span of a scheduled task is a function, and its kind is INTERNAL.
  2. The root span is a SQL statement, and its kind is CLIENT.
  3. For heartbeat checks and other similar requests, the root span kind is CLIENT.

Do you have any more suggestions? Looking forward to your reply.

laurit commented 9 months ago

From your samples it seems that you are not looking for a local root span (the root span within one service) but for the root span of the whole trace. Isn't that easy to detect in your backend? The span that does not have a parent span is the root.

zhangjiabin1010 commented 9 months ago

I am looking for a local root span (the root span within one service). Because of this, I cannot use the absence of a parent ID as the condition, since a single trace can contain multiple services.

What I want is to identify the root span within each service.

laurit commented 9 months ago

Local root spans shouldn't be that hard to find in the backend either. A local root span is a span without a parent, or a span of kind SERVER or CONSUMER. CONSUMER is a bit tricky, as a CONSUMER span with messaging.operation set to process may have a parent span within the same service that is also CONSUMER but with messaging.operation set to receive. I guess for CONSUMER we could say that a CONSUMER span is a local root if its parent span is not a CONSUMER span. Anyway, you already have a solution for tagging the local root span in the agent. To reiterate the original answer, we are unlikely to implement this unless required by the specification.
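The rule described in this comment could be sketched on the backend side roughly as follows. This is only an illustration in plain Java: the `LocalRootRule` class, its `Kind` enum, and the `isLocalRoot` method are hypothetical names and not part of any OpenTelemetry API. It assumes the backend can look up the kind of a span's parent.

```java
// Hypothetical backend-side helper; not part of the OpenTelemetry SDK.
final class LocalRootRule {
    // Mirrors the span kinds defined by the OpenTelemetry tracing API.
    enum Kind { INTERNAL, SERVER, CLIENT, PRODUCER, CONSUMER }

    /**
     * Applies the heuristic from the comment above.
     *
     * @param kind       kind of the span being classified
     * @param parentKind kind of the parent span, or null when the span has no parent
     */
    static boolean isLocalRoot(Kind kind, Kind parentKind) {
        if (parentKind == null) {
            return true; // no parent: root of the whole trace, so also a local root
        }
        if (kind == Kind.SERVER) {
            return true; // a SERVER span is the entry point of a service
        }
        if (kind == Kind.CONSUMER) {
            // A "process" CONSUMER span may sit under a "receive" CONSUMER span
            // in the same service, so only the outer CONSUMER counts as a root.
            return parentKind != Kind.CONSUMER;
        }
        return false;
    }
}
```

Under this sketch, the scheduled-task (INTERNAL) and SQL (CLIENT) examples from earlier in the thread are classified as local roots because they have no parent.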

trask commented 9 months ago

Another option for detecting local root spans on the backend is to check whether the parent span "is remote":

https://github.com/open-telemetry/opentelemetry-proto/blob/c451441d7b73f702d1647574c730daf7786f188c/opentelemetry/proto/trace/v1/trace.proto#L348-L352
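A backend could decode the "is remote" information from the OTLP span `flags` field along these lines. This is a minimal sketch, not an official API: the `ParentRemoteFlags` class and its methods are hypothetical, and the two bit masks (0x100 and 0x200) are recalled from the `SpanFlags` enum in trace.proto, so they should be verified against the proto version actually in use.

```java
// Hypothetical backend-side decoder for the OTLP span "flags" field.
final class ParentRemoteFlags {
    // Assumed values of SPAN_FLAGS_CONTEXT_HAS_IS_REMOTE_MASK and
    // SPAN_FLAGS_CONTEXT_IS_REMOTE_MASK; check them against trace.proto.
    static final int CONTEXT_HAS_IS_REMOTE_MASK = 0x100;
    static final int CONTEXT_IS_REMOTE_MASK = 0x200;

    enum ParentLocation { UNKNOWN, LOCAL, REMOTE }

    /** Decodes the three possible states of the parent's "is remote" bits. */
    static ParentLocation parentLocation(int flags) {
        if ((flags & CONTEXT_HAS_IS_REMOTE_MASK) == 0) {
            return ParentLocation.UNKNOWN; // the sender did not report this information
        }
        return (flags & CONTEXT_IS_REMOTE_MASK) != 0
                ? ParentLocation.REMOTE
                : ParentLocation.LOCAL;
    }

    /** A span is treated as a local root if it has no parent or its parent is remote. */
    static boolean isLocalRoot(boolean hasParent, int flags) {
        return !hasParent || parentLocation(flags) == ParentLocation.REMOTE;
    }
}
```

Spans whose flags leave the state unknown are not classified as local roots here; a backend could fall back to a span-kind heuristic for those.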

github-actions[bot] commented 9 months ago

This has been automatically marked as stale because it has been marked as needing author feedback and has not had any activity for 7 days. It will be closed automatically if there is no response from the author within 7 additional days from this comment.

zhangjiabin1010 commented 9 months ago

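Local root spans shouldn't be that hard to find in the backend either. A local root span is a span without a parent, or a span of kind SERVER or CONSUMER. CONSUMER is a bit tricky, as a CONSUMER span with messaging.operation set to process may have a parent span within the same service that is also CONSUMER but with messaging.operation set to receive. I guess for CONSUMER we could say that a CONSUMER span is a local root if its parent span is not a CONSUMER span. Anyway, you already have a solution for tagging the local root span in the agent. To reiterate the original answer, we are unlikely to implement this unless required by the specification.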

OK, thanks for the answer

zhangjiabin1010 commented 9 months ago

Another option for detecting local root spans on the backend is to check whether the parent span "is remote":

https://github.com/open-telemetry/opentelemetry-proto/blob/c451441d7b73f702d1647574c730daf7786f188c/opentelemetry/proto/trace/v1/trace.proto#L348-L352

I understand this solution, but it requires adding detection logic and making modifications in SDKs for multiple languages, which is somewhat difficult to maintain. It would be nice if there were a more general solution. If the community is not ready to add this to the specification, I will try to implement it myself. Thanks for your answer.