open-telemetry / semantic-conventions


network semconv change: local/remote side no longer attributable #62

Open Oberon00 opened 1 year ago

Oberon00 commented 1 year ago

Background

This regression was introduced in open-telemetry/opentelemetry-specification#3402, see comment thread from here: https://github.com/open-telemetry/opentelemetry-specification/pull/3402#issuecomment-1525931734

Description

Previous semantic conventions used the concepts "peer" to indicate the remote end and "host" to indicate the local end of a network connection. It is now no longer generally possible to know which is which, as the new concepts of server/client and source/destination are orthogonal to that.

There is one common case in which the assignment can still be made for server/client: spans whose SpanKind is not INTERNAL can use the rule "if the kind is SERVER or CONSUMER, then the server is the local side; otherwise the client is".
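The span rule above can be sketched as follows. This is a minimal illustration, not an SDK API: the `SpanKind` enum here is a local stand-in for the five span kinds, and `local_side_for_span` is a hypothetical helper name.

```python
from enum import Enum
from typing import Optional


class SpanKind(Enum):
    """Local stand-in for the five OpenTelemetry span kinds (not the SDK class)."""
    INTERNAL = "internal"
    SERVER = "server"
    CLIENT = "client"
    PRODUCER = "producer"
    CONSUMER = "consumer"


def local_side_for_span(kind: SpanKind) -> Optional[str]:
    """Return which of server/client is the local side, or None if undecidable."""
    if kind is SpanKind.INTERNAL:
        # INTERNAL spans carry no information about the network side.
        return None
    if kind in (SpanKind.SERVER, SpanKind.CONSUMER):
        return "server"
    # CLIENT and PRODUCER fall under "otherwise client".
    return "client"
```

For example, `local_side_for_span(SpanKind.CONSUMER)` yields `"server"`, so `server.*` attributes would describe the local end and `client.*` the remote end.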

There is also the case of metrics where the metric definition makes it clear whether the local side is the client or the server, according to @AlexanderWert https://github.com/open-telemetry/opentelemetry-specification/pull/3402#issuecomment-1537801165. IIUC, this requires case-by-case knowledge to define a mapping metric ID -> "is client or server the local side".
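To make the "case-by-case knowledge" concrete, such a mapping could look roughly like the sketch below. The two HTTP metric names are used purely as examples; the table `METRIC_LOCAL_SIDE` and the helper `local_side_for_metric` are hypothetical, and a metric missing from the table simply cannot be attributed.

```python
from typing import Optional

# Illustrative entries only. A consumer of telemetry would have to maintain
# one entry per metric it knows about, which is the limitation described above.
METRIC_LOCAL_SIDE = {
    "http.server.request.duration": "server",
    "http.client.request.duration": "client",
}


def local_side_for_metric(metric_name: str) -> Optional[str]:
    """Return 'server' or 'client' for known metrics, None for unknown ones."""
    return METRIC_LOCAL_SIDE.get(metric_name)
```

Any metric the system has never heard of returns `None`, so the local/remote split is lost for it.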

Besides these two particular cases, it now seems impossible to tell which side is local and which is remote. Most commonly, this may affect logs, metrics that are not known to the system wanting to know the local/remote end, and any use of source/destination, where the remote/local-ness is completely unclear.

Proposal

Introduce a new semantic attribute network.role which may be one of server, client, sender or receiver (closed set). Then the mapping is as follows:

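| network.role | server.*    | client.*    | source.*    | destination.* |
|--------------|-------------|-------------|-------------|---------------|
| server       | local       | remote      | not allowed | not allowed   |
| client       | remote      | local       | not allowed | not allowed   |
| sender       | not allowed | not allowed | local       | remote        |
| receiver     | not allowed | not allowed | remote      | local         |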
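A minimal sketch of how a consumer might apply this table, assuming the proposed `network.role` attribute existed (it is only a proposal in this issue). `ROLE_TO_PREFIXES` and `split_local_remote` are hypothetical names, and rejecting "not allowed" prefixes with an exception is one possible design choice.

```python
from typing import Dict, Tuple

# network.role -> (prefix of the local side, prefix of the remote side)
ROLE_TO_PREFIXES: Dict[str, Tuple[str, str]] = {
    "server": ("server", "client"),
    "client": ("client", "server"),
    "sender": ("source", "destination"),
    "receiver": ("destination", "source"),
}

ALL_PREFIXES = ("server", "client", "source", "destination")


def split_local_remote(attributes: Dict[str, str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split attributes into (local, remote) according to the proposed table."""
    role = attributes.get("network.role")
    if role not in ROLE_TO_PREFIXES:
        raise ValueError(f"unknown or missing network.role: {role!r}")
    local_prefix, remote_prefix = ROLE_TO_PREFIXES[role]

    # Prefixes marked "not allowed" for this role must not appear at all.
    for prefix in ALL_PREFIXES:
        if prefix in (local_prefix, remote_prefix):
            continue
        if any(key.startswith(prefix + ".") for key in attributes):
            raise ValueError(f"{prefix}.* is not allowed with network.role={role}")

    local = {k: v for k, v in attributes.items() if k.startswith(local_prefix + ".")}
    remote = {k: v for k, v in attributes.items() if k.startswith(remote_prefix + ".")}
    return local, remote
```

With `network.role` set to `client`, the `client.*` attributes end up in the first dictionary and the `server.*` attributes in the second.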

For peer-to-peer operations, especially those using source/destination, a single operation for which a metric, span, or log line is recorded may involve both the sender and the receiver role. In this case, the role should be chosen arbitrarily (by default, I suggest "sender") and the source/destination attributes set consistently with it to allow correct local/remote attribution. It would also be possible to define an additional role "mixed" with the same mapping as "sender", or to define that the role remains unset and "source" is by default the local end and "destination" the remote end.

Bikeshedding alternative: network.position instead of network.role would work as well, with values corresponding 1:1 to the prefixes that are then local, i.e. one of server/client/source/destination.
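To illustrate the 1:1 correspondence, here is a small sketch of the alternative. Both `network.position` and `network.role` are only proposals from this issue, and the names `COUNTERPART`, `ROLE_TO_POSITION`, and `remote_prefix_for_position` are hypothetical.

```python
# With network.position, the attribute value *is* the local prefix,
# so only the counterpart (the remote prefix) needs to be looked up.
COUNTERPART = {
    "server": "client",
    "client": "server",
    "source": "destination",
    "destination": "source",
}

# How the network.role values would translate to network.position values.
ROLE_TO_POSITION = {
    "server": "server",
    "client": "client",
    "sender": "source",
    "receiver": "destination",
}


def remote_prefix_for_position(position: str) -> str:
    """Return the attribute prefix that describes the remote end."""
    return COUNTERPART[position]
```

The two variants carry the same information; the difference is only whether the value names the role or the local prefix directly.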

CC @lmolkova @trask

lmolkova commented 1 year ago

@Oberon00, thanks for creating the issue!

I wonder if we really need a generic way for metrics, events, or logs to tell which side of the communication they are on. It'd be nice to have some examples of how backends can use this information before we add an attribute.

The side is frequently implied (when filtering by service name), or irrelevant (when querying a specific event or log). Auto-analysis tools would group by common attributes and service name, so the side is again not quite relevant. INTERNAL spans, which don't describe network calls, and higher-level metrics also don't necessarily need a side.

It might be that we need some wider semantics that would not apply to network-level roles only (producer, consumer, frontend, device, worker, leader/follower, replica, etc) and we'd regret having network-specific ones.

trask commented 1 year ago

@Oberon00 do you have a specific use-case / need for this? understanding your use-case / need may help us get on the same page, and would also help the HTTP semconv stability WG prioritize it properly. thx!

Oberon00 commented 1 year ago

For now, I don't know of a concrete use case. It's just a regression / information loss I noticed and from a theoretical perspective it seemed interesting enough to write up this issue.

One theoretical use case could be some automated network mapping, where you display which host talks to which other host, based on all the telemetry sources you have.
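As a rough sketch of that idea, assuming the proposed `network.role` attribute were present on every record: the function below counts directed "local talks to remote" edges from a mix of telemetry records. `build_host_edges` and `ROLE_TO_PREFIXES` are hypothetical names, and records are modeled as plain attribute dictionaries rather than real SDK objects.

```python
from collections import Counter
from typing import Dict, Iterable

# network.role -> (prefix of the local side, prefix of the remote side)
ROLE_TO_PREFIXES = {
    "server": ("server", "client"),
    "client": ("client", "server"),
    "sender": ("source", "destination"),
    "receiver": ("destination", "source"),
}


def build_host_edges(records: Iterable[Dict[str, str]]) -> Counter:
    """Count (local address, remote address) pairs across all records."""
    edges: Counter = Counter()
    for attrs in records:
        prefixes = ROLE_TO_PREFIXES.get(attrs.get("network.role"))
        if prefixes is None:
            # Without a role, the local/remote split cannot be recovered.
            continue
        local = attrs.get(prefixes[0] + ".address")
        remote = attrs.get(prefixes[1] + ".address")
        if local and remote:
            edges[(local, remote)] += 1
    return edges
```

Records without a role are skipped, which is exactly the information loss described in this issue.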

tigrannajaryan commented 1 year ago

I think this needs to move to the new semconv repo.

Oberon00 commented 8 months ago

Actually, in the Zipkin exporter spec we need to come up with a remoteEndpoint, which was previously net.peer.*, and we describe something similar to the proposed mapping table in the text: https://github.com/open-telemetry/opentelemetry-specification/pull/3794#discussion_r1438673226

It would be nice to have this in a general place.