open-telemetry / opentelemetry-specification

Specifications for OpenTelemetry
https://opentelemetry.io
Apache License 2.0

Unique request id and other packet metadata in OTLP #2862

Open alexander-shepel opened 2 years ago

alexander-shepel commented 2 years ago

We are developing our own system to accept metrics data in OTLP format and would like to have some means to troubleshoot customer issues related to missing/duplicate/slowly delivered metrics packets. There are several problems we would like to address here:

Therefore we suggest adding metadata to the OTLP request that would, at a minimum, allow clients to send the following information with each packet:

1. Unique request / packet id. The format and contents of this id should be defined by the client; it should be unique per the blob of data inside. This will allow the server to detect (and optionally reject) duplicates.
2. Retry delay. This is the difference between the oldest metric timestamp in a packet and the time when the request was sent to the server.
3. Retry attempt. This is useful to understand how bad the network connection is between client and server (or gateway).

This metadata should be cumulative, i.e. each application load balancer / gateway aware of OTLP traffic should add its own entry with the metadata above. If OTLP packets are merged or split by the gateway, this will still allow the server to track where final ingestion delay comes from for each metric in a packet.
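
To make the shape of this concrete, here is a minimal sketch of what one such per-hop metadata entry and the cumulative list could look like; the type and field names are hypothetical and not part of OTLP:

```go
package main

import "time"

// HopMetadata is a hypothetical per-hop entry that a client SDK, agent, or
// gateway aware of OTLP traffic would append to the request. All field names
// here are illustrative, not part of any existing OTLP message.
type HopMetadata struct {
	SenderID     string        // identifies the SDK/agent/gateway that added this entry
	PacketID     string        // unique per serialized blob, chosen by the sender
	RetryAttempt int           // 1 for the first send, incremented on each retry
	RetryDelay   time.Duration // oldest metric timestamp in the packet vs. send time
	SentAt       time.Time     // when this hop sent the request onward
}

// RequestMetadata is the cumulative list carried alongside the OTLP payload,
// one entry per hop (SDK, agent, load balancer, gateway, ...).
type RequestMetadata struct {
	Hops []HopMetadata
}
```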

tigrannajaryan commented 2 years ago

Unique request / packet id. The format and contents of this id should be defined by the client; it should be unique per the blob of data inside. This will allow the server to detect (and optionally reject) duplicates.

This was discussed and rejected in the past due to problems with intermediaries.

I am open to discussing this again, but at the very least we need to address the questions that were raised in the past.

Retry delay. This is the difference between the oldest metric timestamp in a packet and the time when the request was sent to the server.

Why does this need to be part of the protocol? It seems that the server can easily compute this from the request (assuming roughly that the "time sent"="time received").
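
For illustration, that server-side computation could look roughly like the following sketch, where the datapoint timestamps are assumed to have already been extracted from the decoded request:

```go
package main

import "time"

// ingestionDelay derives the delay from the request contents alone, assuming
// roughly that "time sent" = "time received": it is the age of the oldest
// datapoint at the moment the request arrived.
func ingestionDelay(datapointTimes []time.Time, receivedAt time.Time) time.Duration {
	if len(datapointTimes) == 0 {
		return 0
	}
	oldest := datapointTimes[0]
	for _, t := range datapointTimes[1:] {
		if t.Before(oldest) {
			oldest = t
		}
	}
	return receivedAt.Sub(oldest)
}
```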

If OTLP packets are merged or split by the gateway, this will still allow the server to track where final ingestion delay comes from for each metric in a packet.

Unclear how to maintain this cumulatively in the face of batching, splitting and other transformations that can happen in intermediaries.

tigrannajaryan commented 2 years ago

Retry attempt. This is useful to understand how bad the network connection is between client and server (or gateway).

Alternatively, instead of making this a protocol feature, make it an observable metric of the client.

tigrannajaryan commented 2 years ago

cc @jmacd @bogdandrutu

jmacd commented 2 years ago

At Lightstep, we use a (hash of the) combination of all identifying fields, including for example metric datapoint timestamps, to compute an "idempotency key" for OTLP data. We use this information to avoid repeated application of metric data, for example. Although I also have (in the past) proposed adding a unique request ID, I no longer believe it's needed.
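
A rough sketch of such an idempotency key is shown below; the choice of fields is illustrative, not Lightstep's actual implementation:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// idempotencyKey hashes the identifying fields of a single metric datapoint
// (resource attributes, metric name, timestamps) into a stable key that a
// backend can use to avoid applying the same data twice.
func idempotencyKey(resourceAttrs map[string]string, metricName string,
	startTimeUnixNano, timeUnixNano uint64) string {

	h := sha256.New()
	// Hash resource attributes in a deterministic order.
	keys := make([]string, 0, len(resourceAttrs))
	for k := range resourceAttrs {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Fprintf(h, "%s=%s;", k, resourceAttrs[k])
	}
	fmt.Fprintf(h, "%s;%d;%d", metricName, startTimeUnixNano, timeUnixNano)
	return hex.EncodeToString(h.Sum(nil))
}
```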

alexander-shepel commented 2 years ago

@tigrannajaryan, thank you for the quick review of this issue. Before I address your comments, I want to mention that my proposal comes from experience with an existing large metrics system, where similar concepts are already implemented in our own flow between client and server. As we look to expand into the OTLP world, we would like to have similar capabilities / SLA guarantees for the clients that will use OTLP instead of our own protocol and/or agent. At a high level, these capabilities help us solve two business problems:

This was discussed and rejected in the past due to problems with intermediaries.

I am open to discussing this again, but at the very least we need to address the questions that were raised in the past.

I've reviewed that proposal and our use case is a bit different:

So the main use case for us here is to help ourselves help our customers: to be able to research whether data was never delivered or was somehow received twice. We do have our own custom ingestion agent running on a client, which helps here (we can see when it reports e.g. a connection timeout), but in an agent-less scenario this header becomes even more important.

Also, this header is "cumulative": we have our own Internet-facing gateway (intermediary), and this gateway adds its own unique id, retry attempt and delay while preserving the same info from the client agent or SDK. This approach somewhat resembles distributed tracing.

Why does this need to be part of the protocol? It seems that the server can easily compute this from the request (assuming roughly that the "time sent"="time received").

There are two reasons for this:

Unclear how to maintain this cumulatively in the face of batching, splitting and other transformations that can happen in intermediaries.

Apologies, I think I wasn't clear here: the idea is that the intermediary doesn't repackage OTLP (or any other) packets; it understands and writes only a "batch" format in which each packet is an opaque blob (black box) plus some metadata on top of it, including the unique id and retry information. Opening the packets and repackaging them on the intermediary would be too expensive.
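
For illustration, such a batch envelope could look roughly like this; the names are hypothetical and this is not an existing OTLP or Collector format:

```go
package main

// BatchEnvelope sketches the "blob + metadata" format described above: the
// intermediary forwards the serialized OTLP request untouched and only reads
// and writes the surrounding metadata. All names here are hypothetical.
type BatchEnvelope struct {
	PacketID     string // unique id assigned by the sender of this envelope
	RetryAttempt int    // incremented by the sender on each retry
	Payload      []byte // opaque, already-serialized OTLP request (never re-encoded here)
}
```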

Alternatively, instead of making this a protocol feature, make it an observable metric of the client.

In this case, if the client has already serialized a packet and the 1st attempt failed, it would need to serialize it again with the new "retry attempt" = 2 value. Sending the retry attempt as e.g. a header is cheaper in this case (our agents send lots of metrics, and using less CPU and memory on the client is important to us). Also, since we could have an intermediary that doesn't open or even understand the format of the packets, the intermediary can only provide the same information outside the packet, so it is simpler to keep it outside the packet in both cases.
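
As a sketch of the "retry attempt as a header" idea, assuming an OTLP/HTTP exporter and a hypothetical X-Retry-Attempt header (not defined by OTLP), the payload is serialized once and only the header changes between attempts:

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
	"strconv"
)

// sendWithRetries resends the same serialized OTLP payload on failure and
// conveys the attempt number in a header, so the body is never re-encoded.
// The "X-Retry-Attempt" header name is hypothetical.
func sendWithRetries(url string, payload []byte, maxAttempts int) error {
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(payload))
		if err != nil {
			return err
		}
		req.Header.Set("Content-Type", "application/x-protobuf")
		req.Header.Set("X-Retry-Attempt", strconv.Itoa(attempt))

		resp, err := http.DefaultClient.Do(req)
		if err == nil && resp.StatusCode < 300 {
			resp.Body.Close()
			return nil
		}
		if resp != nil {
			resp.Body.Close()
		}
	}
	return fmt.Errorf("giving up after %d attempts", maxAttempts)
}
```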

I hope this long read clarifies where I am coming from with this proposal. If there are other ways OTLP can help solve the original business problems above, I am willing to consider those as well.

tigrannajaryan commented 2 years ago

the idea is that intermediary doesn't repackage OTLP (or any other) packets, it understands and writes only the "batch" format where each packet is a blob (black box) + some metadata on top of it including unique id and retry information. Opening the packets and repackaging them on intermediary would be too expensive.

@alexander-shepel The most important OpenTelemetry intermediary is the OpenTelemetry Collector. The Collector deserializes the payloads, applies all sorts of transformations to the payload, then serializes it again before sending. There is no good way to preserve the original request metadata in the Collector. It is unclear how your proposal will work with the Collector.

If you want to make a more elaborate proposal about the features you want to add, we will be happy to review it, but please be aware that the Collector is an important use case for us, and features that don't work well with the Collector will be difficult or impossible to get accepted.

Either way we will need a proper design document to review, authored by you or anyone else who wants these capabilities. I believe this is a non-trivial proposal that requires an OTEP (see https://github.com/open-telemetry/oteps).

Please make sure to address the following in the OTEP:

Also the 3 capabilities that you want don't seem to be interdependent, so they can be proposed as 3 different OTEPs that can be reviewed and accepted/rejected individually (unless there is some unifying underlying mechanism that requires them to be added at the same time).

jmacd commented 2 years ago

@alexander-shepel I support the basic claim that, without a request identifier in OTLP, implementing correct at-most-once aggregation logic on the write path is fundamentally expensive. Lightstep would prefer not to have to compute such an "idempotency key" as we are currently doing.

@tigrannajaryan points out why this is a challenge for OTel. I want to point out that there are solutions here; it's just that they are also expensive. If an OTel collector could be configured with an "idempotency processor" that would ensure at-most-once delivery of OTLP data into its output pipeline, then the output request could use an entirely new request ID.
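
A rough sketch of what such a hypothetical processor could do is shown below: drop requests whose incoming id was already seen, and forward the rest under a fresh, collector-assigned id. None of these names exist in the Collector today:

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"sync"
)

// idempotencyProcessor is a hypothetical component that ensures at-most-once
// delivery into the output pipeline: duplicate incoming request ids are
// dropped, and surviving data gets an entirely new outgoing request id.
type idempotencyProcessor struct {
	mu   sync.Mutex
	seen map[string]struct{} // incoming request ids already forwarded
}

func newIdempotencyProcessor() *idempotencyProcessor {
	return &idempotencyProcessor{seen: make(map[string]struct{})}
}

// process reports whether the request should be forwarded and, if so, the
// fresh request id to use on the outgoing request.
func (p *idempotencyProcessor) process(incomingID string) (outgoingID string, forward bool) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if _, dup := p.seen[incomingID]; dup {
		return "", false
	}
	p.seen[incomingID] = struct{}{}
	return newRequestID(), true
}

// newRequestID generates a random 128-bit id; error handling is omitted in
// this sketch.
func newRequestID() string {
	b := make([]byte, 16)
	rand.Read(b)
	return hex.EncodeToString(b)
}
```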

However, if the collector is configured with certain transformations and does not ensure at-most-once semantics for its input, then it is easy for the mechanism that Lightstep is using for idempotency to break. I think the path forward for adding a request ID in OTel is to first clarify how to achieve at-most-once delivery with OTLP data.

alexander-shepel commented 2 years ago

@tigrannajaryan, @jmacd, thank you for your responses!

I understand that the proposal should focus on the Collector, so I was thinking about how it could be generalized to support the Collector (or a chain of Collectors in a complex merge/split scenario of metric streams). Here is a high-level idea on which I would like to get some quick feedback before I go the OTEP way.

The core of the idea is to make the Collector report a latency metric about itself (I know this happens already), and also make it add the spans of this "standard" latency metric to each exported packet. Additionally, if the "standard" latency metric spans are present in the incoming packet, the Collector could aggregate multiple incoming streams (where necessary), add its own spans and export them further. I looked at the Collector documentation and found this:

End-to-end latency (from receiver input to exporter output). Note that with multiple receivers/exporters we potentially have NxM data paths, each with different latency (plus different pipelines in the future), so realistically we should likely expose the average of all data paths (perhaps broken down by pipeline).

I think this is close to what I suggest; however, I couldn't find this metric in the Collector implementation, so let me assume how this latency could be recorded and which attributes it might have:

Let each Collector add two spans to the "standard" latency metric of type Sum (assume delta temporality); a rough code sketch follows the list below:

  1. Span about "in-flight" time of this packet (added during parsing of the incoming packet):

    1. StartTimeUnixNano = TimeUnixNano of the last span of the same metric in the incoming OTLP packet (or the last span of any metric in this packet if the latency metric is not present - this should be the case for the first Collector in the processing chain).
    2. TimeUnixNano = current time.
    3. Value = 1.
    4. Attributes: prior_collector_id and prior_packet_id (where available as collector_id and packet_id attributes from previous span in the incoming packet), collector_id, type = "in-flight".
    5. In case the Collector merges several streams (packets), it might either merge the spans of this metric from all packets together and also add its own span, where the values in the "prior_..." attributes are merged from all incoming packets as well (though I am not sure whether this results in a valid set of metric spans), or it may aggregate them somehow (in the very simple case, just create a single "aggregate" span representing all prior steps). This is where more elaboration is required.
  2. Span about "processing" time of this packet (added in exporter):

    1. StartTimeUnixNano = TimeUnixNano of the span above.
    2. TimeUnixNano = current time.
    3. Value = 1.
    4. Attributes: collector_id, packet_id (whatever exporter implementation wants to use as id - hash, GUID, etc), type = "processing".
    5. In case there are several exporters, they will all send a copy of the spans of the latency metric to different destinations.
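
To make the two data points above more concrete, here is a rough sketch of how a Collector might construct them; the types are simplified stand-ins for OTLP Sum datapoints, not the actual Collector pdata API:

```go
package main

import "time"

// latencyPoint is a simplified stand-in for a Sum datapoint of the proposed
// "standard" latency metric (delta temporality), carrying the attributes
// described in the list above.
type latencyPoint struct {
	StartTimeUnixNano uint64
	TimeUnixNano      uint64
	Value             int64
	Attributes        map[string]string
}

// makeLatencyPoints builds the "in-flight" and "processing" points for one
// packet passing through one Collector. lastIncomingTime is the TimeUnixNano
// of the last latency span found in the incoming packet (or of any metric,
// for the first Collector in the chain).
func makeLatencyPoints(lastIncomingTime uint64, priorCollectorID, priorPacketID,
	collectorID, packetID string, parsedAt, exportedAt time.Time) []latencyPoint {

	inFlight := latencyPoint{
		StartTimeUnixNano: lastIncomingTime,
		TimeUnixNano:      uint64(parsedAt.UnixNano()),
		Value:             1,
		Attributes: map[string]string{
			"prior_collector_id": priorCollectorID,
			"prior_packet_id":    priorPacketID,
			"collector_id":       collectorID,
			"type":               "in-flight",
		},
	}
	processing := latencyPoint{
		StartTimeUnixNano: inFlight.TimeUnixNano,
		TimeUnixNano:      uint64(exportedAt.UnixNano()),
		Value:             1,
		Attributes: map[string]string{
			"collector_id": collectorID,
			"packet_id":    packetID,
			"type":         "processing",
		},
	}
	return []latencyPoint{inFlight, processing}
}
```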

With this approach, the "standard" latency metric becomes a first-class citizen in the target system (i.e. yet another metric), but it will also allow for some automation around it if all Collectors 1) implement such a metric, 2) implement it in the same way and 3) export it by default.

As for our specific asks, this solves the SLA problem, as this metric will be the source of information about each Collector's latency, and depending on where the boundary between the customer's and our own set of Collectors lies, we could distinguish customer vs. our own processing time. This approach may also be used to track duplicate packets (for example, a Collector can track all "packet_id"s encountered during a certain sliding time window in all latency metric spans, and report an error when it sees the same id twice), but this is not specifically a goal for us - we will only record the "packet_id" that crosses the boundary between the customer and our system.
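
A small sketch of such a sliding-window duplicate check follows; the lazy eviction strategy and window handling are illustrative choices only:

```go
package main

import (
	"sync"
	"time"
)

// duplicateTracker remembers every packet_id seen within a sliding time
// window and flags repeats, as described above. The window length and
// eviction strategy are illustrative, not part of any proposal.
type duplicateTracker struct {
	mu     sync.Mutex
	window time.Duration
	seen   map[string]time.Time // packet_id -> when it was first seen
}

func newDuplicateTracker(window time.Duration) *duplicateTracker {
	return &duplicateTracker{window: window, seen: make(map[string]time.Time)}
}

// observe records a packet_id and reports whether it was already seen within
// the window. Expired entries are evicted lazily on each call.
func (d *duplicateTracker) observe(packetID string, now time.Time) (duplicate bool) {
	d.mu.Lock()
	defer d.mu.Unlock()
	for id, t := range d.seen {
		if now.Sub(t) > d.window {
			delete(d.seen, id)
		}
	}
	if _, ok := d.seen[packetID]; ok {
		return true
	}
	d.seen[packetID] = now
	return false
}
```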