This is an interesting issue that I don't think we have much experience with internally. We never run Tempo consuming directly from a Kafka topic the way you have it configured.
Does the OTel Kafka receiver respond to any return value that would help? Can we return a special error that tells it not to advance its cursor, for instance because the data was not successfully saved?
If there is some feature we can "exploit", we could make a change on the Tempo side to potentially slow ingestion down and correctly consume a backed-up queue.
Do you have any recommendations for what could act as the proxy between Kafka and Tempo? We are using this in production, and one of the huge benefits of using Kafka this way is the buffering it provides; with this bug, the setup doesn't live up to expectations.
I would recommend using the OTel Collector:
kafka -> otel collector -> tempo
Hopefully the collector does a better job of draining the queue?
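A minimal sketch of such a pipeline, assuming a topic named otlp_spans and Tempo's OTLP gRPC endpoint; the broker address and endpoint are placeholders, and exact field names can vary between collector versions, so treat this as a starting point rather than a drop-in config:

```yaml
receivers:
  kafka:
    brokers: ["kafka:9092"]      # placeholder broker address
    topic: otlp_spans            # placeholder topic name
    encoding: otlp_proto         # spans serialized as OTLP protobuf
    group_id: otel-collector

processors:
  batch: {}                      # batch spans before exporting

exporters:
  otlp:
    endpoint: tempo-distributor:4317  # placeholder Tempo OTLP gRPC endpoint
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [kafka]
      processors: [batch]
      exporters: [otlp]
```

The batch processor plus the OTLP exporter's built-in retry/queueing should also help smooth out the burst after a backlog instead of pushing everything into Tempo at once.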
You may also be interested in an upcoming rearchitecture that would add a queue to Tempo directly and might allow you to drop your external Kafka.
We're using the same Kafka cluster as a buffer for logs as well (fluentbit -> kafka -> promtail -> loki). One of the benefits of having Kafka is that we don't have to pay for load balancer traffic between the OTel Collector and Loki or Tempo, since Loki and Tempo are deployed on a different K8s cluster than the apps that send telemetry signals, and the Kafka driver used by the OTel Collector has built-in autodiscovery. That said, I don't think it would be a significant cost in Tempo's case.
We're pushing spans (with the OpenTelemetry Collector) to Kafka and consuming them with Tempo 2.5.0. We scaled an entire Tempo deployment down to 0 replicas for ~1h, and during that time the lag for the otel-collector consumer group (i.e. the Tempo distributor) started to grow. When we scaled the deployment back up, I expected Tempo to slowly consume all spans/traces from the Kafka topic and ingest them into blob storage. Yet it seems that Tempo consumed all of them almost instantaneously and dropped most of them due to exceeding the live traces limit.
Logs from tempo-distributor:
I don't see a way to throttle the number of fetched messages in the kafkareceiver (not that it would make much sense there anyway), nor anything for this in Tempo's Helm chart.
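The closest settings I could find in the upstream contrib kafkareceiver are about offset commits rather than throttling; I'm not sure whether the receiver version vendored into Tempo's distributor exposes them, so this is only a sketch of what the upstream receiver documents (broker/topic values are placeholders):

```yaml
receivers:
  kafka:
    brokers: ["kafka:9092"]   # placeholder
    topic: otlp_spans         # placeholder
    group_id: otel-collector
    autocommit:
      enable: true            # periodically commit consumed offsets
      interval: 1s
    message_marking:
      after: true             # mark messages only after the pipeline has processed them
      on_error: false         # don't mark messages whose processing returned an error
```

If Tempo surfaced its internal errors to the receiver, settings like these might be what keeps the cursor from advancing past data that wasn't actually saved.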
Is there a way to fix this behavior? Of course, one option is to bump the live traces limit, but IMO that is a workaround, not a proper solution.
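For completeness, the workaround I mean would be something like the following in Tempo's overrides block; the values are purely illustrative, and the field names follow the legacy per-tenant overrides format, which I believe 2.5 still supports:

```yaml
overrides:
  # illustrative values only; tune to your actual traffic
  max_traces_per_user: 100000           # live traces allowed per tenant, per ingester
  ingestion_rate_limit_bytes: 30000000  # distributor ingestion rate limit
  ingestion_burst_size_bytes: 40000000  # burst allowance above the rate limit
```

But this just raises the ceiling; after a long enough outage the backlog will blow past any static limit.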
Tempo's config (I skipped resource requests/limits, tolerations, node selectors, etc., as they are not important):