jaegertracing / jaeger

CNCF Jaeger, a Distributed Tracing Platform
https://www.jaegertracing.io/
Apache License 2.0
20.22k stars 2.41k forks source link

[Bug]: ElasticSearch storage not deduplicating multiply-archived spans #6001

Open cdanis opened 2 days ago

cdanis commented 2 days ago

What happened?

As an SRE deploying Jaeger at a medium-sized engineering organization (the Wikimedia Foundation), I want to enable trace archival support, for the usual benefits discussed in #57:

Users often ask for this feature. The trace retention period is usually short, and often people would put a link to the trace into some issue tracker to improve service performance, but by the time someone starts working on the ticket the trace has already expired.

However, spans are written to our Elasticsearch(/OpenSearch) archival storage regardless of whether or not they already exist. When archived traces are read from archival storage and displayed, this leads to spans that are duplicated exponentially by however many times 'archive' was clicked. image

We're running Jaeger on Kubernetes using a slight modification of the official Helm chart (if curious see chart and values)

Steps to reproduce

  1. Configure jaeger + OpenSearch similar to ours, with archive tracing
  2. Send some tracing data
  3. Archive the trace
  4. Archive the trace again
  5. Restart jaeger-query with --es.max-span-age younger than the trace is
  6. Refresh the UI page for the trace

Expected behavior

Pressing the 'archive' button on a trace that already exists in archive storage would not mutate anything in archive storage.

Relevant log output

No response

Screenshot

image When expanded, the SpanIDs for these traces show as:

If you examine the OpenSearch index directly, you find that the spans are duplicate documents with everything identical except the index-internal _id field: image

Additional context

No response

Jaeger backend version

v1.56.0

SDK

Envoy sending OpenTelemetry

Pipeline

Envoy -> otelcol -> jaeger-collector

Stogage backend

OpenSearch 2.7.0

Operating system

Linux

Deployment model

Kubernetes

Deployment configs

Args:                                                                                                                                                                   
      --es-archive.enabled=true                                                                                                                                             
      --es-archive.index-date-separator=.                                                                                                                                   
      --es-archive.num-replicas=2                                                                                                                                           
      --es-archive.num-shards=1                                                                                                                                             
      --es-archive.tags-as-fields.all=true                                                                                                                                  
      --es-archive.tls.enabled=true                                                                                                                                         
      --es.index-date-separator=.                                                                                                                                           
      --es.max-span-age=756h0m0s                                                                                                                                            
      --es.num-replicas=2                                                                                                                                                   
      --es.num-shards=1                                                                                                                                                     
      --es.tags-as-fields.all=true                                                                                                                                          
      --es.tls.enabled=true
[...]
    Environment:                                                                                                                                                            
      SPAN_STORAGE_TYPE:       elasticsearch                                                                                                                                
      ES_SERVER_URLS:          https://logs-api.svc.eqiad.wmnet:443                                                                                                         
      ES_USERNAME:             jaeger-prod                                                                                                                                  
      ES_PASSWORD:             <set to the key 'password' in secret 'main-jaeger-elasticsearch'>  Optional: false                                                           
      ES_ARCHIVE_SERVER_URLS:  https://logs-api.svc.eqiad.wmnet:443                                                                                                         
      ES_ARCHIVE_USERNAME:     jaeger-prod                                                                                                                                  
      ES_ARCHIVE_PASSWORD:     <set to the key 'password' in secret 'main-jaeger-elasticsearch'>  Optional: false                                                           
      QUERY_BASE_PATH:         /
yurishkuro commented 2 days ago

Jaeger is supposed to de-duplicate these spans on read.

cdanis commented 2 days ago

Jaeger is supposed to de-duplicate these spans on read.

The issue might be that we only enabled tags-as-fields.all=true on both the main and archive ES configurations partway into our deployment? For at least two of these traces, they're written in archive storage "both ways": image

yurishkuro commented 2 days ago

I think it's expected that the duplicate spans will be written to storage - we don't do read before write, since it's never needed for primary storage and archive storage is the same implementation. In case of Cassandra we actually create a primary key based on span ID and content hash, so dups are handled by the storage. IIRC the ES implementation does not assign PK (which I assume would cause sharding decisions in ES and potentially another RPC from coordinator node to the required shared), so each span document gets a unique ES-internal ID and duplicates would exist if writing more than once. Once again, I think it's an acceptable behavior for archive storage because the query layer is supposed to dedupe spans with the same ID (by merging them). But it doesn't seem to be happening, which is a bug.

cdanis commented 1 day ago

Sorry, I was being unclear -- I understand duplicate documents are expected, and I agree that seems like fine behavior. But I was trying to point out that enabling tags-as-fields.all=true after having archiving traces without it enabled might be an unusual corner case for the dedup/merge logic.

yurishkuro commented 1 day ago

I don't think it should matter. I expect deduping of spans to be done by query code, not specific storage implementation. So that flag affects the internal storage representation, but once it's read into the domain data model there shouldn't be any difference for the deduping logic.

cdanis commented 8 hours ago

Is the query code you're talking about the SpanIDDeduper Adjuster?

// SpanIDDeduper returns an adjuster that changes span ids for server
// spans (i.e. spans with tag: span.kind == server) if there is another
// client span that shares the same span ID. This is needed to deal with
// Zipkin-style clients that reuse the same span ID for both client and server
// side of an RPC call. Jaeger UI expects all spans to have unique IDs.
[...]
func (d *spanIDDeduper) dedupeSpanIDs() {
    oldToNewSpanIDs := make(map[model.SpanID]model.SpanID)
    for _, span := range d.trace.Spans {
        // only replace span IDs for server-side spans that share the ID with something else
        if span.IsRPCServer() && d.isSharedWithClientSpan(span.SpanID) {

We mostly don't have client spans with things as we've deployed them now ... and I don't think this would merge two server spans anyway? It would just re-ID them?

Apologies if I'm missing something, this is my second time ever looking at the codebase itself :)

yurishkuro commented 8 hours ago

Yes, this one is specific for old school Zipkin spans. I need a real deduper.

cdanis commented 8 hours ago

Oh jeez lol, I'm sorry.

It's only just now that I understood your very first "supposed to" as a normative statement, and not as saying that this functionality already exists but isn't working in my case.

cdanis commented 8 hours ago

I did a first draft: https://github.com/jaegertracing/jaeger/compare/main...cdanis:jaeger:dupe-dedupe

I'll test this out and clean it up and attempt sending a PR next week? Or if someone else can do it sooner that's fine too