jaegertracing / jaeger

CNCF Jaeger, a Distributed Tracing Platform
https://www.jaegertracing.io/
Apache License 2.0
20.19k stars 2.4k forks source link

[Feature]: Add more realistic scenario to large trace integration test #3912

Closed alejandrodnm closed 2 years ago

alejandrodnm commented 2 years ago

Requirement

As a developer working on automatically running the TestExternalGRPCStorage tests against a custom storage using the gRPC plugin.

I want a more realistic scenario for the testGetLargeSpan test with each span having a unique spanID.

So that I can have more confidence on the results of the integration test.

Problem

The testGetLargeSpans test writes 10008 spans all belonging to the same trace and then makes a request to retrieve the trace. The problem is that the test [s just sending the same span repeatedly:

https://github.com/jaegertracing/jaeger/blob/c1bb2946e670129314bf88d9730d8ed7566766d4/plugin/storage/integration/integration_test.go#L273-L283

Thus, all spans have the same spanID and attributes.

From the fixture file:

https://github.com/jaegertracing/jaeger/blob/c1bb2946e670129314bf88d9730d8ed7566766d4/plugin/storage/integration/fixtures/traces/example_trace.json#L3-L25

This has implications on the storage I'm using to run the test against given the way we define uniqueness among spans.

Proposal

I was doing some research on spanID and found the following.

The OpenTelemetry spec defines the SpanID as:

SpanId is the identifier for a span. It is globally unique with practically sufficient probability by being made as 8 randomly generated bytes. When passed to a child Span this identifier becomes the parent span id for the child Span

The Jaeger Thrift model also states something about spanID uniqueness:

3: required i64 spanId # unique span id (only unique within a given trace)

Also, Jaeger has a depuper for non unique spanIDs. In its documentation it say:

Jaeger UI expects all spans to have unique IDs.

https://github.com/jaegertracing/jaeger/blob/c1bb2946e670129314bf88d9730d8ed7566766d4/model/adjuster/span_id_deduper.go#L24-L28

Even though I couldn't find an official spec for Jaeger spans that says that spanID should be unique or unique withing a trace, it seems to be the agreed upon or desired constraint.

So, where am I going with all of this? Would it be ok to change this test to assign a unique spanID to each span? This would provide a more realistic test scenario, unless I'm missing something. Could be something like:

@@ -277,6 +282,7 @@ func (s *StorageIntegration) loadParseAndWriteLargeTrace(t *testing.T) *model.Tr
        for i := 1; i < 10008; i++ {
                s := new(model.Span)
                *s = *span
+               s.SpanID = model.SpanID(i)
                s.StartTime = s.StartTime.Add(time.Second * time.Duration(i+1))
                trace.Spans = append(trace.Spans, s)
        }

Thanks.

Open questions

yurishkuro commented 2 years ago

Span IDs should be generally unique, it looks like a bug in the test. However, jaeger backends are expected to handle duplicate ids with overwriting the data.

alejandrodnm commented 2 years ago

Hi @yurishkuro, thanks for the reply.

Span IDs should be generally unique, it looks like a bug in the test. However, jaeger backends are expected to handle duplicate ids with overwriting the data.

Since you considered it a bug, is it ok I go ahead with the PR to add unique spanID's?

Part of the problem is that the expected trace list of spans is the same span repeated, if I overwrite in the storage the test fails. because I'll be comparing on slice of 10008 spans against a slice with just one span.

yurishkuro commented 2 years ago

yes, feel free to send a PR

Monnoroch commented 1 year ago

I believe the problem is not fully fixed. PTAL: https://github.com/jaegertracing/jaeger/issues/4466#issuecomment-1563278469.