open-telemetry / opentelemetry-python

OpenTelemetry Python API and SDK
https://opentelemetry.io
Apache License 2.0
1.82k stars 635 forks source link

Duplicate Span, sometimes #3275

Open Liubey opened 1 year ago

Liubey commented 1 year ago

Steps to reproduce When I use opentelemetry in python3, sometimes I got duplicate span . these code used to create tracer:

        from opentelemetry import trace, version
        from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
        from opentelemetry.sdk.resources import Resource
        from opentelemetry.sdk.trace import TracerProvider
        from opentelemetry.sdk.trace.export import BatchSpanProcessor, SimpleSpanProcessor, ConsoleSpanExporter
        from opentelemetry.propagate import set_global_textmap
        from opentelemetry.propagators.b3 import B3Format
        set_global_textmap(B3Format())
        os.environ.setdefault(u'OTEL_SPAN_ATTRIBUTE_VALUE_LENGTH_LIMIT', u'10240')
        resource = Resource(attributes={
            PROCESS_IP_TAG_KEY: local_ip(),
            PROCESS_HOSTNAME_TAG_KEY: host_name(),
            PROCESS_SERVICE_NAME_TAG_KEY: service_name,
            PROCESS_SDK_VERSION_TAG_KEY: u'Opentelemetry-Python-' + version.__version__
        })
        # https://opentelemetry.io/docs/instrumentation/python/exporters/
        trace.set_tracer_provider(TracerProvider(resource=resource))
        otlp_exporter = OTLPSpanExporter(endpoint=u'http://localhost:6831', insecure=True)
        span_processor = BatchSpanProcessor(otlp_exporter)
        trace.get_tracer_provider().add_span_processor(span_processor)
        if is_tracing_print_log():
            trace.get_tracer_provider().add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))

        tracer_implements = trace.get_tracer(__name__)

this code used to create span:

        from opentelemetry import trace, baggage
        exist_active_span = trace.get_current_span()
        span = self.tracer_implements.start_span(name, context=parent_span_context)
        span.set_attribute(TAG_PRODUCT_FLAG, product_flag)
        scope = trace.use_span(span, end_on_exit=True)
        setattr(scope, u'__span__', span)
        scope.__enter__()
        return scope

this code used to end span:

        current_scope.__exit__(None, None, None)

Something that needs to be said IT IS NOT 100%, when I restart my server It may be disappeared. But Something when I restart my server again, It may be appear. I swear It's duplicate by report step. cause console print twice already. In Jaeger It looks like this:

image

What is the expected behavior? What happened?

What is the actual behavior?

Additional context

ocelotl commented 1 year ago

Is there a way for you to check if the span IDs are duplicated?

marctc commented 1 year ago

I can confirm that span and traces id are duplicated. I'm testing this connector sending traces from a Django application that use last otel instrumentation library (0.39) and I receive this:

2023-07-18T12:18:01.543Z    debug   exceptionsconnector/connector_metrics.go:99checking trace/span  {"kind": "exporter", "name": "exceptions", "exporter_in_pipeline": "traces", "receiver_in_pipeline": "metrics", "trace_id": "2bc89f693a428f16e44f9a1e63e3bc50", "span_id": "5a507db467f88147"}
2023-07-18T12:18:01.545Z    debug   exceptionsconnector/connector_metrics.go:99checking trace/span  {"kind": "exporter", "name": "exceptions", "exporter_in_pipeline": "traces", "receiver_in_pipeline": "metrics", "trace_id": "2bc89f693a428f16e44f9a1e63e3bc50", "span_id": "5a507db467f88147"}

With previous versions (last time i tried it was 0.26) I didn't have this issue.

marctc commented 1 year ago

Doing some testing I found that opentelemetry v1.12.0rc2 & v0.32b0 was the first release that includes this bug.

lzchen commented 1 year ago

@Liubey @marctc There has been a few releases since this issue was created. Can you confirm whether you are still getting duplicate spans?

devdut1999 commented 1 year ago

Hey. I'm also getting a similar issue. I'm seeing duplicate spans.

aabmass commented 1 year ago

This seems really hard to reproduce and the culprit could be in several different components. If someone could provide a full repro with a docker compose yaml setup, and pinned python dependencies, we could definitely investigate.

marctc commented 1 year ago

@Liubey @marctc There has been a few releases since this issue was created. Can you confirm whether you are still getting duplicate spans?

I can confirm the problem still persists :shrug:

marctc commented 1 year ago

This seems really hard to reproduce and the culprit could be in several different components. If someone could provide a full repro with a docker compose yaml setup, and pinned python dependencies, we could definitely investigate.

this is my docker-compose:

version: "2"
services:
  api:
    depends_on:
      - otel-collector
    build: ./service/api/.
    command: opentelemetry-instrument uwsgi --http :8000 --module faulty.wsgi --master
    ports:
      - "8000:8000"
    environment:
      - DJANGO_SETTINGS_MODULE=api.settings
      - OTEL_RESOURCE_ATTRIBUTES=service.name=api
      - OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317
      - PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python
      - OTEL_EXPORTER_OTLP_INSECURE=true

  otel-collector:
    build: ./otel-collector/.
    ports:
      - "6831:6831"
      - "14268:14268"
      - "4317:4317"
      - "4318:4318"
    volumes:
      - ./config/otel-collector.yaml:/config/otel-collector.yaml
    command: /bin/otelcol --config=/config/otel-collector.yaml

wsgi.py content:

import os

from django.core.wsgi import get_wsgi_application
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter
from uwsgidecorators import postfork

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'faulty.settings')

application = get_wsgi_application()

@postfork
def init_tracing():
    trace.set_tracer_provider(TracerProvider())
    trace.get_tracer_provider().add_span_processor(BatchSpanProcessor(
        OTLPSpanExporter(endpoint=os.environ.get('OTEL_EXPORTER_OTLP_ENDPOINT'))
    ))
    trace.get_tracer_provider().add_span_processor(BatchSpanProcessor(
        ConsoleSpanExporter()
    ))

requirements.txt content

opentelemetry-exporter-otlp==1.20.0
opentelemetry-instrumentation-django==v0.41b0
opentelemetry-distro==v0.41b0
Django==4.1
uWSGI==2.0.21
aabmass commented 1 year ago

@marctc this example is missing Dockerfiles and django app? A github gist or repo I can clone and just run docker compose up would be easiest.

marctc commented 1 year ago

@marctc this example is missing Dockerfiles and django app? A github gist or repo I can clone and just run docker compose up would be easiest.

there you go https://github.com/marctc/django-otel-demo

aabmass commented 1 year ago

@marctc you are adding two OTLPSpanExporters to the TracerProvider which is causing duplicates. I added this to your wsgi.py

diff --git a/service/faulty/faulty/wsgi.py b/service/faulty/faulty/wsgi.py
index 82eacb9..997d833 100644
--- a/service/faulty/faulty/wsgi.py
+++ b/service/faulty/faulty/wsgi.py
@@ -31,3 +31,7 @@ def init_tracing():
     trace.get_tracer_provider().add_span_processor(BatchSpanProcessor(
         ConsoleSpanExporter()
     ))
+
+    print(
+        [sp.span_exporter for sp in trace.get_tracer_provider()._active_span_processor._span_processors]
+    )

and it prints out

[<opentelemetry.exporter.otlp.proto.grpc.trace_exporter.OTLPSpanExporter object at 0x7fba8dc19150>, <opentelemetry.exporter.otlp.proto.grpc.trace_exporter.OTLPSpanExporter object at 0x7fba74263e50>, <opentelemetry.sdk.trace.export.ConsoleSpanExporter object at 0x7fba744733d0>]

If you look at the Python startup logs, you'll see the warning Overriding of current TracerProvider is not allowed. What's happening is opentelemetry-instrument sets up the global tracer provider for you already with an OTLP exporter (the default). Then in wsgi.py you're adding trace.set_tracer_provider(TracerProvider()) has no effect (because you can't override the global once set) and you end up just adding more exporters to the existing TracerProvider.

You should only use one of these two options for setting up your SDK:

  1. Use opentelemetry-instrument with appropriate envvars or command line flags
  2. Create a TracerProvider in your code and set it as global with opentelemetry.trace.set_tracer_provider()
Liubey commented 1 year ago

@aabmass But I NEVER use opentelemetry-instrument. And It's difficulty to reproduce. But I will notice the log 'Overriding of current TracerProvider is not allowed'

aabmass commented 1 year ago

The repro you provided is using it https://github.com/marctc/django-otel-demo/blob/9a3e1d614fda963578ff752db524283562520252/docker-compose.yaml#L7

bazingarj commented 3 months ago

We are Observing the same Issue, sometimes it is 20x, 8x , 2x duplication. Its pretty random and ruining the traces. Collector Version: 0.100.0