Liubey opened this issue 1 year ago
Is there a way for you to check if the span IDs are duplicated?
I can confirm that span and trace IDs are duplicated. I'm testing this connector by sending traces from a Django application that uses the latest otel instrumentation library (0.39), and I receive this:
```
2023-07-18T12:18:01.543Z debug exceptionsconnector/connector_metrics.go:99 checking trace/span {"kind": "exporter", "name": "exceptions", "exporter_in_pipeline": "traces", "receiver_in_pipeline": "metrics", "trace_id": "2bc89f693a428f16e44f9a1e63e3bc50", "span_id": "5a507db467f88147"}
2023-07-18T12:18:01.545Z debug exceptionsconnector/connector_metrics.go:99 checking trace/span {"kind": "exporter", "name": "exceptions", "exporter_in_pipeline": "traces", "receiver_in_pipeline": "metrics", "trace_id": "2bc89f693a428f16e44f9a1e63e3bc50", "span_id": "5a507db467f88147"}
```
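One quick way to answer the question above for debug output like these lines is to count the `span_id` values (a hypothetical stdlib helper for illustration, not part of the collector or SDK):

```python
import re
from collections import Counter


def duplicated_span_ids(log_lines):
    """Return the set of span_ids appearing more than once in debug output."""
    ids = re.findall(r'"span_id": "([0-9a-f]+)"', "\n".join(log_lines))
    return {sid for sid, n in Counter(ids).items() if n > 1}


logs = [
    '... {"trace_id": "2bc89f693a428f16e44f9a1e63e3bc50", "span_id": "5a507db467f88147"}',
    '... {"trace_id": "2bc89f693a428f16e44f9a1e63e3bc50", "span_id": "5a507db467f88147"}',
]
# duplicated_span_ids(logs) -> {"5a507db467f88147"}
```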
With previous versions (the last time I tried was 0.26) I didn't have this issue.
Doing some testing, I found that opentelemetry v1.12.0rc2 & v0.32b0 were the first releases that include this bug.
@Liubey @marctc There have been a few releases since this issue was created. Can you confirm whether you are still getting duplicate spans?
Hey. I'm also getting a similar issue. I'm seeing duplicate spans.
This seems really hard to reproduce and the culprit could be in several different components. If someone could provide a full repro with a docker compose yaml setup, and pinned python dependencies, we could definitely investigate.
> @Liubey @marctc There have been a few releases since this issue was created. Can you confirm whether you are still getting duplicate spans?
I can confirm the problem still persists :shrug:
> This seems really hard to reproduce and the culprit could be in several different components. If someone could provide a full repro with a docker compose yaml setup, and pinned python dependencies, we could definitely investigate.
this is my docker-compose:

```yaml
version: "2"
services:
  api:
    depends_on:
      - otel-collector
    build: ./service/api/.
    command: opentelemetry-instrument uwsgi --http :8000 --module faulty.wsgi --master
    ports:
      - "8000:8000"
    environment:
      - DJANGO_SETTINGS_MODULE=api.settings
      - OTEL_RESOURCE_ATTRIBUTES=service.name=api
      - OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317
      - PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python
      - OTEL_EXPORTER_OTLP_INSECURE=true
  otel-collector:
    build: ./otel-collector/.
    ports:
      - "6831:6831"
      - "14268:14268"
      - "4317:4317"
      - "4318:4318"
    volumes:
      - ./config/otel-collector.yaml:/config/otel-collector.yaml
    command: /bin/otelcol --config=/config/otel-collector.yaml
```
wsgi.py content:

```python
import os

from django.core.wsgi import get_wsgi_application
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter
from uwsgidecorators import postfork

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'faulty.settings')
application = get_wsgi_application()

@postfork
def init_tracing():
    trace.set_tracer_provider(TracerProvider())
    trace.get_tracer_provider().add_span_processor(BatchSpanProcessor(
        OTLPSpanExporter(endpoint=os.environ.get('OTEL_EXPORTER_OTLP_ENDPOINT'))
    ))
    trace.get_tracer_provider().add_span_processor(BatchSpanProcessor(
        ConsoleSpanExporter()
    ))
```
requirements.txt content:

```
opentelemetry-exporter-otlp==1.20.0
opentelemetry-instrumentation-django==v0.41b0
opentelemetry-distro==v0.41b0
Django==4.1
uWSGI==2.0.21
```
@marctc this example is missing the Dockerfiles and the django app? A github gist or repo I can clone and just run `docker compose up` in would be easiest.
> @marctc this example is missing the Dockerfiles and the django app? A github gist or repo I can clone and just run `docker compose up` in would be easiest.
there you go https://github.com/marctc/django-otel-demo
@marctc you are adding two `OTLPSpanExporter`s to the `TracerProvider`, which is causing the duplicates. I added this to your wsgi.py:
```diff
diff --git a/service/faulty/faulty/wsgi.py b/service/faulty/faulty/wsgi.py
index 82eacb9..997d833 100644
--- a/service/faulty/faulty/wsgi.py
+++ b/service/faulty/faulty/wsgi.py
@@ -31,3 +31,7 @@ def init_tracing():
     trace.get_tracer_provider().add_span_processor(BatchSpanProcessor(
         ConsoleSpanExporter()
     ))
+
+    print(
+        [sp.span_exporter for sp in trace.get_tracer_provider()._active_span_processor._span_processors]
+    )
```
and it prints out:

```
[<opentelemetry.exporter.otlp.proto.grpc.trace_exporter.OTLPSpanExporter object at 0x7fba8dc19150>, <opentelemetry.exporter.otlp.proto.grpc.trace_exporter.OTLPSpanExporter object at 0x7fba74263e50>, <opentelemetry.sdk.trace.export.ConsoleSpanExporter object at 0x7fba744733d0>]
```
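That list is enough to explain the duplicates: every span processor registered on a provider receives every finished span, so two OTLP exporters pointing at the same collector deliver each span twice. A minimal sketch of that fan-out (`FakeProvider`/`FakeExporter` are illustrative stand-ins, not the real SDK classes):

```python
class FakeExporter:
    """Records every span it is asked to export."""

    def __init__(self):
        self.exported = []

    def export(self, span):
        self.exported.append(span)


class FakeProvider:
    """Fans every finished span out to all registered processors."""

    def __init__(self):
        self.processors = []

    def add_span_processor(self, processor):
        self.processors.append(processor)

    def on_end(self, span):
        for p in self.processors:
            p.export(span)


provider = FakeProvider()
otlp_from_agent = FakeExporter()  # installed by opentelemetry-instrument
otlp_from_wsgi = FakeExporter()   # added again in wsgi.py
provider.add_span_processor(otlp_from_agent)
provider.add_span_processor(otlp_from_wsgi)

provider.on_end({"span_id": "5a507db467f88147"})
# The same span reached both exporters, so a collector behind both
# ends up receiving the identical span_id twice.
```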
If you look at the Python startup logs, you'll see the warning `Overriding of current TracerProvider is not allowed`. What's happening is that `opentelemetry-instrument` already sets up the global tracer provider for you, with an OTLP exporter (the default). The `trace.set_tracer_provider(TracerProvider())` call in wsgi.py then has no effect (you can't override the global once it is set), so you end up just adding more exporters to the existing TracerProvider.

You should only use one of these two options for setting up your SDK:

- `opentelemetry-instrument` with appropriate envvars or command line flags
- `opentelemetry.trace.set_tracer_provider()` in code
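The "has no effect" behavior comes from the API's set-once global: `opentelemetry.trace.set_tracer_provider` keeps the first provider it is given and only warns on later calls. A minimal sketch of that pattern (a simplified reimplementation for illustration, not the real `opentelemetry-api` code):

```python
import logging

_TRACER_PROVIDER = None  # module-level "set once" global


def set_tracer_provider(provider):
    """Keep the first provider; later calls only log a warning."""
    global _TRACER_PROVIDER
    if _TRACER_PROVIDER is not None:
        logging.warning("Overriding of current TracerProvider is not allowed")
        return
    _TRACER_PROVIDER = provider


def get_tracer_provider():
    return _TRACER_PROVIDER


first_provider = object()
second_provider = object()
set_tracer_provider(first_provider)
set_tracer_provider(second_provider)  # ignored; warning is logged instead
assert get_tracer_provider() is first_provider
```

Under `opentelemetry-instrument` the global is already populated before wsgi.py runs, so the freshly built `TracerProvider()` is discarded, while the subsequent `add_span_processor` calls still land on the provider the agent installed.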
@aabmass But I NEVER use opentelemetry-instrument. And it's difficult to reproduce. But I will watch for the log 'Overriding of current TracerProvider is not allowed'.
The repro you provided is using it: https://github.com/marctc/django-otel-demo/blob/9a3e1d614fda963578ff752db524283562520252/docker-compose.yaml#L7
We are observing the same issue; sometimes it is 20x, 8x, or 2x duplication. It's pretty random and is ruining the traces. Collector version: 0.100.0
Steps to reproduce
When I use opentelemetry in python3, I sometimes get duplicate spans. This code is used to create the tracer:
This code is used to create a span:
This code is used to end a span:
Something that needs to be said: it is NOT 100% reproducible. When I restart my server it may disappear, but when I restart the server again it may reappear. I'm sure the duplication happens at the reporting step, because the console already prints each span twice. In Jaeger it looks like this: