open-telemetry / opentelemetry-python

OpenTelemetry Python API and SDK
https://opentelemetry.io
Apache License 2.0

OtelBatchSpanProcessor threading KeyError #3884

Open frx08 opened 2 months ago

frx08 commented 2 months ago

Environment:

ubuntu:22.04
python:3.10

libs installed:

...
opentelemetry-api==1.24.0
opentelemetry-sdk==1.24.0
opentelemetry-exporter-gcp-trace==1.6.0
...

app.py

from opentelemetry import trace
from opentelemetry.exporter.cloud_trace import CloudTraceSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

tracer_provider = TracerProvider()
trace.set_tracer_provider(tracer_provider)
gcp_exporter = CloudTraceSpanExporter()
span_processor = BatchSpanProcessor(gcp_exporter)
tracer_provider.add_span_processor(span_processor)

running with: gunicorn --worker-class geventwebsocket.gunicorn.workers.GeventWebSocketWorker -w 1 --threads 20 app:app

I'm running my service on Google App Engine with this configuration, but when I create a new span I often get the following exception and the service is restarted (I'm using supervisord):

Traceback (most recent call last):
  File "src/gevent/greenlet.py", line 908, in gevent._gevent_cgreenlet.Greenlet.run
  File "/usr/lib/python3.10/threading.py", line 973, in _bootstrap
    self._bootstrap_inner()
  File "/usr/lib/python3.10/threading.py", line 1008, in _bootstrap_inner
    del _limbo[self]
KeyError: <Thread(OtelBatchSpanProcessor, stopped daemon 139781863898496)>
2024-04-26T18:20:34Z <Greenlet at 0x7f218057fd80: <bound method Thread._bootstrap of <Thread(OtelBatchSpanProcessor, stopped daemon 139781863898496)>>> failed with KeyError
[2024-04-26 18:20:35 +0000] [8] [ERROR] Worker (pid:12) was sent code 139!
2024-04-26 18:20:35,177 INFO reaped unknown pid 164 (exit status 0)
2024-04-26 18:20:35,179 INFO reaped unknown pid 147 (exit status 0)
2024-04-26 18:20:35,179 INFO reaped unknown pid 173 (exit status 0)

Do you know how I can address this issue? As the traceback shows, I'm using gevent (gevent==23.9.1).

xrmx commented 2 months ago

If you use gevent, you should monkey-patch everything you use, and I don't see that in your code.

frx08 commented 2 months ago

Sorry, I didn't include it. Yes, I'm patching everything at the very start of my script:

from gevent import monkey
monkey.patch_all()
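
One way to check whether the patching actually took effect in the worker process is to ask gevent directly. The following is only a minimal diagnostic sketch, not a fix: `patch_status` and the list of module names are hypothetical choices made for illustration, and the only gevent call it relies on is `gevent.monkey.is_module_patched`. It assumes it is called after `monkey.patch_all()` and in the same process that creates the `BatchSpanProcessor`.

```python
import sys


def patch_status(modules=("threading", "socket", "ssl", "time")):
    """Report which stdlib modules gevent has monkey-patched.

    Returns a dict mapping module name to True/False, or None when
    gevent is not installed. `patch_status` is a hypothetical helper,
    not part of gevent or OpenTelemetry.
    """
    try:
        from gevent import monkey
    except ImportError:
        return None
    # is_module_patched() takes the stdlib module name as a string.
    return {name: monkey.is_module_patched(name) for name in modules}


if __name__ == "__main__":
    status = patch_status()
    if status is None:
        print("gevent is not installed", file=sys.stderr)
    else:
        for name, patched in sorted(status.items()):
            print(f"{name}: {'patched' if patched else 'NOT patched'}")
```

If `threading` is reported as not patched at the point where the span processor is created, the background `OtelBatchSpanProcessor` thread was probably started against the unpatched module, which would be worth ruling out before looking elsewhere.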