open-telemetry / opentelemetry-python

OpenTelemetry Python API and SDK
https://opentelemetry.io
Apache License 2.0
1.72k stars 607 forks source link

AssertionError in OtelBatchSpanProcessor #4025

Open florianschieder opened 2 months ago

florianschieder commented 2 months ago

Describe your environment

OS: debian bookworm Python version: Python 3.12.4 SDK version: 1.25.0 API version: 1.25.0 Gevent version: 24.2.1 Gunicorn version: 22.0.0

What happened?

We are using OpenTelemetry in a Python based webservice, using gevent and gunicorn.

The OtelBatchSpanProcessor background thread failed with the following stack trace:

Traceback (most recent call last): File "/usr/local/lib/python3.12/threading.py", line 1073, in _bootstrap_inner self.run() File "/usr/local/lib/python3.12/threading.py", line 1010, in run self._target(*self._args, **self._kwargs) File "/usr/local/lib/python3.12/site-packages/opentelemetry/sdk/trace/export/__init__.py", line 263, in worker self.condition.wait(timeout) File "/usr/local/lib/python3.12/threading.py", line 364, in wait self._acquire_restore(saved_state) File "/usr/local/lib/python3.12/threading.py", line 312, in _acquire_restore self._lock.acquire() # Ignore saved state File "/usr/local/lib/python3.12/site-packages/gevent/thread.py", line 112, in acquire acquired = BoundedSemaphore.acquire(self, blocking, timeout) File "src/gevent/_semaphore.py", line 184, in gevent._gevent_c_semaphore.Semaphore.acquire File "src/gevent/_semaphore.py", line 276, in gevent._gevent_c_semaphore.Semaphore.acquire AssertionError: (0, True, None, True)

Meanwhile the following stack trace (of the HTTP request) was logged: File (part of our source code), line 25, in instrumented_func with self._tracer.start_as_current_span( File "/usr/local/lib/python3.12/contextlib.py", line 144, in __exit__ next(self.gen) File "/usr/local/lib/python3.12/site-packages/opentelemetry/sdk/trace/__init__.py", line 1085, in start_as_current_span with trace_api.use_span( File "/usr/local/lib/python3.12/contextlib.py", line 144, in __exit__ next(self.gen) File "/usr/local/lib/python3.12/contextlib.py", line 144, in __exit__ next(self.gen) File "/usr/local/lib/python3.12/site-packages/opentelemetry/trace/__init__.py", line 597, in use_span span.end() File "/usr/local/lib/python3.12/site-packages/opentelemetry/sdk/trace/__init__.py", line 934, in end self._span_processor.on_end(self._readable_span()) File "/usr/local/lib/python3.12/site-packages/opentelemetry/sdk/trace/__init__.py", line 168, in on_end sp.on_end(span) File "/usr/local/lib/python3.12/site-packages/opentelemetry/sdk/trace/export/__init__.py", line 234, in on_end with self.condition: File "/usr/local/lib/python3.12/threading.py", line 303, in __exit__ return self._lock.__exit__(*args) File "src/gevent/_semaphore.py", line 285, in gevent._gevent_c_semaphore.Semaphore.__exit__ File "src/gevent/_semaphore.py", line 286, in gevent._gevent_c_semaphore.Semaphore.__exit__ File "src/gevent/_semaphore.py", line 494, in gevent._gevent_c_semaphore.BoundedSemaphore.release File "src/gevent/_semaphore.py", line 494, in gevent._gevent_c_semaphore.BoundedSemaphore.release File "src/gevent/_semaphore.py", line 500, in gevent._gevent_c_semaphore.BoundedSemaphore.release RuntimeError: Semaphore released too many times

Note: I'm not certain if this is some issue which should be filed against opentelemetry-python or gevent. I first wanted to address it here, since it seems like there is a conflict with the background thread.

Please let me know if there is anything further you need to know.

Steps to Reproduce

I cannot deliver any steps to reproduce as I think this is some sort of race condition.

Expected Result

invoking Tracer.start_as_current_span() does not fail

Actual Result

invoking Tracer.start_as_current_span() did sporadically fail (RuntimeError: Semaphore released too many times)

Additional context

No response

Would you like to implement a fix?

None

pmcollins commented 2 months ago

Hi -- you may be running in to a known issue with running the batch span processor on gunicorn, for which there's this workaround: https://opentelemetry-python.readthedocs.io/en/latest/examples/fork-process-model/README.html