open-telemetry / opentelemetry-python

OpenTelemetry Python API and SDK
https://opentelemetry.io
Apache License 2.0

Exception while exporting Span batch. #3808

Open kuza55 opened 8 months ago

kuza55 commented 8 months ago

Describe your environment: Ubuntu 22.04 in Docker, Python 3.11, opentelemetry-api/sdk 1.23.0

Steps to reproduce: I had been running into issues where spans created inside a forked process (started with multiprocessing.Process) were being dropped. Specifically, I was using the wrapt_timeout_decorator library to add timeouts around some synchronous Python code.

After debugging, I found that adding code like the following helped:

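import signal

from opentelemetry import trace
from wrapt_timeout_decorator import timeout

def exit_gracefully(signum=None, frame=None):
    # Push any buffered spans to the exporter before the process goes away.
    trace.get_tracer_provider().force_flush()

...

@timeout(1.8, use_signals=False)
def slow_func(self):
    # Runs in the child process: flush on SIGTERM as well as on normal return.
    signal.signal(signal.SIGTERM, exit_gracefully)
    try:
        do_stuff()
    finally:
        exit_gracefully()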

With this change, the spans inside do_stuff were exported successfully most of the time (though not always).
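The flush-on-SIGTERM-and-on-exit pattern above could also be packaged as a context manager, sketched below. This is only an illustration: flush_on_exit is a hypothetical helper (not part of OpenTelemetry), it takes any zero-argument callable, and like exit_gracefully its signal handler only flushes and does not stop the process.

```python
import signal
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def flush_on_exit(flush: Callable[[], object]) -> Iterator[None]:
    """Call `flush` on SIGTERM and again when the block exits.

    Hypothetical helper for illustration; not part of OpenTelemetry.
    """

    def handler(signum, frame):
        # Mirrors exit_gracefully: flush only, the process keeps running.
        flush()

    # signal.signal must be called from the main thread of the process.
    previous = signal.signal(signal.SIGTERM, handler)
    try:
        yield
    finally:
        try:
            # Flush on normal return and when an exception propagates.
            flush()
        finally:
            # Restore the earlier handler (None means it was not set from Python).
            if previous is not None:
                signal.signal(signal.SIGTERM, previous)


# Possible use inside the child process (assumes the SDK TracerProvider is set):
#   with flush_on_exit(trace.get_tracer_provider().force_flush):
#       do_stuff()
```

This keeps the handler installation and the final flush in one place, so every function wrapped with a timeout does not have to repeat the try/finally.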

What is the expected behavior? No exceptions

What is the actual behavior? However, after making this change I am running into some exceptions like this sporadically:

Traceback (most recent call last):
  File "/app/lib/python3.11/site-packages/urllib3/connectionpool.py", line 467, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/app/lib/python3.11/site-packages/urllib3/connectionpool.py", line 462, in _make_request
    httplib_response = conn.getresponse()
                       ^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/http/client.py", line 1390, in getresponse
    response.begin()
  File "/usr/lib/python3.11/http/client.py", line 325, in begin
    version, status, reason = self._read_status()
                              ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/http/client.py", line 286, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/socket.py", line 706, in readinto
    return self._sock.recv_into(b)
           ^^^^^^^^^^^^^^^^^^^^^^^
TimeoutError: timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/app/lib/python3.11/site-packages/requests/adapters.py", line 486, in send
    resp = conn.urlopen(
           ^^^^^^^^^^^^^
  File "/app/lib/python3.11/site-packages/urllib3/connectionpool.py", line 799, in urlopen
    retries = retries.increment(
              ^^^^^^^^^^^^^^^^^^
  File "/app/lib/python3.11/site-packages/urllib3/util/retry.py", line 550, in increment
    raise six.reraise(type(error), error, _stacktrace)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/lib/python3.11/site-packages/urllib3/packages/six.py", line 770, in reraise
    raise value
  File "/app/lib/python3.11/site-packages/urllib3/connectionpool.py", line 715, in urlopen
    httplib_response = self._make_request(
                       ^^^^^^^^^^^^^^^^^^^
  File "/app/lib/python3.11/site-packages/urllib3/connectionpool.py", line 469, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
  File "/app/lib/python3.11/site-packages/urllib3/connectionpool.py", line 358, in _raise_timeout
    raise ReadTimeoutError(
urllib3.exceptions.ReadTimeoutError: HTTPConnectionPool(host='0.0.0.0', port=4318): Read timed out. (read timeout=10)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/app/lib/python3.11/site-packages/opentelemetry/sdk/trace/export/__init__.py", line 367, in _export_batch
    self.span_exporter.export(self.spans_list[:idx])  # type: ignore
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/lib/python3.11/site-packages/opentelemetry/exporter/otlp/proto/http/trace_exporter/__init__.py", line 145, in export
    resp = self._export(serialized_data)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/lib/python3.11/site-packages/opentelemetry/exporter/otlp/proto/http/trace_exporter/__init__.py", line 114, in _export
    return self._session.post(
           ^^^^^^^^^^^^^^^^^^^
  File "/app/lib/python3.11/site-packages/requests/sessions.py", line 637, in post
    return self.request("POST", url, data=data, json=json, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/lib/python3.11/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/lib/python3.11/site-packages/opentelemetry/instrumentation/requests/__init__.py", line 150, in instrumented_send
    return wrapped_send(self, request, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/lib/python3.11/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/lib/python3.11/site-packages/requests/adapters.py", line 532, in send
    raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPConnectionPool(host='0.0.0.0', port=4318): Read timed out. (read timeout=10)

brad-getpassport commented 6 months ago

+1, we have seen this sporadically in our system, but we are just using auto-injection of traces. Originally we thought it was due to an SDK/distro mismatch between our code and our system, but it seems to be happening again.

Any insight into how to debug this would be helpful.

LQss11 commented 5 months ago

@brad-getpassport I ran into a similar issue; maybe this helps.

# From this
otlp_exporter = OTLPSpanExporter(endpoint="http://jaeger:4317")
# To this
otlp_exporter = OTLPSpanExporter(endpoint="http://jaeger:4318/v1/traces")

I fixed it by changing the endpoint from http://jaeger:4317 to http://jaeger:4318/v1/traces.
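One possible reading of this fix, assuming the OTLP/HTTP exporter is the one in use: by convention 4317 is the default OTLP/gRPC port, 4318 is the default OTLP/HTTP port, and the HTTP traces path is /v1/traces. The sketch below is a quick sanity check on an endpoint string; check_otlp_http_endpoint is a hypothetical helper, not part of any OpenTelemetry package.

```python
from urllib.parse import urlsplit

# Conventional OTLP defaults: gRPC listens on 4317, HTTP on 4318.
OTLP_GRPC_PORT = 4317
OTLP_HTTP_TRACES_PATH = "/v1/traces"


def check_otlp_http_endpoint(endpoint: str) -> list[str]:
    """Return warnings for an endpoint meant for an OTLP/HTTP span exporter.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(endpoint)
    warnings = []
    if parts.port == OTLP_GRPC_PORT:
        warnings.append("port 4317 is the default OTLP/gRPC port, not OTLP/HTTP")
    if not parts.path.endswith(OTLP_HTTP_TRACES_PATH):
        warnings.append("path does not end with /v1/traces")
    return warnings


# The two endpoints from the snippet above:
old_warnings = check_otlp_http_endpoint("http://jaeger:4317")
new_warnings = check_otlp_http_endpoint("http://jaeger:4318/v1/traces")
```

Here old_warnings contains both messages and new_warnings is empty. Whether a mismatched endpoint explains the read timeouts in the original report is not established in this thread.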

You can check this which helped me