open-telemetry / opentelemetry-python

OpenTelemetry Python API and SDK
https://opentelemetry.io
Apache License 2.0

Python 3.9 segfault in minimal tracing snippet #3801

Closed: NullHypothesis closed this issue 6 months ago

NullHypothesis commented 6 months ago

The following code reproducibly results in a segmentation fault:

#!/usr/bin/env python3.9

# pip install opentelemetry-sdk
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import (
    BatchSpanProcessor,
    ConsoleSpanExporter,
)

fd = open("foo.txt", "w")
tracer_provider = TracerProvider()
processor = BatchSpanProcessor(ConsoleSpanExporter(out=fd))
tracer_provider.add_span_processor(processor)

If you don't have a copy of Python 3.9 handy, you can use this Dockerfile:

FROM python:3.9.18-slim
RUN pip install opentelemetry-sdk
COPY file.py .
CMD ./file.py

Describe your environment: This happens on both macOS (Sonoma 14.3.1) and Linux (Ubuntu 23.10). As far as I can tell, Python <=3.9 is affected but Python >=3.10 is not.

Steps to reproduce: The segfault occurs under the following conditions:

What is the expected behavior? No segfault.

What is the actual behavior? Segfault.

Additional context: I realize that Python 3.9 is close to its end-of-life, but I figured there's merit in reporting this issue regardless.

methane commented 6 months ago

This is a bug in Python. Python tried to emit an "unclosed file" ResourceWarning, but the globals dict was already gone.

I'm not sure this will be fixed, because 3.9 is in security-fix-only mode.

Thread 1 "python" received signal SIGSEGV, Segmentation fault.
_PyDict_GetItemIdWithError (dp=0x0, key=<optimized out>) at Objects/dictobject.c:1491
1491    Objects/dictobject.c: No such file or directory.
(gdb) bt
#0  _PyDict_GetItemIdWithError (dp=0x0, key=<optimized out>) at Objects/dictobject.c:1491
#1  setup_context (stack_level=<optimized out>, filename=<optimized out>, lineno=<optimized out>, module=<optimized out>, registry=<optimized out>)
    at Python/_warnings.c:876
#2  do_warn (message=0x7ffff65378b0, category=0x7ffff7e83010 <_PyExc_ResourceWarning>, stack_level=<optimized out>, source=0x7ffff684dba0)
    at Python/_warnings.c:953
#3  0x00007ffff6bb67da in warn_unicode (category=0x7ffff7e83010 <_PyExc_ResourceWarning>, message=0x7ffff65378b0, stack_level=1,
    source=0x7ffff684dba0) at Python/_warnings.c:1108
#4  _PyErr_WarnFormatV (source=0x7ffff684dba0, category=<optimized out>, stack_level=1, format=<optimized out>, vargs=<optimized out>)
    at Python/_warnings.c:1128
#5  0x00007ffff6bb678b in PyErr_ResourceWarning (source=0x7ffff68a99e0, stack_level=0, format=0x7ffff68a99e0 "\001") at Python/_warnings.c:1179
#6  0x00007ffff6eb1e4e in fileio_dealloc_warn (self=0x7ffff659a940, source=0x7ffff7ece330 <PyId_mode.16579>) at ./Modules/_io/fileio.c:96
#7  0x00007ffff6ca6841 in method_vectorcall_O (func=0x7ffff686b6d0, args=0x7fffffffd9d0, nargsf=<optimized out>, kwnames=<optimized out>)
    at Objects/descrobject.c:464
#8  0x00007ffff6b8b576 in _PyObject_VectorcallTstate (tstate=0x55555555cf60, callable=0x7ffff686b6d0, args=0x7fffffffd9d0, nargsf=2, kwnames=0x0)
    at ./Include/cpython/abstract.h:118
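
For context, the warning that frames #5 and #6 are trying to emit can be observed safely on a healthy interpreter. The following is a minimal sketch, not taken from the thread, that assumes CPython, where dropping the last reference to a file object deallocates it immediately. The helper name `leak_file` and the temporary path are illustrative only.

```python
import gc
import os
import tempfile
import warnings


def leak_file(path):
    """Open a file for writing and return without closing it."""
    f = open(path, "w")
    f.write("x")
    # No f.close(): the file object is dropped when this function returns.


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "foo.txt")
    with warnings.catch_warnings(record=True) as caught:
        # ResourceWarning is ignored by default, so enable it explicitly.
        warnings.simplefilter("always", ResourceWarning)
        leak_file(path)
        gc.collect()  # only matters on implementations without refcounting

# Keep just the ResourceWarning messages that were recorded.
messages = [
    str(w.message) for w in caught if issubclass(w.category, ResourceWarning)
]
print(messages)
```

Here the warning machinery is fully intact, so the warning is simply recorded. In the crash above, the same code path ran after the state it depends on had already been torn down.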

methane commented 6 months ago

I investigated whether this issue was actually fixed in later versions or just hidden for some unrelated reason. I found that this commit already fixed it:

https://github.com/python/cpython/pull/21605

methane commented 6 months ago

FYI, a minimal reproducer without OpenTelemetry:

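import os, time

f = open("foo.txt", "w")

class C:
    def __init__(self):
        self.f = f
        # The registered bound method keeps this instance, and therefore
        # the file, referenced.
        os.register_at_fork(after_in_child=self.atfork)

    def atfork(self):
        print("atfork")

c = C()
# This removes only the module-level names; the file is never closed.
del c, f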
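
Why `del c, f` does not close the file can be checked directly: the callback registered with `os.register_at_fork` is a bound method, which holds a reference to its instance. A minimal sketch, assuming a Unix platform (where `os.register_at_fork` is available); the class name `Holder` and the weak reference `probe` are hypothetical and only serve the demonstration.

```python
import gc
import os
import weakref


class Holder:
    """Hypothetical stand-in for the class C in the reproducer."""

    def __init__(self):
        # A bound method carries a reference to `self`, so registering it
        # keeps this instance reachable from the interpreter.
        os.register_at_fork(after_in_child=self.atfork)

    def atfork(self):
        pass


holder = Holder()
probe = weakref.ref(holder)  # observes the instance without keeping it alive

del holder
gc.collect()

# The instance is still reachable through the registered callback.
still_alive = probe() is not None
print(still_alive)
```

Anything the instance references, such as the open file in the reproducer, stays alive for the same reason.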

NullHypothesis commented 6 months ago

Great work, thanks @methane! I also reported this in Python's issue tracker (https://github.com/python/cpython/issues/117090) but closed the issue because I didn't have a reproducible snippet without third-party code.

As you said, this is unlikely to get fixed by Python and cannot be fixed by OpenTelemetry, so we might as well close this issue.

methane commented 6 months ago

> As you said, this is unlikely to get fixed by Python and cannot be fixed by OpenTelemetry, so we might as well close this issue.

The root cause is the unclosed file. You can fix it by subclassing ConsoleSpanExporter and implementing shutdown:

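from opentelemetry.sdk.trace.export import ConsoleSpanExporter

class FileSpanExporter(ConsoleSpanExporter):
    def shutdown(self):
        # Close the output file explicitly so it is not left for deallocation.
        self.out.close()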
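
The effect of closing the stream in `shutdown` can be illustrated without installing the SDK. The sketch below uses a hypothetical stand-in, `StreamExporter`, that only mimics the `out` attribute used above; it is not the OpenTelemetry class, and its no-op `shutdown` is an assumption based on what this thread implies about the base exporter. The helper `run` and the file names are likewise illustrative.

```python
import gc
import os
import sys
import tempfile
import warnings


class StreamExporter:
    """Hypothetical stand-in for ConsoleSpanExporter; keeps its stream in `out`."""

    def __init__(self, out=sys.stdout):
        self.out = out

    def export(self, text):
        self.out.write(text + "\n")

    def shutdown(self):
        # Assumption: the base exporter leaves the stream open.
        pass


class FileStreamExporter(StreamExporter):
    def shutdown(self):
        # Same idea as FileSpanExporter above: close the stream explicitly.
        self.out.close()


def run(exporter_cls, path):
    """Export one line, shut down, drop the exporter, and return the
    ResourceWarning messages recorded along the way."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always", ResourceWarning)
        exporter = exporter_cls(out=open(path, "w"))
        exporter.export("span")
        exporter.shutdown()
        del exporter
        gc.collect()
    return [
        str(w.message) for w in caught if issubclass(w.category, ResourceWarning)
    ]


with tempfile.TemporaryDirectory() as tmp:
    leaked = run(StreamExporter, os.path.join(tmp, "a.txt"))
    closed = run(FileStreamExporter, os.path.join(tmp, "b.txt"))

print(len(leaked), len(closed))
```

With the stream closed in `shutdown`, no "unclosed file" warning is attempted at deallocation, which is the code path that crashed in the backtrace. In the real SDK this relies on the exporter's `shutdown` actually being invoked, for example through the tracer provider's own shutdown.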

NullHypothesis commented 6 months ago

Thanks again for your careful investigation of this issue, @methane.