Open Gandem opened 2 days ago
The disassembly stuff was created to avoid hard coding the value. Though, with Python we do have few other hard coded values also. I don't immediately remember if this offset depends on build time configuration knobs, or if its fixed based on version? If it can change, I would avoid fixing.
We do have also the #191. I think it would reduce development cost if we can go more to the direction that non-configuration dependent introspection data could be just extracted from a debug build's debug data.
What happened?
Under certain conditions (two examples detailed below), the python loader fails to resolve the autoTLSKey, which will cause python frames to be missing from the flame graph. Both examples are on amd64 ~and it's unclear whether this also potentially affects arm64 (TBD)~. Update: Only amd64 is impacted.
The two examples are detailed below:
1. Python 3.12+ compiled from source without LTO
Starting with Python 3.12 (commit),
PyGILState_GetThisThreadState
callsPyThread_tss_is_created
first before callingPyThread_tss_get
, see: https://github.com/python/cpython/blob/3.12/Python/pystate.c#L2171-L2174On amd64 default builds of python (without
--enable-optimizations
,--with-lto
, which are not enabled by default https://docs.python.org/3/using/configure.html#performance-options), the calls toPyThread_tss_is_created
andPyThread_tss_get
are not inlined, so the value of autoTLSKey is stored in a register (%rbp
) before being passed to both function calls:This causes the decode disassembler to not find the value in the call instruction since this is not supported in https://github.com/open-telemetry/opentelemetry-ebpf-profiler/blob/bd37e9c/interpreter/python/decode_amd64.c#L11-L20
This is reproducible by downloading and building python from source, and running the profiler next to a python process.
2. Python bundled with
google-cloud-sdk
With
Google Cloud SDK 502.0.0
, onamd64
, which bundles python 3.11,PyGILState_GetThisThreadState
disassembles as follows (gdb -q /lib/google-cloud-sdk/platform/bundledpythonunix/lib/libpython3.11.so
):The decode disassemble will find the
mov $0x250,%edi
instruction and assume that0x250
is the address at whichautoTSSKey
will be found. However, this is modified later on by an ulterioradd
on%rdi
.Eventually,
0x250
will fail the distance check in: https://github.com/open-telemetry/opentelemetry-ebpf-profiler/blob/bd37e9ce23f9904ea7b11389daf50a115ac2389e/interpreter/python/python.go#L701-L704Since it
0x250
is an offset toPyRuntime
rather than the absolute value ofautoTSSKey
.Potential fixes
A couple potential fixes we though about:
a. Disassemble a different function: It's possible to fix (1.) (but not 2.) by alternatively disassembling
PyGILState_Release
, either as a fallback (see patch in https://github.com/DataDog/opentelemetry-ebpf-profiler/commit/630fd97803b41dca87e5a0f6e1e45c31c7e43b90) or directly. Since python 3.0, and still as of python 3.13, this method directly callsPyThread_tss_get
.b. Add more logic in disassembly code: Add logic in
decode_amd64.c
to take into account variables in temporary registers, and the case where%rdi
is populated viamov
thenadd
. We could potentially fix both issues at the cost of additionally complexity.c. Use constant header values: Hardcode the offset of
autoTSSKey
fromPyRuntime
, with a different value depending on the python version.Let me know whether there's any preference / different ideas as to how we should fix this 🙇 !