Open furkanonder opened 1 year ago
CC: @markshannon @gaogaotiantian
I think DTrace was intentionally removed because it was not being used. I forgot where I read that, but Mark will have a better idea of it.
I think a bigger issue is that I'm pretty sure it isn't tested on any CI jobs or buildbots, so stuff can break (like #98894) without us noticing.
If it's not tested, it effectively isn't supported. Not sure whether the right path forward is for some motivated person to fix and maintain it, for us to remove the broken stuff and keep the working stuff (with a big disclaimer in the docs), or to just rip it out entirely.
Anecdotally, it's always been a bit uncomfortable to touch this stuff when modifying the interpreter, since I'm not very sure how it works or how to test it.
@ambv might have an idea of what the right path forward is here.
I think DTrace was intentionally removed because it was not being used. I forgot where I read that, but Mark will have a better idea of it.
Dtrace support was actually being used. DTrace markers, also known as "probes", are not only used by DTtrace. For example, these probes can be used with bpftrace.
There are tools written using these probes. I will even give a talk about these probes at EuroPython 2023 :)
Just chiming in here: we use these probes in anger in production, and their absence is a blocker preventing us from upgrading to 3.11.
If tests are needed to make sure dtrace support is accounted for in the release process, I am happy to contribute time to write tests and set them up.
@ambv @brandtbucher I wrote a test case to check for the presence of probes. Could you review the PR?
The newly added tests fail in Fedora:
0:07:40 load avg: 5.34 Re-running test_dtrace in verbose mode (matching: test_check_probes, test_check_probes, test_check_probes, test_check_probes, test_check_probes)
readelf version: 2.40
test_check_probes (test.test_dtrace.CheckDtraceProbes.test_check_probes) ...
test_check_probes (test.test_dtrace.CheckDtraceProbes.test_check_probes) (probe_name='Name: import__find__load__done') ... FAIL
test_check_probes (test.test_dtrace.CheckDtraceProbes.test_check_probes) (probe_name='Name: import__find__load__start') ... FAIL
test_check_probes (test.test_dtrace.CheckDtraceProbes.test_check_probes) (probe_name='Name: audit') ... FAIL
test_check_probes (test.test_dtrace.CheckDtraceProbes.test_check_probes) (probe_name='Name: gc__start') ... FAIL
test_check_probes (test.test_dtrace.CheckDtraceProbes.test_check_probes) (probe_name='Name: gc__done') ... FAIL
======================================================================
FAIL: test_check_probes (test.test_dtrace.CheckDtraceProbes.test_check_probes) (probe_name='Name: import__find__load__done')
----------------------------------------------------------------------
Traceback (most recent call last):
File "/builddir/build/BUILD/Python-3.12.0rc1/Lib/test/test_dtrace.py", line 241, in test_check_probes
self.assertIn(probe_name, readelf_output)
AssertionError: 'Name: import__find__load__done' not found in '\nDisplaying notes found in: .note.gnu.property\n Owner Data size \tDescription\n GNU 0x00000010\tNT_GNU_PROPERTY_TYPE_0\n Properties: AArch64 feature: BTI, PAC\n\nDisplaying notes found in: .note.gnu.build-id\n Owner Data size \tDescription\n GNU 0x00000014\tNT_GNU_BUILD_ID (unique build ID bitstring)\n Build ID: 3b49ac28c70a95104b93815144881aadefb7bad1\n\nDisplaying notes found in: .note.ABI-tag\n Owner Data size \tDescription\n GNU 0x00000010\tNT_GNU_ABI_TAG (ABI version tag)\n OS: Linux, ABI: 3.7.0\n\nDisplaying notes found in: .note.package\n Owner Data size \tDescription\n FDO 0x00000084\tFDO_PACKAGING_METADATA\n Packaging Metadata: {"type":"rpm","name":"python3.12","version":"3.12.0~rc1-1.fc39","architecture":"aarch64","osCpe":"cpe:/o:fedoraproject:fedora:39"}\n\nDisplaying notes found in: .gnu.build.attributes\n Owner Data size \tDescription\n GA$<version>3a1 0x00000010\tOPEN\n Applies to region from 0x100c0 to 0x100f4\n GA$<version>3a1 0x00000010\tOPEN\n Applies to region from 0x100f4 to 0x10108\n GA$<version>3a1 0x00000010\tOPEN\n Applies to region from 0x10000 to 0x10010\n GA$<version>3a1 0x00000010\tOPEN\n Applies to region from 0x10204 to 0x10210\n GA$<version>3a1 0x00000010\tOPEN\n Applies to region from 0x10110 to 0x101d8\n GA$<version>3a1 0x00000010\tOPEN\n Applies to region from 0x10204 to 0x10204\n GA$<version>3a1 0x00000010\tOPEN\n Applies to region from 0x10204 to 0x10204\n GA$<version>3a1 0x00000010\tOPEN\n Applies to region from 0x10010 to 0x1001c\n GA$<version>3a1 0x00000010\tOPEN\n Applies to region from 0x10210 to 0x1021c\n'
Full log: https://kojipkgs.fedoraproject.org//work/tasks/6927/104486927/build.log
Here's a readelf output printed out:
Displaying notes found in: .note.gnu.property
Owner Data size Description
GNU 0x00000010 NT_GNU_PROPERTY_TYPE_0
Properties: AArch64 feature: BTI, PAC
Displaying notes found in: .note.gnu.build-id
Owner Data size Description
GNU 0x00000014 NT_GNU_BUILD_ID (unique build ID bitstring)
Build ID: 3b49ac28c70a95104b93815144881aadefb7bad1
Displaying notes found in: .note.ABI-tag
Owner Data size Description
GNU 0x00000010 NT_GNU_ABI_TAG (ABI version tag)
OS: Linux, ABI: 3.7.0
Displaying notes found in: .note.package
Owner Data size Description
FDO 0x00000084 FDO_PACKAGING_METADATA
Packaging Metadata: {"type":"rpm","name":"python3.12","version":"3.12.0~rc1-1.fc39","architecture":"aarch64","osCpe":"cpe:/o:fedoraproject:fedora:39"}
Displaying notes found in: .gnu.build.attributes
Owner Data size Description
GA$<version>3a1 0x00000010 OPEN
Applies to region from 0x100c0 to 0x100f4
GA$<version>3a1 0x00000010 OPEN
Applies to region from 0x100f4 to 0x10108
GA$<version>3a1 0x00000010 OPEN
Applies to region from 0x10000 to 0x10010
GA$<version>3a1 0x00000010 OPEN
Applies to region from 0x10204 to 0x10210
GA$<version>3a1 0x00000010 OPEN
Applies to region from 0x10110 to 0x101d8
GA$<version>3a1 0x00000010 OPEN
Applies to region from 0x10204 to 0x10204
GA$<version>3a1 0x00000010 OPEN
Applies to region from 0x10204 to 0x10204
GA$<version>3a1 0x00000010 OPEN
Applies to region from 0x10010 to 0x1001c
GA$<version>3a1 0x00000010 OPEN
Applies to region from 0x10210 to 0x1021c
I'm unfortunately unfamiliar with DTrace. Is this expected? How can I debug this?
Are you building your Python --with-dtrace
? If not, it's a surprise that sysconfig.get_config_var('WITH_DTRACE')
returns 1 in your case...
When I build --with-dtrace
locally, I get 1
but when I don't, that config var returns 0
. In this case the test is just skipped:
setUpClass (test.test_dtrace.CheckDtraceProbes) ... skipped 'CPython must be configured with the --with-dtrace option.'
BTW, this says "dtrace" but those probes are also SystemTap / eBPF.
Yes, --with-dtrace
.
I get the same failure in Codespaces.
OK, sorry for the trouble. I will look into this today.
No trouble at all! Thanks for looking in to it.
The newly added tests fail in Fedora:
0:07:40 load avg: 5.34 Re-running test_dtrace in verbose mode (matching: test_check_probes, test_check_probes, test_check_probes, test_check_probes, test_check_probes) readelf version: 2.40 test_check_probes (test.test_dtrace.CheckDtraceProbes.test_check_probes) ... test_check_probes (test.test_dtrace.CheckDtraceProbes.test_check_probes) (probe_name='Name: import__find__load__done') ... FAIL test_check_probes (test.test_dtrace.CheckDtraceProbes.test_check_probes) (probe_name='Name: import__find__load__start') ... FAIL test_check_probes (test.test_dtrace.CheckDtraceProbes.test_check_probes) (probe_name='Name: audit') ... FAIL test_check_probes (test.test_dtrace.CheckDtraceProbes.test_check_probes) (probe_name='Name: gc__start') ... FAIL test_check_probes (test.test_dtrace.CheckDtraceProbes.test_check_probes) (probe_name='Name: gc__done') ... FAIL ====================================================================== FAIL: test_check_probes (test.test_dtrace.CheckDtraceProbes.test_check_probes) (probe_name='Name: import__find__load__done') ---------------------------------------------------------------------- Traceback (most recent call last): File "/builddir/build/BUILD/Python-3.12.0rc1/Lib/test/test_dtrace.py", line 241, in test_check_probes self.assertIn(probe_name, readelf_output) AssertionError: 'Name: import__find__load__done' not found in '\nDisplaying notes found in: .note.gnu.property\n Owner Data size \tDescription\n GNU 0x00000010\tNT_GNU_PROPERTY_TYPE_0\n Properties: AArch64 feature: BTI, PAC\n\nDisplaying notes found in: .note.gnu.build-id\n Owner Data size \tDescription\n GNU 0x00000014\tNT_GNU_BUILD_ID (unique build ID bitstring)\n Build ID: 3b49ac28c70a95104b93815144881aadefb7bad1\n\nDisplaying notes found in: .note.ABI-tag\n Owner Data size \tDescription\n GNU 0x00000010\tNT_GNU_ABI_TAG (ABI version tag)\n OS: Linux, ABI: 3.7.0\n\nDisplaying notes found in: .note.package\n Owner Data size \tDescription\n FDO 0x00000084\tFDO_PACKAGING_METADATA\n Packaging Metadata: {"type":"rpm","name":"python3.12","version":"3.12.0~rc1-1.fc39","architecture":"aarch64","osCpe":"cpe:/o:fedoraproject:fedora:39"}\n\nDisplaying notes found in: .gnu.build.attributes\n Owner Data size \tDescription\n GA$<version>3a1 0x00000010\tOPEN\n Applies to region from 0x100c0 to 0x100f4\n GA$<version>3a1 0x00000010\tOPEN\n Applies to region from 0x100f4 to 0x10108\n GA$<version>3a1 0x00000010\tOPEN\n Applies to region from 0x10000 to 0x10010\n GA$<version>3a1 0x00000010\tOPEN\n Applies to region from 0x10204 to 0x10210\n GA$<version>3a1 0x00000010\tOPEN\n Applies to region from 0x10110 to 0x101d8\n GA$<version>3a1 0x00000010\tOPEN\n Applies to region from 0x10204 to 0x10204\n GA$<version>3a1 0x00000010\tOPEN\n Applies to region from 0x10204 to 0x10204\n GA$<version>3a1 0x00000010\tOPEN\n Applies to region from 0x10010 to 0x1001c\n GA$<version>3a1 0x00000010\tOPEN\n Applies to region from 0x10210 to 0x1021c\n'
Full log: https://kojipkgs.fedoraproject.org//work/tasks/6927/104486927/build.log
Here's a readelf output printed out:
Displaying notes found in: .note.gnu.property Owner Data size Description GNU 0x00000010 NT_GNU_PROPERTY_TYPE_0 Properties: AArch64 feature: BTI, PAC Displaying notes found in: .note.gnu.build-id Owner Data size Description GNU 0x00000014 NT_GNU_BUILD_ID (unique build ID bitstring) Build ID: 3b49ac28c70a95104b93815144881aadefb7bad1 Displaying notes found in: .note.ABI-tag Owner Data size Description GNU 0x00000010 NT_GNU_ABI_TAG (ABI version tag) OS: Linux, ABI: 3.7.0 Displaying notes found in: .note.package Owner Data size Description FDO 0x00000084 FDO_PACKAGING_METADATA Packaging Metadata: {"type":"rpm","name":"python3.12","version":"3.12.0~rc1-1.fc39","architecture":"aarch64","osCpe":"cpe:/o:fedoraproject:fedora:39"} Displaying notes found in: .gnu.build.attributes Owner Data size Description GA$<version>3a1 0x00000010 OPEN Applies to region from 0x100c0 to 0x100f4 GA$<version>3a1 0x00000010 OPEN Applies to region from 0x100f4 to 0x10108 GA$<version>3a1 0x00000010 OPEN Applies to region from 0x10000 to 0x10010 GA$<version>3a1 0x00000010 OPEN Applies to region from 0x10204 to 0x10210 GA$<version>3a1 0x00000010 OPEN Applies to region from 0x10110 to 0x101d8 GA$<version>3a1 0x00000010 OPEN Applies to region from 0x10204 to 0x10204 GA$<version>3a1 0x00000010 OPEN Applies to region from 0x10204 to 0x10204 GA$<version>3a1 0x00000010 OPEN Applies to region from 0x10010 to 0x1001c GA$<version>3a1 0x00000010 OPEN Applies to region from 0x10210 to 0x1021c
I'm unfortunately unfamiliar with DTrace. Is this expected? How can I debug this?
Did you run the readelf tool with the -n parameter? I don't see the probes in your output.
Now I have build cpython and I can see the probes.
[ted@rasp4 cpython]$ readelf -n ./python
Displaying notes found in: .note.gnu.build-id
Owner Data size Description
GNU 0x00000014 NT_GNU_BUILD_ID (unique build ID bitstring)
Build ID: ac4de5af42642e3a69705be808d9c8a73aeee03f
Displaying notes found in: .note.ABI-tag
Owner Data size Description
GNU 0x00000010 NT_GNU_ABI_TAG (ABI version tag)
OS: Linux, ABI: 3.7.0
Displaying notes found in: .note.stapsdt
Owner Data size Description
stapsdt 0x00000043 NT_STAPSDT (SystemTap probe descriptors)
Provider: python
Name: import__find__load__done
Location: 0x00000000002bad4c, Base: 0x0000000000478105, Semaphore: 0x00000000006c3c7c
Arguments: 8@x0 -4@x1
stapsdt 0x0000003e NT_STAPSDT (SystemTap probe descriptors)
Provider: python
Name: import__find__load__start
Location: 0x00000000002bb084, Base: 0x0000000000478105, Semaphore: 0x00000000006c3c7a
Arguments: 8@x0
stapsdt 0x00000031 NT_STAPSDT (SystemTap probe descriptors)
Provider: python
Name: audit
Location: 0x00000000002ef1e0, Base: 0x0000000000478105, Semaphore: 0x00000000006c3c7e
Arguments: 8@x21 8@x20
stapsdt 0x00000036 NT_STAPSDT (SystemTap probe descriptors)
Provider: python
Name: gc__start
Location: 0x000000000030a3c8, Base: 0x0000000000478105, Semaphore: 0x00000000006c3c76
Arguments: -4@[sp, 132]
stapsdt 0x0000002f NT_STAPSDT (SystemTap probe descriptors)
Provider: python
Name: gc__done
Location: 0x000000000030a8e4, Base: 0x0000000000478105, Semaphore: 0x00000000006c3c78
Arguments: -8@x19
[ted@rasp4 cpython]$
I didn't see a problem in the tests.
[ted@rasp4 cpython]$ sudo ./python -m test test_dtrace
0:00:00 load avg: 0.46 Run tests sequentially
0:00:00 load avg: 0.46 [1/1] test_dtrace
== Tests result: SUCCESS ==
1 test OK.
Total duration: 4.1 sec
Tests result: SUCCESS
[ted@rasp4 cpython]$
[ted@rasp4 cpython]$ ./python -m test test_dtrace -v
== CPython 3.13.0a0 (heads/main:0e6e32fb84, Aug 8 2023, 19:47:11) [GCC 12.1.0]
== Linux-5.15.89-1-MANJARO-ARM-RPI-aarch64-with-glibc2.35 little-endian
== Python build: release dtrace
== cwd: /home/ted/cpython/build/test_python_2604899æ
== CPU count: 4
== encodings: locale=UTF-8, FS=utf-8
0:00:00 load avg: 0.52 Run tests sequentially
0:00:00 load avg: 0.52 [1/1] test_dtrace
readelf version: 2.38
test_check_probes (test.test_dtrace.CheckDtraceProbes.test_check_probes) ... ok
test_missing_probes (test.test_dtrace.CheckDtraceProbes.test_missing_probes) ... expected failure
skipped 'dtrace(1) failed: /usr/bin/dtrace invalid option -q\nUsage /usr/bin/dtrace [--help] [-h | -G] [-C [-I<Path>]] -s File.d [-o <File>]'
setUpClass (test.test_dtrace.DTraceOptimizedTests) ... skipped 'dtrace(1) failed: /usr/bin/dtrace invalid option -q\nUsage /usr/bin/dtrace [--help] [-h | -G] [-C [-I<Path>]] -s File.d [-o <File>]'
setUpClass (test.test_dtrace.SystemTapNormalTests) ... skipped 'stap(1) failed: You are trying to run systemtap as a normal user.\nYou should either be root, or be part of the group "stapusr" and possibly one of the groups "stapsys" or "stapdev". [man stap]\nTry \'--help\' for more information. [man stap]\nA message like [man foo] means for more info, run % man foo'
setUpClass (test.test_dtrace.SystemTapOptimizedTests) ... skipped 'stap(1) failed: You are trying to run systemtap as a normal user.\nYou should either be root, or be part of the group "stapusr" and possibly one of the groups "stapsys" or "stapdev". [man stap]\nTry \'--help\' for more information. [man stap]\nA message like [man foo] means for more info, run % man foo'
----------------------------------------------------------------------
Ran 2 tests in 1.927s
OK (skipped=4, expected failures=1)
== Tests result: SUCCESS ==
1 test OK.
Total duration: 2.1 sec
Tests result: SUCCESS
[ted@rasp4 cpython]$
Your readelf versions differ. Maybe there's a difference in output between 2.38 and 2.40?
Oh, got it! We build with --enable-shared
, so thetraces and notes are in the shared library. (And when I went to reproduce in Codespaces, I copied my favourite ./config
invocation without thinking.)
$ readelf -n libpython3.13d.so
Displaying notes found in: .note.gnu.build-id
Owner Data size Description
GNU 0x00000014 NT_GNU_BUILD_ID (unique build ID bitstring)
Build ID: 4765c16f47a5e0e0a0fadcd18056ec129b71a557
Displaying notes found in: .note.stapsdt
Owner Data size Description
stapsdt 0x00000046 NT_STAPSDT (SystemTap probe descriptors)
Provider: python
Name: import__find__load__start
Location: 0x00000000003346c4, Base: 0x00000000005611f0, Semaphore: 0x000000000078d0ba
Arguments: 8@-144(%rbp)
[...]
Getting to that library might be complicated -- I can't think of something easier than ldd
on sys.executable
and extracting the path corresponding to sysconfig.get_config_var('LDLIBRARY')
.
Note: This affects Fedora also on 3.11.5. We will skip the test_check_probes
in the meantime.
dtrace is supposed to have zero (or very close to it) overhead when not in use.
The CPython implementation failed in this. If it had, perhaps PEP 669 would not have been needed.
I think it is best to push users towards using sys.monitoring
now that 3.12 has been released.
I realize that's not the ideal solution for existing users of dtrace
, but sys.monitoring
doesn't need a special build and is tested on CI.
It also looks like dtrace function entry robes are broken in 3.11, as they are only invoked if sys.settrace
tracing is also turned on.
The probe point is present, but the normal code path skips it.
With dtrace support effectively removed in 3.12, a few thoughts.
My initial read of sys.monitoring
is that while it is great for bespoke, python-specific monitoring and tooling, it doesn't quite work in a world that is increasingly moving towards one-size fits all distributed tracing via ebpf (i.e. dtrace).
Is there room in Python for both visions of tracing? If not, why? If yes, what kind of resourcing would be needed to make sure that dtrace isn't removed again from the build?
dtrace is supposed to have zero (or very close to it) overhead when not in use. The CPython implementation failed in this. If it had, perhaps PEP 669 would not have been needed. I think it is best to push users towards using
sys.monitoring
now that 3.12 has been released.I realize that's not the ideal solution for existing users of
dtrace
, butsys.monitoring
doesn't need a special build and is tested on CI.
This is a little bit weird, the DTrace and the PEP 669 are different features.
the DTrace API is for people who want to trace the Python process without restarting the process and injecting some code for the PEP 669
We can solve the process tracing problem by using #96123 in some circumstances. But there are still some issues.
If we decide to remove DTrace Support officially, we may need to update the documentation https://docs.python.org/3/howto/instrumentation.html
When building python 3.11.7 from source, I ran into this error:
python configure: error: dtrace command not found on $PATH
Fixed using:
sudo apt-get install systemtap-sdt-dev
By installing dtrace (well almost) toolkit - the binary is not really used for tracing but only to list probes if I am correct.
I took a look at this yesterday in 3.12 as part of my current effort to package 3.12 for OpenIndiana (illumos) and I got the function entry/exit probes restored and working well enough to pass the test_dtrace.py test.
I'll share when I get a chance - I'm doing this with original dtrace on illumos (the open-source continuation of opensolaris) but it would likely also work for those who want to use these probes via systemtap.
I'll also look at the line probes when I get a chance.
https://github.com/Bill-Sommerfeld/cpython/tree/dtrace-3.12 has patches that restore the function-entry
and function-return
probes for me on illumos. The tests now pass for me, except for test.test_dtrace.DTraceNormalTests.test_line
and
test.test_dtrace.DTraceOptimizedTests.test_line
which depend on the line
probe which I haven't restored yet.
I'd appreciate it if others could give these a whirl on other platforms.
Bug report
On Linux, you can list available DTrace probes by running /bcc/tools/tplist.py tool;
The
function__entry
,function__return
andline
probes appear to be missing after the GH-103082: Implementation of PEP 669: Low Impact Monitoring for CPython update.Your environment
Linked PRs