rnav opened this issue 8 years ago
Not sure I understand the full issue or suggestion here. As it is now, we're just using standard gcc logic when compiling.
If you run make VERBOSE=1, can you capture the line where libbcc.so is linked and make some modifications that cause it to work for you? If you could, for instance, identify a problematic gcc line, we could help work that into the build definitions.
Oh, I probably should have explained better. The problem actually shows up at runtime and not while building bcc itself. In this case, it is with test_uprobes.py:
# /root/bcc/build/tests/wrapper.sh "py_uprobes" "sudo" "/root/bcc/tests/python/test_uprobes.py"
Python 2.7.5
.Arena 0:
system bytes = 12648448
in use bytes = 3013088
Total (incl. mmap):
system bytes = 16449536
in use bytes = 6814176
max mmap regions = 12
max mmap bytes = 4456448
F
======================================================================
FAIL: test_simple_library (__main__.TestUprobes)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/root/bcc/tests/python/test_uprobes.py", line 34, in test_simple_library
self.assertEqual(b["stats"][ctypes.c_int(0)].value, 2)
AssertionError: 0L != 2
----------------------------------------------------------------------
Ran 2 tests in 0.510s
FAILED (failures=1)
Failed
In test_uprobes.py, we use the below line to place a probe at malloc_stats() in libc:
b.attach_uprobe(name="c", sym="malloc_stats", fn_name="count")
This triggers a search for libc, which on powerpc ends up picking the wrong library to place the probe (/lib64/power8/libc.so.6 rather than /lib64/libc.so.6). As such, the probe never fires.
We pick the wrong library because we do not consider the hwcap associated with each library.
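For illustration, here is a rough Python sketch of why a plain name lookup goes wrong. It assumes the cache is read as text in the shape ldconfig -p prints (soname, a flags list that may include an hwcap: field, then => path). The helper names and the hwcap value are made up, and this is not bcc's own resolver.

```python
import re
from typing import List, NamedTuple, Optional

# Assumed line shape of `ldconfig -p` output; flag strings vary by architecture.
_LINE = re.compile(r"^\s*(?P<soname>\S+)\s+\((?P<flags>[^)]*)\)\s+=>\s+(?P<path>\S+)\s*$")


class CacheEntry(NamedTuple):
    soname: str
    flags: List[str]      # e.g. ["libc6", "64bit"]
    hwcap: Optional[int]  # None if the entry carries no hwcap field
    path: str


def parse_ldconfig_line(line: str) -> Optional[CacheEntry]:
    m = _LINE.match(line)
    if m is None:
        return None  # header line or unrecognised format
    flags: List[str] = []
    hwcap = None
    for part in m.group("flags").split(","):
        part = part.strip()
        if part.startswith("hwcap:"):
            hwcap = int(part.split(":", 1)[1], 16)
        elif part:
            flags.append(part)
    return CacheEntry(m.group("soname"), flags, hwcap, m.group("path"))


def first_match(lines: List[str], soname: str) -> Optional[CacheEntry]:
    """Naive lookup: return the first entry with this soname, ignoring hwcap."""
    for line in lines:
        entry = parse_ldconfig_line(line)
        if entry is not None and entry.soname == soname:
            return entry
    return None


# Illustrative cache lines (the hwcap value is made up for the example).
sample = [
    "\tlibc.so.6 (libc6,64bit, hwcap: 0x0000000000000020) => /lib64/power8/libc.so.6",
    "\tlibc.so.6 (libc6,64bit) => /lib64/libc.so.6",
]
print(first_match(sample, "libc.so.6").path)  # /lib64/power8/libc.so.6
```

The hwcap is right there in the parsed entry; a first-match lookup simply never looks at it.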
Yes, thanks for explaining, that certainly makes more sense! Actually, the problem seems obvious in retrospect, but it's been a busy morning so far :)
Sure. It looks like @vmg wrote much of this code. @vmg do you have ideas on how best to address this?
Just ran into this issue while building / testing on Debian 8 amd64.
mikep@mv-tricolor:~/bcc/obj-x86_64-linux-gnu$ sudo /usr/bin/ctest --force-new-ctest-process -j1 -V
...
20: Test command: /home/mikep/bcc/obj-x86_64-linux-gnu/tests/wrapper.sh "py_uprobes" "sudo" "/home/mikep/bcc/tests/python/test_uprobes.py"
20: Test timeout computed to be: 9.99988e+06
20: Python 2.7.9
20: .Arena 0:
20: system bytes = 13799424
20: in use bytes = 2969696
20: Total (incl. mmap):
20: system bytes = 14589952
20: in use bytes = 3760224
20: max mmap regions = 4
20: max mmap bytes = 1589248
20: F
20: ======================================================================
20: FAIL: test_simple_library (__main__.TestUprobes)
20: ----------------------------------------------------------------------
20: Traceback (most recent call last):
20: File "/home/mikep/bcc/tests/python/test_uprobes.py", line 34, in test_simple_library
20: self.assertEqual(b["stats"][ctypes.c_int(0)].value, 2)
20: AssertionError: 0L != 2
20:
20: ----------------------------------------------------------------------
20: Ran 2 tests in 0.217s
20:
20: FAILED (failures=1)
20: Failed
20/28 Test #20: py_uprobes .......................***Failed 0.29 sec
Looks like malloc_stats() is missing from this distro and version.
mikep@mv-tricolor:~$ python
Python 2.7.9 (default, Jun 29 2016, 13:08:31)
[GCC 4.9.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from ctypes.util import find_library
>>> find_library('malloc_stats')
>>> find_library('c')
'libc.so.6'
@rnav, would you mind doing the same check with python and find_library() on the host where you discovered this issue?
Ugh. That's not it either...
mikep@mv-tricolor:~$ nm -D --defined-only /lib/x86_64-linux-gnu/libc.so.6
...
000000000007ddb0 W malloc_stats
...
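Note that find_library() takes a library name such as 'c', not a symbol name, so find_library('malloc_stats') returning None above says nothing about the symbol. A rough Python equivalent of the nm check, using ctypes (has_symbol is a hypothetical helper):

```python
import ctypes
from ctypes.util import find_library


def has_symbol(libname: str, symbol: str) -> bool:
    """Return True if the shared library `libname` (e.g. "c") exports `symbol`."""
    soname = find_library(libname)  # resolves a library name, e.g. "c" -> "libc.so.6"
    if soname is None:
        return False
    lib = ctypes.CDLL(soname)
    # Attribute lookup on a CDLL does a dlsym(); a missing symbol raises AttributeError.
    return hasattr(lib, symbol)


print(has_symbol("c", "malloc_stats"))  # expected True on glibc systems such as Debian 8
```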
It's definitely defined, and after reading the man page for malloc_stats(), it's pretty clear that it did run and this was its output:
20: Python 2.7.9
20: .Arena 0:
20: system bytes = 13799424
20: in use bytes = 2969696
20: Total (incl. mmap):
20: system bytes = 14589952
20: in use bytes = 3760224
20: max mmap regions = 4
20: max mmap bytes = 1589248
So something about that bpf probe didn't fire correctly...
@mprzybylski you can put a sleep() in tests/python/test_uprobes.py and check /sys/kernel/debug/tracing/uprobe_events to see which library bcc is picking.
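If reading the file by eye gets tedious, a small sketch like this pulls the probe type, library path and offset out of each uprobe_events line. The helper names are hypothetical, and the sample input is the output posted later in this thread.

```python
from typing import List, NamedTuple


class UprobeEvent(NamedTuple):
    kind: str    # "p" for an entry probe, "r" for a return probe
    name: str    # group/event, e.g. "uprobes/p__libx32_libc_so_6_0x76f30"
    path: str    # file the probe is attached to
    offset: int  # file offset of the probed instruction


def parse_uprobe_events(text: str) -> List[UprobeEvent]:
    events = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        head, rest = line.split(None, 1)
        kind, name = head.split(":", 1)
        location = rest.split()[0]           # ignore any fetch arguments
        path, offset = location.rsplit(":", 1)
        offset = offset.split("(", 1)[0]     # defensive: drop any parenthesised suffix
        events.append(UprobeEvent(kind, name, path, int(offset, 16)))
    return events


def read_uprobe_events(path: str = "/sys/kernel/debug/tracing/uprobe_events") -> List[UprobeEvent]:
    # Needs root and a mounted debugfs; call it while the test is paused in sleep().
    with open(path) as f:
        return parse_uprobe_events(f.read())


sample = (
    "p:uprobes/p__libx32_libc_so_6_0x76f30 /libx32/libc.so.6:0x0000000000076f30\n"
    "r:uprobes/r__libx32_libc_so_6_0x76f30 /libx32/libc.so.6:0x0000000000076f30\n"
)
for ev in parse_uprobe_events(sample):
    print(ev.kind, ev.path, hex(ev.offset))  # e.g. "p /libx32/libc.so.6 0x76f30"
```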
I thought of using the aux vector to figure out the hardware capabilities before picking the right library, but with both 32-bit and 64-bit libraries installed, that may not be enough. Perhaps we should just put a probe on all matching libraries?
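For reference, the aux vector is easy to read from userspace: /proc/<pid>/auxv is a sequence of (type, value) pairs of native unsigned longs ending with AT_NULL, and AT_PLATFORM (15), AT_HWCAP (16) and AT_HWCAP2 (26) are the entries relevant here. This sketch only parses the raw bytes. AT_PLATFORM's value is a pointer into the target process, so it is not decoded, and the parser assumes reader and target share a word size, which is exactly the 32-bit/64-bit caveat above.

```python
import os
import struct
from typing import Dict

AT_NULL = 0
AT_PLATFORM = 15  # value is a pointer to a string such as "power8" inside the process
AT_HWCAP = 16
AT_HWCAP2 = 26

_ENTRY = "@LL"  # (a_type, a_val) as native unsigned longs


def parse_auxv(data: bytes) -> Dict[int, int]:
    """Parse raw /proc/<pid>/auxv contents into {a_type: a_val}.

    Assumes the target process uses the same word size as this one.
    """
    size = struct.calcsize(_ENTRY)
    result: Dict[int, int] = {}
    for off in range(0, len(data) - size + 1, size):
        a_type, a_val = struct.unpack_from(_ENTRY, data, off)
        if a_type == AT_NULL:
            break
        result[a_type] = a_val
    return result


def read_auxv(pid: str = "self") -> Dict[int, int]:
    with open(f"/proc/{pid}/auxv", "rb") as f:
        return parse_auxv(f.read())


if __name__ == "__main__" and os.path.exists("/proc/self/auxv"):
    auxv = read_auxv()
    print("AT_HWCAP  =", hex(auxv.get(AT_HWCAP, 0)))
    print("AT_HWCAP2 =", hex(auxv.get(AT_HWCAP2, 0)))
```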
We discussed the same bug in #853. Sorry I didn't drop a note here before.
I also started working on the second strategy discussed in #853 (using the running architecture of the bcc process to help select the appropriate library), but I don't have much time, so it might take longer. If anyone else wants to take care of that one, please go ahead :smiley:
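A rough sketch of the running-architecture idea, assuming cache entries are already parsed into (soname, flags, path) tuples and that the ABI flag strings below match what ldconfig reports. Those strings are assumptions, and ABI_FLAGS and pick_for_current_abi are hypothetical names. A 32-bit process on x86_64 could be either i386 or x32, so that case is deliberately left unmapped and falls back to the first match.

```python
import platform
import struct
from typing import Iterable, List, Optional, Tuple

# Hypothetical mapping from (machine, pointer bits) to the ABI flag that
# ldconfig is assumed to print for matching libraries.
ABI_FLAGS = {
    ("x86_64", 64): "x86-64",
    ("ppc64", 64): "64bit",
    ("ppc64le", 64): "64bit",
    # ("x86_64", 32) is ambiguous (i386 or x32), so it is left unmapped.
}

Entry = Tuple[str, List[str], str]  # (soname, flags, path)


def expected_abi_flag(machine: Optional[str] = None, pointer_bits: Optional[int] = None) -> Optional[str]:
    machine = machine or platform.machine()
    pointer_bits = pointer_bits or struct.calcsize("P") * 8  # pointer size of this process
    return ABI_FLAGS.get((machine, pointer_bits))


def pick_for_current_abi(entries: Iterable[Entry], soname: str,
                         machine: Optional[str] = None,
                         pointer_bits: Optional[int] = None) -> Optional[str]:
    flag = expected_abi_flag(machine, pointer_bits)
    for name, flags, path in entries:
        # With an unknown ABI, fall back to the old first-match behaviour.
        if name == soname and (flag is None or flag in flags):
            return path
    return None


# Illustrative multiarch cache, in the order that made bcc pick /libx32 above.
cache: List[Entry] = [
    ("libc.so.6", ["libc6", "x32"], "/libx32/libc.so.6"),
    ("libc.so.6", ["libc6", "x86-64"], "/lib/x86_64-linux-gnu/libc.so.6"),
    ("libc.so.6", ["libc6"], "/lib/i386-linux-gnu/libc.so.6"),
]
print(pick_for_current_abi(cache, "libc.so.6", machine="x86_64", pointer_bits=64))
# /lib/x86_64-linux-gnu/libc.so.6
```

On powerpc this alone would not help, since /lib64/libc.so.6 and /lib64/power8/libc.so.6 share the same 64-bit ABI. It addresses the multiarch case rather than the hwcap one.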
@rnav, sure enough, this is a general problem for multiarch platforms. Thanks for pointing me in the right direction.
root@mv-tricolor:/sys/kernel/debug/tracing# cat uprobe_events
p:uprobes/p__libx32_libc_so_6_0x76f30 /libx32/libc.so.6:0x0000000000076f30
r:uprobes/r__libx32_libc_so_6_0x76f30 /libx32/libc.so.6:0x0000000000076f30
@pchaigno, nice work on #875.
I just patched tests/python/test_uprobes.py to take advantage of it and got those tests to pass. I'll submit a pull request with that, and a few other things, soon...
Hi @pchaigno, which patches should be applied to make the test pass? Right now, on Debian Jessie, the build hangs on the py_uprobes test.
@finelli You shouldn't need any patch for the tests to pass. What makes you think it's related to this issue?
bcc does not consider the encoded hwcap when choosing libraries. As a result, uprobes on shared libraries do not work on powerpc, as seen with the uprobes test:
libc libraries in cache:
bcc always picks the first library here, which won't work on non-power8 machines.
We need to either implement stricter checks (look at hwcap and perhaps the platform) or consider probing on all libraries with the same name.
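The two options might look roughly like this in Python. Candidate, all_matches and strict_match are hypothetical. strict_match takes the hwcap check as a caller-supplied predicate because decoding glibc's hwcap/platform encoding is out of scope for a sketch, and the hwcap value in the sample is made up.

```python
from typing import Callable, List, NamedTuple, Optional


class Candidate(NamedTuple):
    soname: str
    path: str
    hwcap: Optional[int]  # None when the cache entry has no hwcap


def all_matches(cache: List[Candidate], soname: str) -> List[str]:
    """Option 2: every path with this soname, so a probe can be placed on each."""
    return [c.path for c in cache if c.soname == soname]


def strict_match(cache: List[Candidate], soname: str,
                 hwcap_supported: Callable[[int], bool]) -> Optional[str]:
    """Option 1: skip entries whose hwcap the running machine does not satisfy."""
    for c in cache:
        if c.soname != soname:
            continue
        if c.hwcap is None or hwcap_supported(c.hwcap):
            return c.path
    return None


cache = [
    Candidate("libc.so.6", "/lib64/power8/libc.so.6", 0x20),
    Candidate("libc.so.6", "/lib64/libc.so.6", None),
]
print(all_matches(cache, "libc.so.6"))
print(strict_match(cache, "libc.so.6", lambda hw: False))  # e.g. a non-power8 machine
```

Probing every match fires regardless of which copy the dynamic loader actually mapped, at the cost of extra probes. The strict variant is only as good as its predicate's agreement with ld.so.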