b-ripper opened this issue 3 years ago
@davemarchevsky could you help take a look?
Hi @b-ripper, some questions:

- Which app did you trace with trace.py / simpleperf? Was it against proprietary code? If so, is there some repro case you could provide?
- libGLESv2_adreno.so might be an OpenGL-related library? Did the .so come precompiled? Often computationally-intensive libs will be compiled without frame pointers, which breaks the stack walking used by BPF. Could you check if this is the case?

@davemarchevsky Hi, thanks for your response! I traced the app com.ss.android.ugc.aweme using trace.py and simpleperf. libGLESv2_adreno.so is from Qualcomm; it is an OpenGL-related library, prebuilt and not open source.
Here I mean that this output from trace.py:

b'[unknown] [libGLESv2_adreno.so]'
b'[unknown] [libGLESv2_adreno.so]'
b'[unknown] [libGLESv2_adreno.so]'
b'[unknown] [libGLESv2_adreno.so]'
b'[unknown] [libGLESv2_adreno.so]'

is too short. Why are only 5 lines displayed? Could it output more lines?
Another question: could the "unknown" frames be displayed by address, like simpleperf?
> Could the "unknown" displayed by address like simpleperf?
This won't exactly match simpleperf's output, but trace.py's -a flag will print the virtual addr for each frame. A separate option to replace [unknown] directly with the virtual addr could be useful.
Back to the initial question: continuing to operate under the assumption that libGLESv2_adreno.so was compiled without frame pointers, the shorter 'broken' stack is expected. The kernel's stack unwinder relies on the frame pointer to find the next frame, so when that fails it can't find the rest of the frames and just returns what it has found so far.
I spent some time looking at simpleperf's code; it looks like it's essentially doing the equivalent of perf's --call-graph=dwarf setting. Meaning it asks perf_event_open to grab the entire user stack, instead of walking the stack in kernel space and just returning the frames. Then DWARF-assisted stack walking is done in user space to work around the lack of frame pointers.
We could probably support something similar for the bcc tools, but I'd like to avoid reinventing the wheel by pulling in an existing implementation, and it might require some changes to BPF helpers. Unfortunately this means there's no easy short-term fix for trace.py.
@davemarchevsky thanks for your response!
Yes, the stack trace from simpleperf is from --call-graph=dwarf; if I set --call-graph=fp, it is the same as trace.py: a broken, shorter stack. If one library is compiled without frame pointers, will the fp way always get a broken, shorter stack?
> trace.py's -a flag will print virtual addr for each frame
I tried adding the -a flag, but it didn't work.
> We could probably support something similar for the bcc tools
Will the DWARF way be supported in the long term? I found earlier issues #1234, #1803, #1953, and #2887, which talk about DWARF stack traces, and also https://github.com/iovisor/bpftrace/issues/1744.
> I tried add -a flag, but not worked.
What's the rest of the trace.py command you're using? I just tried trace.py -U -a do_sys_open and got output with virtual addrs:
1947284 1947289 Watcher do_sys_open
7f4373cbe692 b'__open+0xa2 [libpthread-2.30.so]'
1bc85ad b'osquery::PlatformFile::PlatformFile(boost::filesystem::path const&, int, int)+0x17d [osqueryd]'
1bcd084 b'osquery::readFile(boost::filesystem::path const&, unsigned long, unsigned long, bool, bool, std::function<void (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, unsigned
long)>, bool)+0x64 [osqueryd]'
> If one library is complied without frame pointers, the fp way will get broken shorter stack?
Yep. Specifically, we'll never be able to get to the root of the stack, or to any functions calling into the no-fp library.
> Will the dwarf way be supported in long term? And I found eralier issues #1234 , #1803 , #1953 , #2887, which talk about dwarf stacktrace. And also iovisor/bpftrace#1744.
Thank you for linking these! It's definitely worth serious consideration since enough folks across BPF-related projects have requested it. I talked to some other BPF ecosystem folks (@danobi, @yonghong-song, @anakryiko) and there's definitely interest in tackling this in a generalizable way.
One reasonably quick way to solve this for your specific case would be to do something similar to what perf/simpleperf do: grab a big chunk of the stack with bpf_probe_read_user instead of relying on the bpf_get_stack helpers, send it to userspace, and use libunwind to unwind. I think simpleperf's OfflineUnwinder is doing this.
We had some initial discussion, and here are my current thoughts: https://dxuuu.xyz/stack-symbolize.html. I may change things later if I have other ideas.
Thanks! cc @davemarchevsky. Please make sure the DWARF-based stack unwinding also works for bcc. Is it possible to peel out bcc's symbolization code as a separate mini library so it can be reused?
> Please make sure the dwarf based stack unwinding also works for bcc
Yeah, will definitely make it work for bcc.
> Is it possible we peel out bcc symbolization codes as a separate mini library so it can be reused?
We were thinking of writing the libraries in Rust (because there are some great Rust libraries for DWARF/ELF parsing) and exposing a C interface.
Okay. Thanks.
Is it possible to integrate it into libbpf so that pure C programs can also get the correct user-space stack? Thanks.
> Is that possible to integrate it to libbpf
It's unlikely we'd add it directly into libbpf. Better to have a separate library, because libbpf is not really concerned with symbolizing, and it's not nice to have libbpf users pay for functionality they may not use.
> so that the pure c language also can get the correct user space stack? Thanks.
There will be a C interface such that C applications can link against the library.
Hi, I captured the stack trace of the same action, but the user stack trace from trace.py is very short.
Stack traces captured with trace.py:
Stack traces captured with simpleperf: