Closed YangKeao closed 2 years ago
In what way is the vdso unwind information wrong on Ubuntu 18.04? Do you have an example of an address which unwinds incorrectly?
@mornyx Do you have a minimal reproducible example for https://github.com/tikv/pprof-rs/pull/111#issuecomment-1105656716 ?
It's hard to write a demo that reproduces the problem reliably, but we can use gimli-rs to parse the CFI of the vdso; gimli-rs does tell us that the data in .eh_frame_hdr or .eh_frame has some errors.
Use the example below:
use fallible_iterator::FallibleIterator;
use gimli::{BaseAddresses, CieOrFde, EhFrame, EhFrameHdr, NativeEndian, Pointer, UnwindSection};
use std::slice;

const PF_X: u32 = 1;

#[derive(Default, Debug)]
struct SectionInfo {
    base: u64,
    text: u64,
    text_len: u64,
    eh_frame_hdr: u64,
    eh_frame_hdr_len: u64,
    max_addr: u64,
}

fn main() {
    let mut info = SectionInfo::default();
    unsafe {
        libc::dl_iterate_phdr(Some(callback), &mut info as *mut _ as *mut libc::c_void);
    }
    if info.eh_frame_hdr == 0 {
        panic!("not found");
    }
    let address = BaseAddresses::default()
        .set_text(info.text)
        .set_eh_frame_hdr(info.eh_frame_hdr);
    let eh_frame_hdr_data =
        unsafe { slice::from_raw_parts(info.eh_frame_hdr as _, info.eh_frame_hdr_len as _) };
    let eh_frame_hdr = EhFrameHdr::new(eh_frame_hdr_data, NativeEndian)
        .parse(&address, 8)
        .unwrap();
    let eh_frame_ptr = match eh_frame_hdr.eh_frame_ptr() {
        Pointer::Direct(v) => v,
        Pointer::Indirect(v) => unsafe { *(v as *const u64) },
    };
    let address = address.set_eh_frame(eh_frame_ptr);
    // max_addr is an absolute address, so the slice length is the distance
    // from the start of .eh_frame to the end of the last loaded segment.
    let eh_frame_len = info.max_addr.saturating_sub(eh_frame_ptr);
    let eh_frame_data = unsafe { slice::from_raw_parts(eh_frame_ptr as _, eh_frame_len as _) };
    let eh_frame = EhFrame::new(eh_frame_data, NativeEndian);
    // Walk every CIE and FDE; a malformed entry makes unwrap() panic.
    for entry in eh_frame.entries(&address).iterator() {
        let entry = entry.unwrap();
        match entry {
            CieOrFde::Cie(_) => {
                println!("cie");
            }
            CieOrFde::Fde(_) => {
                println!("fde");
            }
        }
    }
}

extern "C" fn callback(
    info: *mut libc::dl_phdr_info,
    _size: libc::size_t,
    data: *mut libc::c_void,
) -> libc::c_int {
    unsafe {
        let data = data as *mut SectionInfo;
        // Only the vdso is of interest; skip every other loaded object.
        match std::ffi::CStr::from_ptr((*info).dlpi_name).to_str() {
            Ok(name) => {
                if !name.contains("vdso") {
                    return 0;
                }
            }
            Err(_) => return 0,
        }
        (*data).base = (*info).dlpi_addr;
        let hdrs = slice::from_raw_parts((*info).dlpi_phdr, (*info).dlpi_phnum as usize);
        for hdr in hdrs {
            match hdr.p_type {
                libc::PT_LOAD => {
                    // The executable PT_LOAD segment holds the text.
                    if hdr.p_flags & PF_X != 0 {
                        (*data).text = (*info).dlpi_addr + hdr.p_vaddr;
                        (*data).text_len = hdr.p_memsz;
                    }
                    // Track the highest address covered by any loaded segment.
                    let max_addr = (*info).dlpi_addr + hdr.p_vaddr + hdr.p_filesz;
                    if (*data).max_addr < max_addr {
                        (*data).max_addr = max_addr;
                    }
                }
                libc::PT_GNU_EH_FRAME => {
                    (*data).eh_frame_hdr = (*info).dlpi_addr + hdr.p_vaddr;
                    (*data).eh_frame_hdr_len = hdr.p_memsz;
                }
                _ => {}
            }
        }
        0
    }
}
We will get:
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: UnexpectedEof(ReaderOffsetId(281473204903936))', src/main.rs:32:10
When we replace if !name.contains("vdso") with if !name.contains("libc") or any other library's name, it successfully prints the cies and fdes...
Perfect, thank you!
For future reference, I've archived the vdso64.so from Ubuntu 18.04.6 LTS here: https://storage.googleapis.com/profiler-get-symbols-fixtures/ubuntu-18.04-lts-vdso/vdso64.so
(I got it by downloading and unpacking the ISO, and then finding it at /lib/modules/5.4.0-84-generic/vdso/vdso64.so.)
Hmm, this file doesn't reproduce the problem. I will try the code you gave me on an actual Ubuntu 18.04 installation. Here is the code I tried.
Ah, I think my mistake was using Ubuntu 18.04.6. On that system, the code you gave me does not panic. Can you remember the exact Ubuntu version you were seeing this on? The /lib/modules/4.15.0-20-generic/vdso/vdso64.so from this Ubuntu 18.04 ISO seems fine too.
Can you remember the exact Ubuntu version you were seeing this on?
I tested it on Ubuntu 20.04 on AWS and on 18.04/20.04 in Docker for Mac (arm64), and they all panic. I wonder whether the problem could be related to dl_iterate_phdr.
CentOS 7 in Docker for Mac (arm64) also panics, with one detail: when calling dl_iterate_phdr on CentOS, the dlpi_name of the vdso is an empty string, so it needs to be distinguished from the executable itself, whose name is also empty.
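One possible way to tell the two empty-named entries apart is sketched below. It assumes a Linux system, where /proc/self/maps labels the vdso mapping as [vdso]. The helper name vdso_range is hypothetical and is not part of pprof-rs or gimli. A dl_iterate_phdr callback could then check whether a loaded segment's address (dlpi_addr + p_vaddr) falls inside the returned range.

```rust
/// Parse the text of /proc/self/maps and return the [start, end) address
/// range of the mapping labelled "[vdso]", if there is one.
/// (Hypothetical helper, for illustration only.)
fn vdso_range(maps: &str) -> Option<(u64, u64)> {
    for line in maps.lines() {
        // Each line looks like: "start-end perms offset dev inode pathname".
        if line.split_whitespace().last() != Some("[vdso]") {
            continue;
        }
        let range = line.split_whitespace().next()?;
        let (start, end) = range.split_once('-')?;
        let start = u64::from_str_radix(start, 16).ok()?;
        let end = u64::from_str_radix(end, 16).ok()?;
        return Some((start, end));
    }
    None
}

fn main() {
    // If the file cannot be read (for example off Linux), nothing is found.
    let maps = std::fs::read_to_string("/proc/self/maps").unwrap_or_default();
    match vdso_range(&maps) {
        Some((start, end)) => println!("vdso mapped at {:#x}-{:#x}", start, end),
        None => println!("no [vdso] mapping found"),
    }
}
```

Matching on the address range rather than on dlpi_name avoids depending on how a particular libc reports the vdso's name.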
I hit the crash on Ubuntu 18.04 (on AWS, x86_64) by continuously profiling TiKV with pprof (without the vdso skip) for several hours (ref). However, running dwarfdump on the vdso does not report any errors. Skipping the vdso does resolve the crash.
I was able to reproduce the panic in line 32 with the Ubuntu 20.04 aarch64 vdso (archived here)! Thanks for the additional information.
The reason it panics is that this vdso does not have an eh_frame_hdr section. It only has an eh_frame section. But it still has a PT_GNU_EH_FRAME ELF program header, with all values set to zero. So we set eh_frame_hdr_data to an empty slice, and parsing that empty slice fails with UnexpectedEof.
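This failure mode can be guarded against before any parsing is attempted. The following is a minimal sketch, not the fix adopted in pprof-rs. It uses a hypothetical, simplified ProgramHeader struct in place of libc's ELF program header type, and the function name eh_frame_hdr_range is likewise an assumption. A PT_GNU_EH_FRAME entry with a zero size is treated as if the section were absent.

```rust
/// ELF program header type for the segment that holds .eh_frame_hdr.
const PT_GNU_EH_FRAME: u32 = 0x6474_e550;

/// Hypothetical, simplified stand-in for an ELF program header.
#[derive(Clone, Copy, Debug)]
struct ProgramHeader {
    p_type: u32,
    p_vaddr: u64,
    p_memsz: u64,
}

/// Return the (address, length) of .eh_frame_hdr, or None when the object
/// has no PT_GNU_EH_FRAME header or the header is empty (zero size).
fn eh_frame_hdr_range(base: u64, headers: &[ProgramHeader]) -> Option<(u64, u64)> {
    headers
        .iter()
        .find(|h| h.p_type == PT_GNU_EH_FRAME)
        // A zero-sized segment would yield an empty slice, which a parser
        // can only reject with an end-of-input error.
        .filter(|h| h.p_memsz != 0)
        .map(|h| (base + h.p_vaddr, h.p_memsz))
}

fn main() {
    // Mirrors the aarch64 vdso case: the header exists but is all zeros.
    let zeroed = [ProgramHeader { p_type: PT_GNU_EH_FRAME, p_vaddr: 0, p_memsz: 0 }];
    // A header that actually describes a section (values are made up).
    let valid = [ProgramHeader { p_type: PT_GNU_EH_FRAME, p_vaddr: 0x1000, p_memsz: 0x3c }];
    println!("{:?}", eh_frame_hdr_range(0x7000_0000, &zeroed));
    println!("{:?}", eh_frame_hdr_range(0x7000_0000, &valid));
}
```

With a check of this kind, the reproduction would report the vdso as having no usable .eh_frame_hdr instead of panicking on unwrap().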
I am not sure why libunwind segfaulted on the x86_64 vdso. So far I have not seen evidence of bad dwarf in it.
Signed-off-by: YangKeao yangkeao@chunibyo.icu