parca-dev / parca-agent

eBPF-based always-on profiler auto-discovering targets in Kubernetes and systemd, zero code changes or restarts needed!
https://parca.dev/
Apache License 2.0

Support more executable mappings #1458

Open aristotelhs opened 1 year ago

aristotelhs commented 1 year ago

Currently the default limit for executable mappings is 250. Unfortunately, since we are running a much larger executable, we reach roughly 420 mappings. From the discussion, the biggest drawbacks of raising the limit are increased memory consumption, added complexity for the kernel's BPF verifier, and the time required to do the in-kernel unwinding.
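To illustrate why a fixed cap interacts with the BPF verifier, here is a hypothetical C sketch (not the agent's actual code; all names are invented) of a bounded binary search over a fixed-size mapping table. Because BPF loops must be provably bounded, the iteration count is a compile-time constant derived from the cap, so raising the cap directly increases the work the verifier has to reason about:

```c
#include <stdint.h>

#define MAX_MAPPINGS 250 /* hypothetical cap mirroring the default discussed above */

struct mapping {
    uint64_t begin, end; /* executable range [begin, end) */
};

/* Bounded binary search as a BPF program would express it: 8 rounds cover
 * up to 250 sorted mappings (250 <= 2^8); ~420 mappings would need 9. */
static int find_mapping(const struct mapping *maps, uint32_t n, uint64_t pc) {
    uint32_t lo = 0, hi = n;
    for (int round = 0; round < 8; round++) {
        if (lo >= hi)
            break;
        uint32_t mid = lo + (hi - lo) / 2;
        if (pc < maps[mid].begin)
            hi = mid;
        else if (pc >= maps[mid].end)
            lo = mid + 1;
        else
            return (int)mid; /* pc falls inside maps[mid] */
    }
    return -1; /* no mapping covers pc */
}
```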

aristotelhs commented 1 year ago

Updated example maps file attached: maps.txt

vchuravy commented 1 year ago

What allocator are you using? Are you using jemalloc?

vchuravy commented 1 year ago

Okay, yes, this is jemalloc-based. I see 418 xp mappings.

I am slightly confused by the pattern:

7f1e4dab0000-7f1e4dac0000 rwxp 00000000 00:00 0 
7f1e4dac0000-7f1e4dac1000 ---p 00000000 00:00 0 
7f1e4dac1000-7f1e4e2c1000 rwxp 00000000 00:00 0 
7f1e4e2c1000-7f1e4e2c2000 ---p 00000000 00:00 0 
7f1e4e2c2000-7f1e4eac2000 rwxp 00000000 00:00 0 
7f1e4eac2000-7f1e4eac3000 ---p 00000000 00:00 0 
7f1e4eac3000-7f1e4f2c3000 rwxp 00000000 00:00 0 

That looks like a guard page between JIT segments. Could you try to capture both /proc/PID/maps and the jitdump file? (Remember to start the process with ENABLE_JITPROFILING=1.) This should allow us to match pages with actual executable data.
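For readers unfamiliar with the pattern above, here is a minimal userspace C sketch (an assumption about the mechanism, not Julia's actual allocator code) of how one anonymous region plus a single PROT_NONE page produces the alternating rwxp / ---p lines seen in /proc/PID/maps:

```c
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    long page = sysconf(_SC_PAGESIZE);
    size_t seg = 16 * (size_t)page; /* arbitrary JIT segment size */

    /* One anonymous RWX region; note some hardened systems forbid W+X. */
    char *base = mmap(NULL, 2 * seg + page, PROT_READ | PROT_WRITE | PROT_EXEC,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return 1;

    /* Revoking access to the page between the two segments splits the single
     * mapping into rwxp / ---p / rwxp lines, exactly the pattern quoted
     * above. Code running off the end of a segment now faults instead of
     * silently corrupting its neighbour. */
    mprotect(base + seg, page, PROT_NONE);

    printf("inspect /proc/%d/maps\n", getpid());
    pause(); /* keep the process alive so the maps file can be read */
    return 0;
}
```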

aristotelhs commented 1 year ago

That was my thought as well: this is caused by a lot of jitted functions (and the guard page makes sense). [Yes, we use jemalloc, as you mentioned.] I was thinking of maybe fiddling with jemalloc options to remove the guard pages and make the mappings more contiguous, but I believe there are security implications to doing so. Unfortunately this process was long-running and did not get a chance to be started with JIT profiling enabled (we made it the default a week ago). I will try to find another process, get the data out of it, and attach it here.

vchuravy commented 1 year ago

Yeah, Julia does manage its own memory through mmap for the JIT, so we might bypass jemalloc (I see the same guard pages without jemalloc).

aristotelhs commented 1 year ago

I attach data from another process, which has fewer mappings (253, still larger than 250), and also the jitdump file as JSON (gzipped because it was quite big). jitdump.json.gz process_map.txt

vchuravy commented 1 year ago

Using https://gist.github.com/vchuravy/c2012c8cef577cc6f828fdc5b02e959a I can show that all of the tracked code loads from the JIT fall into just three mappings:

sum(length, values(hist)) = 4550
4550

length(jitdump[:CodeLoads]) = 4550
4550

 Mapping(AddrRange(0x00007de9f67fc000, 0x00007de9f68fc000), "r-xp", nothing)
 Mapping(AddrRange(0x00007dea06afc000, 0x00007dea06bfc000), "r-xp", nothing)
 Mapping(AddrRange(0x00007f166d4cf000, 0x00007f166d5ce000), "r-xp", nothing)

If I haven't made a particularly silly mistake xD
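For readers who don't read Julia, the gist's core check boils down to interval containment. Here is a hypothetical C re-expression (all names invented) of counting how many jitdump code loads land inside a set of mappings:

```c
#include <stddef.h>
#include <stdint.h>

struct range {
    uint64_t lo, hi; /* half-open mapping range [lo, hi) */
};

/* Count the code loads whose start address falls inside any mapping. The
 * gist's output reports 4550 of 4550 loads matched, spread across only
 * three of the process's mappings. */
static size_t count_matched(const struct range *maps, size_t nmaps,
                            const uint64_t *loads, size_t nloads) {
    size_t hits = 0;
    for (size_t i = 0; i < nloads; i++) {
        for (size_t j = 0; j < nmaps; j++) {
            if (loads[i] >= maps[j].lo && loads[i] < maps[j].hi) {
                hits++;
                break;
            }
        }
    }
    return hits;
}
```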

javierhonduco commented 1 year ago

Thanks for opening this issue and for all the context to understand how the Julia runtime operates!

Supporting a large number of mappings is not a high priority for us right now, so it might take a little while to get this done, but we are supportive of this change overall! 😄

Something I am sure you are aware of, but wanted to bring up for others, is that unwinding your application fails due to the lack of frame pointers. We first try to unwind using frame pointers:

https://github.com/parca-dev/parca-agent/blob/c6aa3fd624b4cad7826fc86dbffccd2f4504dfec/bpf/cpu/cpu.bpf.c#L965-L968

If that's not possible, we request the generation of unwind tables derived from the DWARF unwind information:

https://github.com/parca-dev/parca-agent/blob/c6aa3fd624b4cad7826fc86dbffccd2f4504dfec/bpf/cpu/cpu.bpf.c#L998

Once the unwind info is present, we use it to unwind the stack:

https://github.com/parca-dev/parca-agent/blob/c6aa3fd624b4cad7826fc86dbffccd2f4504dfec/bpf/cpu/cpu.bpf.c#L970

If the different mappings (file-backed or not, as in Julia's case) don't have DWARF unwind information in the .eh_frame section, unwinding won't work.
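To make the first step above concrete, here is a minimal userspace C sketch of frame-pointer unwinding (an illustration only, not the agent's BPF code): each frame's saved rbp links to the caller's frame, with the return address one word above it. Build with -fno-omit-frame-pointer.

```c
#include <stdint.h>
#include <stdio.h>

/* Walk the current thread's frame-pointer chain. On x86_64 with frame
 * pointers enabled, *rbp holds the caller's saved rbp and *(rbp + 8) holds
 * the return address. */
static void walk_stack(void) {
    uintptr_t *fp = (uintptr_t *)__builtin_frame_address(0);
    while (fp) {
        uintptr_t ret = fp[1];                  /* return address */
        uintptr_t *caller = (uintptr_t *)fp[0]; /* caller's saved rbp */
        printf("pc=%#lx\n", (unsigned long)ret);
        /* Per the x86_64 SysV ABI the outermost frame's rbp is zero; the
         * second check guards against corrupt chains (stacks grow down, so
         * caller frames live at higher addresses). */
        if (caller == NULL || caller <= fp)
            break;
        fp = caller;
    }
}

int main(void) {
    walk_stack();
    return 0;
}
```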

Just wanted to make sure that these mappings have the ELF section above. This can be checked with readelf:

$ readelf -S /bin/bash | grep eh_frame
  [19] .eh_frame_hdr     PROGBITS         000000000012bcf0  0012bcf0
  [20] .eh_frame         PROGBITS         0000000000130630  00130630

An alternative is the unwind information in the jitdump format, but we don't support this yet. Not sure if Julia populates this section either.

vchuravy commented 1 year ago

> An alternative is the unwind information in the jitdump format, but we don't support this yet. Not sure if Julia populates this section either.

No, not yet.

FPO (frame-pointer omission) is turned off for jitted code as of Julia 1.9.0-rc1.

javierhonduco commented 1 year ago

Unfortunately, without DWARF unwind info in .eh_frame (or jitdump's unwind info once it's implemented, which AFAIK is laid out very similarly), or frame pointers present, we won't be able to unwind the stack.

edit:

Just noticed that you mentioned that frame pointers are enabled. Could you check whether the bottom frame's frame pointer is 0, as the x86_64 ABI requires, if I'm not mistaken?