aristotelhs opened 1 year ago

Currently, the default limit for executable mappings is 250. Since we are running a much larger executable, we reach ~420 mappings. From the discussion, the biggest drawbacks of raising this limit are increased memory consumption, added complexity in the kernel BPF verifier, and the time required to do the in-kernel unwinding.

maps.txt (updated example map file)
What allocator are you using? Are you using jemalloc?
Okay, yes, this is jemalloc-based. I see 418 executable (xp) mappings.
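For reference, that count can be reproduced straight from the maps file; a minimal Julia sketch (count_exec_mappings is a made-up helper name, and the filename assumes the attached maps.txt):

# Count private executable ("xp") mappings in a /proc/PID/maps dump.
function count_exec_mappings(path)
    count(eachline(path)) do line
        fields = split(line)
        # The second field holds the permission bits, e.g. "rwxp".
        length(fields) >= 2 && 'x' in fields[2] && endswith(fields[2], "p")
    end
end

count_exec_mappings("maps.txt")  # should report the 418 mentioned above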
I am slightly confused by the pattern:
7f1e4dab0000-7f1e4dac0000 rwxp 00000000 00:00 0
7f1e4dac0000-7f1e4dac1000 ---p 00000000 00:00 0
7f1e4dac1000-7f1e4e2c1000 rwxp 00000000 00:00 0
7f1e4e2c1000-7f1e4e2c2000 ---p 00000000 00:00 0
7f1e4e2c2000-7f1e4eac2000 rwxp 00000000 00:00 0
7f1e4eac2000-7f1e4eac3000 ---p 00000000 00:00 0
7f1e4eac3000-7f1e4f2c3000 rwxp 00000000 00:00 0
That looks like a guard page in between JIT segments. Could you try to capture both the /proc/PID/maps and the jitdump file? (Remember to start with ENABLE_JITPROFILING=1.) That should allow us to match pages with actual executable data.
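For what it's worth, that guard-page pattern can be spotted mechanically in the maps file; a minimal sketch (find_guard_pages is hypothetical, and assumes the format shown above):

# Flag single inaccessible ("---p") pages sandwiched between two
# executable mappings, i.e. the tell-tale guard-page layout above.
function find_guard_pages(path)
    maps = NamedTuple[]
    for line in eachline(path)
        f = split(line)
        lo, hi = parse.(UInt64, split(f[1], "-"); base=16)
        push!(maps, (lo=lo, hi=hi, perms=f[2]))
    end
    [m for (i, m) in enumerate(maps)
       if 1 < i < length(maps) &&
          m.perms == "---p" && m.hi - m.lo == 0x1000 &&
          'x' in maps[i-1].perms && 'x' in maps[i+1].perms]
end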
That was my thought as well: this is caused by a large number of JITted functions (and the guard page makes sense). [Yes, we use jemalloc, as you mentioned.] I was thinking of fiddling with jemalloc options to remove the guard pages and make this more contiguous, but I believe there are security implications to doing so. Unfortunately, this process was long-running and did not get a chance to be started with JIT profiling enabled (we only made it the default a week ago). I will try to find another process, get the data out of it, and attach it here.
Yeah, Julia manages its own memory through mmap for the JIT, so we might bypass jemalloc entirely (I see the same guard page without jemalloc).
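Just to illustrate the pattern (this is not Julia's actual allocator code): a JIT typically mmaps an rwx block and then flips the trailing page to PROT_NONE, which is exactly what produces the alternating rwxp/---p lines above. A Linux-only sketch via ccall, with error handling omitted:

# Linux x86_64 constants (illustrative; values from <sys/mman.h>).
const PROT_NONE  = Cint(0)
const PROT_READ  = Cint(1)
const PROT_WRITE = Cint(2)
const PROT_EXEC  = Cint(4)
const MAP_PRIVATE   = Cint(2)
const MAP_ANONYMOUS = Cint(0x20)

# Reserve `npages` of rwx memory plus one trailing guard page.
function alloc_jit_block(npages::Int)
    pagesz = ccall(:getpagesize, Cint, ())
    len = (npages + 1) * pagesz
    ptr = ccall(:mmap, Ptr{Cvoid},
                (Ptr{Cvoid}, Csize_t, Cint, Cint, Cint, Int),
                C_NULL, len, PROT_READ | PROT_WRITE | PROT_EXEC,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0)
    # Revoke all access on the last page so overruns fault; this is
    # the ---p line that shows up between the rwxp segments.
    ccall(:mprotect, Cint, (Ptr{Cvoid}, Csize_t, Cint),
          ptr + npages * pagesz, pagesz, PROT_NONE)
    return ptr
end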
I attached another process, which has fewer mappings (253, still more than 250), and also the jitdump file as JSON (gzipped because it was quite big). jitdump.json.gz process_map.txt
Using https://gist.github.com/vchuravy/c2012c8cef577cc6f828fdc5b02e959a to cross-reference the jitdump against the maps file, all of the tracked code loads from the JIT fall into three mappings:
sum(length, values(hist)) = 4550
4550
length(jitdump[:CodeLoads]) = 4550
4550
Mapping(AddrRange(0x00007de9f67fc000, 0x00007de9f68fc000), "r-xp", nothing)
Mapping(AddrRange(0x00007dea06afc000, 0x00007dea06bfc000), "r-xp", nothing)
Mapping(AddrRange(0x00007f166d4cf000, 0x00007f166d5ce000), "r-xp", nothing)
Assuming I haven't made a particularly silly mistake xD
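The core of that check is just an interval lookup; roughly, as a sketch (the record shapes here are hypothetical, not the gist's actual API):

# Bucket each JIT code load into the /proc/PID/maps range containing
# its address; `mappings` is a vector of (lo, hi, perms) named tuples
# and each element of `code_loads` has an `addr` field.
function loads_per_mapping(code_loads, mappings)
    hist = Dict()
    for load in code_loads
        i = findfirst(m -> m.lo <= load.addr < m.hi, mappings)
        i === nothing && continue  # load outside every known mapping
        push!(get!(hist, mappings[i], []), load)
    end
    return hist
end

# sum(length, values(hist)) == length(code_loads) is the 4550 == 4550
# sanity check above: every code load landed inside some mapping.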
Thanks for opening this issue and for all the context to understand how the Julia runtime operates!
Supporting a large number of mappings is not a high priority for us right now, so it might take a little while to get this done, but we are supportive of this change overall! 😄
Something I am sure you are aware of, but wanted to bring up for others, is that unwinding your application is failing due to the lack of frame pointers. We first try to unwind using frame pointers. If that's not possible, we request the generation of our unwind tables, derived from the DWARF unwind information. Once the unwind info is present, we use it to unwind the stack. If the different mappings (file-backed or not, as in Julia's case) don't have DWARF unwind information in the .eh_frame section, unwinding won't work.
Just wanted to make sure that these mappings have the ELF section above. This can be checked with readelf:
$ readelf -S /bin/bash | grep eh_frame
  [19] .eh_frame_hdr     PROGBITS        000000000012bcf0  0012bcf0
  [20] .eh_frame         PROGBITS        0000000000130630  00130630
An alternative is the unwind information in the jitdump format, but we don't support this yet. Not sure if Julia populates this section either.
No, not yet.
FPO (frame-pointer omission) is turned off for JITted code as of Julia 1.9.0-rc1.
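One quick way to eyeball this from a Julia session (hedged, since trivial leaf functions may be compiled without a frame) is to look at the emitted prologue:

using InteractiveUtils

# With frame pointers enabled, the function prologue should save rbp
# (a "push rbp" at the entry) before setting up the new frame.
code_native(x -> sum(abs2, x), (Vector{Float64},))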
Unfortunately, without support for DWARF unwind info in .eh_frame (or jitdump's, once it's implemented, which AFAIK is laid out very similarly) or present frame pointers, we won't be able to unwind the stack.
edit:
Just noticed that you mentioned that frame pointers are enabled. Could you check if the bottom frame's saved frame pointer is 0, as per the x86_64 ABI, if I'm not mistaken?
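For context, a frame-pointer unwinder relies on exactly that invariant; a rough sketch of the walk (walk_frame_pointers is illustrative, not our BPF unwinder, and only makes sense against a stopped thread's registers and stack):

# Follow the saved-rbp chain: each frame stores the caller's rbp at
# [rbp] and the return address at [rbp + 8]. Per the x86_64 SysV ABI,
# the outermost frame should carry a zeroed frame pointer, which is
# what terminates this loop.
function walk_frame_pointers(fp::UInt64)
    return_addrs = UInt64[]
    while fp != 0
        push!(return_addrs, unsafe_load(Ptr{UInt64}(fp + 8)))
        fp = unsafe_load(Ptr{UInt64}(fp))
    end
    return return_addrs
end

If the chain never reaches 0 (or points at garbage), a real unwinder has to bail out instead of faulting.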