jvm-profiling-tools / perf-map-agent

A java agent to generate method mappings to use with the linux `perf` tool
GNU General Public License v2.0

Some C stack frames with Java ancestor not correctly shown #73

Closed franz1981 closed 5 years ago

franz1981 commented 6 years ago

Hi guys!

I have produced a flame graph with:

PERF_COLLAPSE_OPTS="--kernel --tid" PERF_RECORD_FREQ=99 PERF_RECORD_SECONDS=10 PERF_MAP_OPTIONS=unfoldall ~/perf-map-agent/bin/perf-java-flames <pid>

image

But it seems that some C stack frames are not correctly shown as children of the related Java calls. Am I missing anything in the configuration?

Thanks, Franz

franz1981 commented 6 years ago

Note that async-profiler works correctly on the same machine. I'm on Fedora 27 and I've tried with both Java 8 and 9, with the same result :(

franz1981 commented 6 years ago

@jrudolph @apangin any ideas about what I'm missing? I don't believe it's an issue with the tool, but I have verified that it has been happening since Fedora 23...

franz1981 commented 6 years ago

With Red Hat Enterprise Linux Server release 7.4 (Maipo) it seems to work fine: image

ceeaspb commented 6 years ago

Hi @franz1981

There are a few similarities with an issue in another tracing tool:

1) Initial kernel version: Fedora 27 is reported as 4.13 (Wikipedia). It works on some kernels but not others.
2) The flamegraph you provided is showing all of the bottom of the broken stacks to be SYSCALL related.
3) Some stacks correct, some not.

The similar issue is https://github.com/iovisor/bcc/issues/1641#issuecomment-377669157

Can you see if the workarounds (trace raw_syscalls:sys_enter, or auditctl change) work, or if upgrading to a later, fixed kernel version works?

It's not stated but I am assuming you have set preserve frame pointer on the java process.

I suspect that async-profiler works because it gets the kernel stack from perf and the userspace stack from the JVM, then presents them together not depending on the rbp being valid below the SYSCALL.

franz1981 commented 6 years ago

I've just upgraded the kernel to

Linux matrix 4.16.13-200.fc27.x86_64 #1 SMP Wed May 30 15:03:53 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

And the issue is still happening :( I'm trying the other workarounds to see if they can fix it... Many thanks for the help @ceeaspb

franz1981 commented 6 years ago

@ceeaspb sudo auditctl -a never,exit -S umask doesn't seem to solve the issue either (if I've applied it correctly):

[root@matrix forked_franz]# auditctl -l
-a never,task
-a never,exit -S umask

I can confirm that tracing raw_syscalls:sys_enter (and sys_exit) doesn't work for me either.

> It's not stated but I am assuming you have set preserve frame pointer on the java process.

Yes, exactly, and debug non-safepoints too.

> I suspect that async-profiler works because it gets the kernel stack from perf and the userspace stack from the JVM, then presents them together not depending on the rbp being valid below the SYSCALL.

I suppose that @apangin (or @nitsanw) are the right people to answer that, but looking at the merge logic, that seems to be the case!

apangin commented 6 years ago

The stack trace breaks at libc/pthread functions, so the problem may also be related to how libc/pthread are compiled. Maybe it's the omit-frame-pointer optimization or something similar. I'll be able to tell later if I manage to reproduce the issue on my side. I don't have Fedora at the moment.

As for async-profiler, it can successfully recover the Java stack trace because it does not rely on the native frame pointer. When the JVM calls a native function, it saves a pointer to the top Java frame in a Thread structure, so that AsyncGetCallTrace can easily find where to start unwinding the Java stack.
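
To make the difference concrete, here is a toy Python model of the two approaches. It is only a sketch under stated assumptions, not the real perf or AsyncGetCallTrace code: the addresses, the method names and the `anchor` tuple are all made up for illustration. It walks a frame-pointer chain the way an rbp-based unwinder would, shows the walk stopping when the native leaf function has clobbered rbp, and then restarts the walk from a saved "last Java frame" record.

```python
from typing import Dict, List, Tuple

# Toy stack memory: address -> 8-byte value.
# A frame that keeps a frame pointer stores:
#   [fp]     = caller's frame pointer
#   [fp + 8] = return address into the caller
# All addresses and names below are hypothetical.
MEMORY: Dict[int, int] = {
    0x7000: 0x0,    0x7008: 0x0,    # Thread.run (outermost frame)
    0x6000: 0x7000, 0x6008: 0x100,  # Writer.flush, returns into Thread.run
    0x5000: 0x6000, 0x5008: 0x200,  # writeBytes, returns into Writer.flush
    0x4000: 0x5000, 0x4008: 0x300,  # write (only valid if it keeps rbp)
}

SYMBOLS: Dict[int, str] = {
    0x100: "Thread.run",
    0x200: "Writer.flush",
    0x300: "FileOutputStream.writeBytes",
    0x400: "write",
}


def walk_frame_pointers(pc: int, fp: int, max_depth: int = 64) -> List[str]:
    """Unwind by following saved frame pointers, leaf frame first."""
    stack = [SYMBOLS.get(pc, "[unknown]")]
    for _ in range(max_depth):
        # An fp that does not point into mapped stack memory ends the walk.
        if fp not in MEMORY or fp + 8 not in MEMORY:
            break
        return_address = MEMORY[fp + 8]
        if return_address == 0:  # reached the outermost frame
            break
        stack.append(SYMBOLS.get(return_address, "[unknown]"))
        fp = MEMORY[fp]
    return stack


# Case 1: the native function maintains rbp, so the chain is intact.
fp_only_ok = walk_frame_pointers(pc=0x400, fp=0x4000)

# Case 2: the native function was built without a frame pointer and uses
# rbp as a scratch register, so the sampled rbp is garbage.
fp_only_broken = walk_frame_pointers(pc=0x400, fp=0xDEADBEEF)

# Case 3: a record saved at the Java-to-native transition (hypothetical
# layout: pc and fp of the top Java frame) gives a valid restart point.
anchor: Tuple[int, int] = (0x300, 0x5000)
recovered = ["write"] + walk_frame_pointers(pc=anchor[0], fp=anchor[1])

print(fp_only_ok)
print(fp_only_broken)
print(recovered)
```

In this model the broken walk yields only the native leaf frame, which matches the detached C frames in the flame graph, while the anchored walk recovers the Java callers without ever reading the clobbered rbp.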

franz1981 commented 6 years ago

@apangin That makes sense indeed: let me check on the Fedora mailing list whether I can find out anything about that... The only other option I have is to print the ASM and check the prologue/epilogue of the calls with an unknown ancestor to see whether rbp is being handled correctly, right?
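
A rough way to automate that prologue check is sketched below in Python. It assumes AT&T-syntax disassembly text (the style `objdump -d` prints by default on x86-64) and looks for the usual `push %rbp` / `mov %rsp,%rbp` pair near the top of a function. The helper name, the four-instruction window and both sample listings are hand-written assumptions for illustration, not output taken from a real libc.

```python
import re

# Standard x86-64 frame-pointer prologue in AT&T syntax.
PUSH_RBP = re.compile(r"\bpushq?\s+%rbp\b")
MOV_RSP_RBP = re.compile(r"\bmovq?\s+%rsp,\s*%rbp\b")


def has_frame_pointer_prologue(disassembly: str, window: int = 4) -> bool:
    """Return True if the first `window` instructions set up rbp as a frame pointer.

    `disassembly` is the text of one function, one instruction per line.
    The small window tolerates a leading instruction such as `endbr64`.
    """
    instructions = [line for line in disassembly.splitlines() if line.strip()]
    head = instructions[:window]
    push_index = next(
        (i for i, line in enumerate(head) if PUSH_RBP.search(line)), None
    )
    if push_index is None:
        return False
    # The mov must come after the push for rbp to chain to the caller's frame.
    return any(MOV_RSP_RBP.search(line) for line in head[push_index + 1:])


# Hand-written sample listings (illustrative only).
WITH_FP = """
  endbr64
  push   %rbp
  mov    %rsp,%rbp
  sub    $0x10,%rsp
"""

WITHOUT_FP = """
  endbr64
  push   %rbx
  mov    %fs:0x18,%eax
  test   %eax,%eax
"""

print(has_frame_pointer_prologue(WITH_FP))
print(has_frame_pointer_prologue(WITHOUT_FP))
```

A function that fails this check does not necessarily corrupt rbp, but it is a candidate for the frames where an rbp-based unwinder loses the Java callers.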

nitsanw commented 5 years ago

Closing this issue, as there's nothing PMA can do to solve it.