jvm-profiling-tools / perf-map-agent

A java agent to generate method mappings to use with the linux `perf` tool
GNU General Public License v2.0

Partial Stacks #5

Closed (brendangregg closed this issue 9 years ago)

brendangregg commented 10 years ago

This is probably a JVM issue, but I'll file it here anyway as it's something anyone may run into when using perf-map-agent.

Many stacks that perf reports look partial/incomplete, showing only the top frame. For example, the following output is from "perf report --stdio":

     0.78%     java  perf-22919.map      [.] Lio/netty/handler/codec/http/DefaultHttpHeaders;.add(Ljava/lang/CharSequence;Ljava/lang/Object;)Lio/netty/handler/codec/
               |
               --- Lio/netty/handler/codec/http/DefaultHttpHeaders;.add(Ljava/lang/CharSequence;Ljava/lang/Object;)Lio/netty/handler/codec/http/HttpHeaders;

     0.78%     java  perf-22919.map      [.] Lrx/subjects/PublishSubject;.onNext(Ljava/lang/Object;)V
               |
               --- Lrx/subjects/PublishSubject;.onNext(Ljava/lang/Object;)V

     0.78%     java  perf-22919.map      [.] Lio/netty/channel/AbstractChannel$AbstractUnsafe;.write(Ljava/lang/Object;Lio/netty/channel/ChannelPromise;)V
               |
               --- Lio/netty/channel/AbstractChannel$AbstractUnsafe;.write(Ljava/lang/Object;Lio/netty/channel/ChannelPromise;)V

It looks a lot like the omit-frame-pointer optimization; however, I believe this comes from HotSpot. If HotSpot has an option equivalent to -fno-omit-frame-pointer, I haven't found it yet.
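One way to gauge how widespread the truncation is: dump the samples with `perf script` and count how many stacks consist of a single frame resolved through a perf-<pid>.map file. The helper below is a hypothetical sketch, not part of perf or perf-map-agent, and it assumes perf script's default text layout: one unindented header line per sample, one indented line per frame, and blank lines between samples.

```python
import re
import sys
from typing import List, Optional

# Matches frames that perf resolved through a JIT symbol map, e.g. /tmp/perf-22919.map
PERF_MAP = re.compile(r"perf-\d+\.map")


def parse_samples(text: str) -> List[List[str]]:
    """Split `perf script` text into samples, each a list of frame lines (innermost first)."""
    samples: List[List[str]] = []
    current: Optional[List[str]] = None  # frames of the sample being read
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sample.
            if current is not None:
                samples.append(current)
                current = None
        elif line[0].isspace():
            # Indented line: one stack frame of the current sample.
            if current is not None:
                current.append(line.strip())
        else:
            # Unindented line: header of a new sample.
            if current is not None:
                samples.append(current)
            current = []
    if current is not None:
        samples.append(current)
    return samples


def truncated_jit_fraction(samples: List[List[str]]) -> float:
    """Fraction of samples whose whole stack is one JIT frame from a perf map."""
    if not samples:
        return 0.0
    truncated = sum(1 for s in samples if len(s) == 1 and PERF_MAP.search(s[0]))
    return truncated / len(samples)


if __name__ == "__main__":
    samples = parse_samples(sys.stdin.read())
    print(f"{len(samples)} samples, "
          f"{truncated_jit_fraction(samples):.1%} are single-frame JIT stacks")
```

Usage, assuming the script is saved under a made-up name such as single_frames.py: run `perf script > out.txt`, then `python3 single_frames.py < out.txt`.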

I am using:

-XX:-OmitStackTraceInFastThrow -XX:+UnlockDiagnosticVMOptions -XX:+ShowHiddenFrames

brendangregg commented 10 years ago

I should add that this happens with different JDK versions, and that it uses perf's default stack-walking method (frame pointers). I'm building perf to try out the "-g dwarf" method of stack unwinding, but I don't expect it to work well on HotSpot.

jrudolph commented 10 years ago

Yes, that's something I've also observed. I haven't checked whether this happens for all JIT-compiled -> JIT-compiled calls or only for some. It's been a while since I looked at JIT-generated assembly code, but examining it may be the way to figure this out.

brendangregg commented 10 years ago

This is indeed a JVM issue, and it is best described by this bug:

https://bugs.openjdk.java.net/browse/JDK-6276264

While that bug describes the problem with DTrace's jstack(), I believe the same optimization (using the frame pointer as a general-purpose register) is breaking perf_events.

This may be fixed in JDK 9. See:

http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2014-June/014842.html

jrudolph commented 10 years ago

Cool, good to know. Thanks, @brendangregg, for digging this out and supporting this issue on the openjdk mailing list.

The gist of it is this:

One is the frame pointer is used by the server compiler as a general purpose register on intel.

This means that a generic stack walker (like the one from perf_events) isn't able to walk the stack out of compiled methods that make use of the frame pointer register.
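To make the failure mode concrete, here is a toy model of frame-pointer unwinding on x86-64 in plain Python; it is not perf's actual code, and the addresses and the `walk_frame_pointers` helper are hypothetical. Each frame is assumed to store the caller's saved RBP at [rbp] and the return address at [rbp + 8], and the walker simply follows that chain. When the sampled JIT method has reused RBP as a general-purpose register, the starting value is not a stack address, so the walk stops right after the sampled PC, which matches the single-frame stacks in the perf output.

```python
from typing import Dict, List

WORD = 8  # bytes per stack slot on x86-64


def walk_frame_pointers(mem: Dict[int, int], pc: int, fp: int,
                        stack_lo: int, stack_hi: int, max_depth: int = 64) -> List[int]:
    """Collect PCs by following the saved-RBP chain.

    Assumed layout per frame: mem[fp] = caller's saved fp, mem[fp + 8] = return address.
    """
    pcs = [pc]  # the sampled instruction pointer is always known
    while len(pcs) < max_depth:
        # A "frame pointer" outside the thread's stack (or misaligned) cannot be a real frame.
        if not (stack_lo <= fp < stack_hi) or fp % WORD:
            break
        ret = mem.get(fp + WORD)
        if ret is None:
            break
        pcs.append(ret)
        next_fp = mem.get(fp, 0)
        # Stacks grow down, so each caller's frame must sit at a higher address.
        if next_fp <= fp:
            break
        fp = next_fp
    return pcs


if __name__ == "__main__":
    # Toy stack: b() called by a() called by main(); all addresses are made up.
    mem = {0x7100: 0x7200, 0x7108: 0x4010,   # b's frame -> a
           0x7200: 0x7300, 0x7208: 0x4020,   # a's frame -> main
           0x7300: 0,      0x7308: 0x4030}   # main's frame -> runtime
    # Intact chain: prints ['0x4000', '0x4010', '0x4020', '0x4030']
    print([hex(p) for p in walk_frame_pointers(mem, 0x4000, 0x7100, 0x7000, 0x8000)])
    # RBP holds a non-pointer value left by JIT code: prints ['0x4000']
    print([hex(p) for p in walk_frame_pointers(mem, 0x4000, 0x12345678, 0x7000, 0x8000)])
```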

If jstack -m suffers from those same issues, I wonder what that means for the result of the AsyncGetCallTrace call. What would it report in those cases?

(I just looked into it, and it seems that jstack -m and AsyncGetCallTrace are not doing the same thing. jstack -m relies on a simple stack walker that is part of the HotSpot SA (see http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/7-b147/sun/jvm/hotspot/tools/PStack.java?av=f#60), while AsyncGetCallTrace is actually executed inside the JVM and can use any of the HotSpot-internal frame analysis methods, which may or may not gather more info than an outside party can.)

jrudolph commented 10 years ago

jstack -m relies on simple stack walker which is part of the HotSpot SA

I just figured that this part may be correct (the stack walker being too simple), but it may also be that jstack -m just isn't trying hard enough to figure out the sender address. The SA seems to have all the data structures needed to do the same as AsyncGetCallTrace, which would be to look up the PC in the JVM CodeCache, access the metadata for the compiled method, and then somehow work out what state the stack should be in to derive the return address.
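The lookup described here (find the PC in the code cache, read the compiled method's metadata, derive the return address) can be sketched as a toy unwinder that never reads RBP. This is a simplified model under stated assumptions, not HotSpot's AsyncGetCallTrace implementation: `CompiledMethod`, `CodeCache`, and `unwind_with_metadata` are hypothetical names, and every compiled frame is assumed to have a fixed size, with the return address stored in the word just below the caller's stack pointer.

```python
import bisect
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional

WORD = 8  # bytes per stack slot on x86-64


@dataclass(frozen=True)
class CompiledMethod:
    start: int       # first address of the method's machine code
    end: int         # one past its last address
    name: str
    frame_size: int  # assumed fixed frame size in bytes, including the return-address slot


class CodeCache:
    """Hypothetical stand-in for the JVM code cache: maps a PC to its compiled method."""

    def __init__(self, methods: Iterable[CompiledMethod]):
        self._methods = sorted(methods, key=lambda m: m.start)
        self._starts = [m.start for m in self._methods]

    def lookup(self, pc: int) -> Optional[CompiledMethod]:
        i = bisect.bisect_right(self._starts, pc) - 1
        if i >= 0 and pc < self._methods[i].end:
            return self._methods[i]
        return None


def unwind_with_metadata(cache: CodeCache, mem: Dict[int, int], pc: int, sp: int,
                         max_depth: int = 64) -> List[str]:
    """Walk compiled frames using only SP and per-method frame sizes (RBP is ignored)."""
    names: List[str] = []
    while len(names) < max_depth:
        method = cache.lookup(pc)
        if method is None:
            break  # not JIT code; a real walker would switch to interpreter/native rules here
        names.append(method.name)
        sender_sp = sp + method.frame_size
        ret = mem.get(sender_sp - WORD)  # return address sits just below the caller's SP
        if ret is None:
            break
        pc, sp = ret, sender_sp
    return names
```

For example, with methods b (0x4000 to 0x4010, 32-byte frame), a (0x4010 to 0x4020, 48 bytes) and main (0x4020 to 0x4030, 16 bytes), and return addresses stored at 0x7018, 0x7048 and 0x7058, a sample at pc=0x4004, sp=0x7000 unwinds to ["b", "a", "main"] whatever value RBP holds. Frames the sketch cannot classify (interpreted or native code) simply end the walk.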

jrudolph commented 9 years ago

As @brendangregg recently reported in a blog post at Netflix, recent (currently preview) versions of HotSpot come with a new setting, -XX:+PreserveFramePointer, which fixes this issue. Awesome work!
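For reference, a profiling session with the new flag might look like the following (the flag shipped in JDK 8u60 and later). The Python snippet only assembles and prints the command lines; the application jar, PID, 30-second duration and 99 Hz sampling rate are placeholders, and the map step assumes perf-map-agent's `bin/create-java-perf-map.sh` helper script.

```python
import shlex
from typing import List, Sequence


def java_cmd(app_jar: str, extra_opts: Sequence[str] = ()) -> List[str]:
    # -XX:+PreserveFramePointer makes JIT-compiled code keep RBP as a frame pointer.
    return ["java", "-XX:+PreserveFramePointer", *extra_opts, "-jar", app_jar]


def perf_record_cmd(pid: int, seconds: int = 30, freq: int = 99) -> List[str]:
    # -g records call chains by walking the (now intact) frame-pointer chain.
    return ["perf", "record", "-F", str(freq), "-g", "-p", str(pid),
            "--", "sleep", str(seconds)]


def perf_map_cmd(agent_dir: str, pid: int) -> List[str]:
    # Writes /tmp/perf-<pid>.map so perf can name the JIT-compiled frames.
    return [f"{agent_dir}/bin/create-java-perf-map.sh", str(pid)]


if __name__ == "__main__":
    pid = 12345  # placeholder: the PID of the running JVM
    for cmd in (java_cmd("app.jar"), perf_record_cmd(pid), perf_map_cmd("perf-map-agent", pid)):
        print(shlex.join(cmd))
```

The map step should run while the JVM is still alive and after recording, so that methods compiled during the profile appear in /tmp/perf-<pid>.map when `perf report` reads it.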