jvm-profiling-tools / perf-map-agent

A java agent to generate method mappings to use with the linux `perf` tool
GNU General Public License v2.0

Interpreter frames #6

Closed brendangregg closed 10 years ago

brendangregg commented 10 years ago

I often see "Interpreter" frames. I think I was expecting more symbols.

   100.00%     java  perf-20844.map  [.] Interpreter
               |
               --- Interpreter
                   Interpreter
                   Interpreter
                   Interpreter
                   call_stub
                   JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*, Thread*)
                   jni_invoke_static(JNIEnv_*, JavaValue*, _jobject*, JNICallType, _jmethodID*, JNI_ArgumentPusher*, Thread*)
                   jni_CallStaticVoidMethod
                   JavaMain
                   start_thread

jrudolph commented 10 years ago

Thanks for filing this here.

Would you expect more JVM symbols or actual Java symbols? More JVM symbols might be possible, but they probably would not be particularly helpful, because I'd expect them all to be of the kind Interpreter.runMethod, no?

I guess the actual interpreted Java methods could be found inside those stack frames (the JVM itself must be able to collect stack traces spanning interpreted, JIT-compiled, JVM, and JNI code). However, this data is ephemeral and probably cannot be recovered from what the kernel's perf event engine captures while handling perf events, which looks like a very simple stack-walking algorithm. I guess you know much more about how this works than I do, so please correct me if I'm wrong.

On the other hand, not being able to look into interpreted stack frames (or into the call chain in general) is usually not a deal breaker: hotspots are usually JIT-compiled, and if necessary you can use the standard profiling tools to get a rough picture of what the stack traces for those methods look like. Having it integrated would still be cool...

jrudolph commented 10 years ago

Btw, my guess as to why there are no more detailed JVM symbols, only the broad "Interpreter", is that HotSpot generates the interpreter at runtime, for whatever reason at a different memory location each time.
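To illustrate why every interpreter address collapses into one name, here is a minimal sketch of how a perf-<pid>.map lookup might work. It assumes the usual map format of one `START SIZE name` entry per line, with START and SIZE in hex. The addresses, sizes, and the compiled method name in the sample are made up for illustration, and the single large "Interpreter" entry is an assumption that matches the output above; this is not perf's actual implementation.

```python
import bisect


def parse_perf_map(text):
    """Parse perf map content: each line is 'START SIZE name' (hex, hex, text)."""
    entries = []
    for line in text.splitlines():
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        start = int(parts[0], 16)
        size = int(parts[1], 16)
        entries.append((start, size, parts[2].strip()))
    entries.sort()
    return entries


def resolve(entries, addr):
    """Return the symbol whose [start, start + size) range contains addr, else None."""
    starts = [e[0] for e in entries]
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0:
        return None
    start, size, name = entries[i]
    return name if addr < start + size else None


# Hypothetical map content: one large interpreter blob and one compiled method.
SAMPLE_MAP = """\
7f3a10000000 20000 Interpreter
7f3a10400000 180 Ljava/lang/String;::hashCode
"""

entries = parse_perf_map(SAMPLE_MAP)
print(resolve(entries, 0x7F3A10000040))  # somewhere inside the interpreter blob
print(resolve(entries, 0x7F3A1001FF00))  # a different handler, same map entry
print(resolve(entries, 0x7F3A10400010))  # inside the compiled method
```

Because the whole interpreter is covered by one map entry in this sketch, two unrelated addresses inside it resolve to the same name, and nothing in the map says which Java method was being interpreted.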

brendangregg commented 10 years ago

I was expecting more Java symbols, but I realize this is much more difficult when the interpreter is running: only the interpreter functions are on the stack, and we'd need to pull context (and Java symbols) from their arguments, which would be beyond the scope of perf-map-agent.

As a workaround, -Xcomp can be used to force methods to be compiled on first invocation and so avoid the interpreter. Note that this does reduce performance, since the JIT can't compile based on runtime profiling information. For my test application, performance dropped by 30%.
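Before paying the -Xcomp cost, it may be worth measuring how much of a profile actually ends in unresolved interpreter frames. The sketch below assumes the profile is already in folded form (semicolon-separated frames, root first, followed by a space and a sample count, as produced by the FlameGraph stackcollapse scripts). The sample stacks and the `Foo.bar` name are hypothetical.

```python
def interpreter_share(folded_lines, marker="Interpreter"):
    """Return the fraction of samples whose leaf (last) frame equals marker."""
    total = 0
    hits = 0
    for line in folded_lines:
        line = line.strip()
        if not line:
            continue
        # The count is the last space-separated field; frames may contain spaces.
        stack, _, count = line.rpartition(" ")
        if not stack or not count.isdigit():
            continue  # skip malformed lines
        n = int(count)
        total += n
        if stack.split(";")[-1] == marker:
            hits += n
    return hits / total if total else 0.0


# Hypothetical folded profile: 70 samples end in the interpreter, 30 do not.
SAMPLE = [
    "java;start_thread;JavaMain;call_stub;Interpreter;Interpreter 70",
    "java;start_thread;JavaMain;call_stub;Interpreter;Foo.bar 30",
]

share = interpreter_share(SAMPLE)
print(f"{share:.0%} of samples end in an interpreter frame")
```

If the share is small, the unresolved frames are probably not hiding the hotspots and the workaround may not be needed.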

zeocio commented 8 years ago

One possibility (Oracle JDK only, obviously) is to run JFR stack sampling and perf stack sampling at the same time, then use an external tool to merge the two data sets to provide greater granularity where interpreted frames are present. I came across this recently: https://github.com/chrishantha/jfr-flame-graph
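A very rough sketch of what such a merge could look like, using a nearest-timestamp join. Everything here is an assumption for illustration: that both tools' samples can be reduced to (timestamp, frames) pairs for the same thread on a shared clock, that the perf stack holds a single run of Interpreter frames, and that the JFR stack covers exactly that run. The function names are hypothetical and do not come from jfr-flame-graph or any existing tool.

```python
from bisect import bisect_left


def nearest(samples, ts, tolerance):
    """samples: list of (timestamp, frames) sorted by timestamp.

    Return the frames of the sample closest to ts, or None if the closest
    one is more than tolerance away.
    """
    times = [t for t, _ in samples]
    i = bisect_left(times, ts)
    best = None
    for j in (i - 1, i):
        if 0 <= j < len(samples):
            if best is None or abs(times[j] - ts) < abs(times[best] - ts):
                best = j
    if best is None or abs(times[best] - ts) > tolerance:
        return None
    return samples[best][1]


def merge(perf_frames, java_frames, marker="Interpreter"):
    """Replace the first contiguous run of marker frames with java_frames."""
    if marker not in perf_frames:
        return list(perf_frames)
    first = perf_frames.index(marker)
    last = first
    while last + 1 < len(perf_frames) and perf_frames[last + 1] == marker:
        last += 1
    return perf_frames[:first] + list(java_frames) + perf_frames[last + 1:]


# Hypothetical data for one thread, frames listed root first.
jfr_samples = [
    (1000, ["Main.main", "Worker.run", "Worker.step"]),
    (2000, ["Main.main", "Worker.run", "Worker.flush"]),
]
perf_stack = ["start_thread", "JavaMain", "call_stub",
              "Interpreter", "Interpreter", "Interpreter"]

java = nearest(jfr_samples, 1004, tolerance=10)
merged = merge(perf_stack, java) if java is not None else perf_stack
print(";".join(merged))
```

Real stacks interleave interpreted and compiled frames, and the two samplers do not fire at the same instants, so an actual tool would need a more careful alignment than this.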

toaler commented 5 years ago

Resolving the Interpreter frames is still useful. In applications with a large number of distinct code paths, not being able to see into the interpreted frames makes it very time-consuming for an engineer to work out which calling methods dominate the calls into the known JIT-compiled frames.

We were looking for a solution on macOS that lets us get all JVM stacks (JIT, GC, application code) and resolve all frames (JVM, Java interpreted/native, libraries, etc.). dtrace + perf-map-agent is great for this, apart from the tradeoff of not being able to get the interpreted frames. async-profiler and honest-profiler can resolve the interpreted frames. I suppose it's easier for AsyncGetCallTrace to resolve the interpreted code, as it runs in the same context that holds that metadata, whereas dtrace doesn't have that luxury.

Since we are interested in OS X native invocations, honest-profiler is out of the mix. async-profiler can acquire interpreted and native frames on macOS as well; however, it adds overhead when used to evaluate JVM startup times, which we are interested in.

Anyway, thanks for adding the dtrace scripts, as they are proving to be very useful.

squito commented 5 years ago

Getting more info from the interpreter frames would also be useful with bcc's trace, which also takes advantage of this file. I'm trying to track down the source of large malloc calls, e.g. trace -p [pid] -K -U -a 'c:malloc "size = %d", arg1'. Those calls aren't necessarily that frequent (something is allocating a lot of memory into its own internal pool). Maybe there is a better way to do this directly in trace?

In any case, I can understand why this is hard to do.

apangin commented 5 years ago

@squito The neighboring project async-profiler is already capable of doing what you want. In -e malloc mode it traces malloc calls and records the allocation size along with the mixed stack trace: native + Java, including interpreter frames. Please open a ticket there if you need help getting this to work.

squito commented 5 years ago

Awesome, thanks @apangin, I will check it out!