oracle / graal

GraalVM compiles Java applications into native executables that start instantly, scale fast, and use fewer compute resources 🚀
https://www.graalvm.org
Other
20.42k stars 1.64k forks source link

Invalid stack traces in JFR recording #3495

Open apangin opened 3 years ago

apangin commented 3 years ago

Describe the issue When running GraalVM with JVMCI compiler enabled (default), tools may show impossible allocation stack traces. For example, on the following screenshot, JMC shows that the allocation of byte[] array happened in Long.stringSize, whereas this method does not allocate anything at all.

graalvm-alloc

The bug was originally reported against async-profiler project, but after the investigation I came to conclusion that the bug is not in the profiler, but rather in the JVM.

I was able to reproduce the problem with JFR. But when I disable -XX:-UseJVMCICompiler, the stack traces become correct.

Steps to reproduce the issue

  1. Download GraalVM CE 21.2.0-dev for Linux x64
  2. Run Allocate.java:
    java -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints -XX:StartFlightRecording=filename=myrecording.jfr,settings=profile Allocate
  3. Open myrecording.jfr in JMC

Describe GraalVM and your environment:

oubidar-Abderrahim commented 3 years ago

Thank you for reporting this, we will look into it

oubidar-Abderrahim commented 3 years ago

Does the shared screenshot come from the reproducer? I ask because when I open the JFR generated by the reproducer using JMC I see 0 events on "Allocation in new TLAB"

apangin commented 3 years ago

Yes, it comes from the reproducer. Forgot to mention that JFR settings need to be set to profile rather than default, sorry. I've edited the command line in the original report.

oubidar-Abderrahim commented 3 years ago

The closest I found in the JFR to what was shown is this: jfr

I want to make sure that I got the right one. also I see the same thing when I run using -XX:-UseJVMCICompiler Would you mind rechecking on your side using -XX:-UseJVMCICompiler to confirm if it still occurs with flag? Thank you.

apangin commented 3 years ago

Not this one.

OK, let me simplify the test case. Here is my AllocTest.java:

public class AllocTest {

    public static void main(String[] args) {
        for (long n = 0; n < 20_000_000; n++) {
            String.format("n=%d", n);
        }
    }
}

I run it twice with the following options:

  1. With Graal compiler, recording profile to jvmci.jfr:
    java -Xmx1g -Xms1g -XX:-TieredCompilation -XX:StartFlightRecording=filename=jvmci.jfr,settings=profile AllocTest.java
  2. With -XX:-UseJVMCICompiler, recording profile to no-jvmci.jfr:
    java -Xmx1g -Xms1g -XX:-TieredCompilation -XX:-UseJVMCICompiler -XX:StartFlightRecording=filename=no-jvmci.jfr,settings=profile AllocTest.java

Here are the result JFR files:

Now if we switch to Memory / Tree View tab in JMC, we'll see the aggregated stack traces. Note Long.getChars and Long.stringSize. None of them allocate for real:

jfr1

The second profile, no-jvmci.jfr, looks accurate:

jfr2

thomaswue commented 3 years ago

This is likely a result of the partial escape analysis of the GraalVM compiler moving an allocation into this code section. With those advanced code optimizations, there is no guarantee that an allocation is happening at the point in the program where a new object is created.

Does the same result appear when PEA is disabled? "-Dgraal.PartialEscapeAnalysis=false"

apangin commented 3 years ago

Yes, the problem persists with -Dgraal.PartialEscapeAnalysis=false, too.

AFAIU, it should not really matter where the optimizing compiler moves the allocation, as long as it maintains the location mapping in DebugInfo. BTW, allocations seen by JFR always come through the slow path (runtime stub).

Here is an excerpt from -XX:+PrintAssembly, where the call to the runtime stub is mapped to Long.stringSize rather than newarray bytecode in Long.toString:

  0x00007fb93e78d69b: movabs $0x800389780,%rsi  ;   {metadata({type array byte})}
  0x00007fb93e78d6a5: mov    %r11d,%edx
  0x00007fb93e78d6a8: callq  0x00007fb93e6e8080 ; ImmutableOopMap{}
                                                ;*iload {reexecute=1 rethrow=0 return_oop=0}
                                                ; - (reexecute) java.lang.Long::stringSize@33 (line 584)
                                                ; - java.lang.Long::toString@1 (line 471)
                                                ;   {runtime_call Stub<new_array_or_null(KlassPointer,int)Object>}