jvm-profiling-tools / honest-profiler

A sampling JVM profiler without the safepoint sample bias
https://github.com/RichardWarburton/honest-profiler/wiki
MIT License
1.25k stars 146 forks source link

Details of Linux itimer implementation and SIGPROF delivery may cause very skewed profiles #252

Open urisimchoni opened 5 years ago

urisimchoni commented 5 years ago

When multiple CPUs are busy running the Java program, the accuracy of honest profiler depends on SIGPROF being delivered to a randomly-chosen (with uniform distribution) running thread. It seems like on Linux this is not the case, at least not when the setitimer() API is being used.

I've tested a physical machine running Fedora 28, kernel 4.18.10-200.fc28.x86_64, and also an aws EC2 m5.xlarge machine running CentOS 7.5

To illustrate consider the following program:

public class Main {
    private static void wait1() {
        long base = System.nanoTime();
        while (System.nanoTime() - base < 1000 * 1000 * 1000)
            ;
    }

    private static void wait2() {
        long base = System.nanoTime();
        while (System.nanoTime() - base < 1000 * 1000 * 1000)
            ;
    }

    public static void main(String args[]) throws InterruptedException {
        Thread t1 = new Thread(() -> {
            wait1();
        });
        Thread t2 = new Thread(() -> {
            wait2();
        });
        t1.start();
        t2.start();
        t1.join();
        t2.join();
    }
}

Compile and run:

javac Main.java
time java -agentpath:liblagent.so=interval=3,logPath=log.hpl,start=1 Main
Starting sampling
Stopping sampling

real    0m1.145s
user    0m2.173s
sys 0m0.034s

The output of the time command shows that two CPUs were running the two threads. We would expect roughly same number of ticks assigned to each thread. The result I get in a typical run is this: hp

It's important to stress that the issue is with setitimer - it can be demonstrated in a C program as well, and is also reported here

Using timer_create() with per-thread timers (CLOCK_THREAD_CPUTIME_ID) seems to produce accurate profiles - on a test C program.