When multiple CPUs are busy running the Java program, the accuracy of honest profiler depends on SIGPROF being delivered to a randomly-chosen (with uniform distribution) running thread. It seems like on Linux this is not the case, at least not when the setitimer() API is being used.
I've tested a physical machine running Fedora 28, kernel 4.18.10-200.fc28.x86_64, and also an aws EC2 m5.xlarge machine running CentOS 7.5
To illustrate consider the following program:
public class Main {
private static void wait1() {
long base = System.nanoTime();
while (System.nanoTime() - base < 1000 * 1000 * 1000)
;
}
private static void wait2() {
long base = System.nanoTime();
while (System.nanoTime() - base < 1000 * 1000 * 1000)
;
}
public static void main(String args[]) throws InterruptedException {
Thread t1 = new Thread(() -> {
wait1();
});
Thread t2 = new Thread(() -> {
wait2();
});
t1.start();
t2.start();
t1.join();
t2.join();
}
}
Compile and run:
javac Main.java
time java -agentpath:liblagent.so=interval=3,logPath=log.hpl,start=1 Main
Starting sampling
Stopping sampling
real 0m1.145s
user 0m2.173s
sys 0m0.034s
The output of the time command shows that two CPUs were running the two threads. We would expect roughly same number of ticks assigned to each thread. The result I get in a typical run is this:
It's important to stress that the issue is with setitimer - it can be demonstrated in a C program as well, and is also reported here
Using timer_create() with per-thread timers (CLOCK_THREAD_CPUTIME_ID) seems to produce accurate profiles - on a test C program.
When multiple CPUs are busy running the Java program, the accuracy of honest profiler depends on SIGPROF being delivered to a randomly-chosen (with uniform distribution) running thread. It seems like on Linux this is not the case, at least not when the setitimer() API is being used.
I've tested a physical machine running Fedora 28, kernel 4.18.10-200.fc28.x86_64, and also an aws EC2 m5.xlarge machine running CentOS 7.5
To illustrate consider the following program:
Compile and run:
The output of the time command shows that two CPUs were running the two threads. We would expect roughly same number of ticks assigned to each thread. The result I get in a typical run is this:
It's important to stress that the issue is with setitimer - it can be demonstrated in a C program as well, and is also reported here
Using timer_create() with per-thread timers (CLOCK_THREAD_CPUTIME_ID) seems to produce accurate profiles - on a test C program.