soenmie / disruptor

Automatically exported from code.google.com/p/disruptor

waitForFreeSlotAt() uses LockSupport.parkNanos(1L), which can take up to 15 ms on Windows #16

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
I wrote a simple performance test using the Disruptor to test mostly-empty and 
mostly-full scenarios in the ring buffer.

The SingleThreadedClaimStrategy() provides waitForFreeSlotAt(), which blocks until 
a slot in the ring buffer is free. Inside waitForFreeSlotAt() the thread repeatedly 
parks with LockSupport.parkNanos(1L) until a slot in the ring buffer becomes 
available.
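A minimal sketch of the wait loop described above, assuming a single producer that must not run more than `bufferSize` slots ahead of a consumer whose progress is tracked in an `AtomicLong`. The class name, fields and `parkCount` counter are hypothetical; this is not the Disruptor's actual `SingleThreadedClaimStrategy` source.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.LockSupport;

// Hypothetical simplification of a single-producer claim strategy;
// not the Disruptor's real source.
public class ParkingClaimSketch {
    private final int bufferSize;
    // Highest sequence the consumer has finished processing (-1 = none yet)
    private final AtomicLong consumerSequence = new AtomicLong(-1L);
    // How many times the producer parked while waiting for space
    private long parkCount = 0;

    public ParkingClaimSketch(int bufferSize) {
        this.bufferSize = bufferSize;
    }

    public AtomicLong consumerSequence() {
        return consumerSequence;
    }

    public long parkCount() {
        return parkCount;
    }

    // Blocks until writing `sequence` would not overwrite a slot the
    // consumer has not processed yet (slot reuse happens every bufferSize).
    public void waitForFreeSlotAt(long sequence) {
        long wrapPoint = sequence - bufferSize;
        while (wrapPoint > consumerSequence.get()) {
            // Asks for a 1 ns park; the OS may round this up to its timer
            // granularity (about 15.6 ms by default on Windows).
            LockSupport.parkNanos(1L);
            parkCount++;
        }
    }
}
```

The loop only re-checks the consumer's progress after each park returns, so the producer's reaction time to a freed slot is bounded below by however long the OS actually keeps the thread parked.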

LockSupport.parkNanos() calls unsafe.park() in the JVM, which suspends the thread's underlying LWP (lightweight process) in the OS.

I tested the delay of LockSupport.parkNanos(1L) in 3 environments:

a) Windows 7/32bit with i7 M620 @2.67 GHz with Java 1.6.0_23
b) Solaris 10 5/08 s10s_u5wos_10 SPARC VII with Java 1.6.0_22
c) Linux Red Hat 2.6.18-194.el5 with AMD Opteron(tm) Processor 6164 with Java 1.6.0_29

with the following findings:
a) avg park() delay times are around 15 milliseconds
b) avg park() delay times are around 6 microseconds
c) avg park() delay times are around 8 microseconds

I am puzzled that unsafe.park() takes so much longer on my Windows machine 
than on Linux and Solaris.

Testing the Disruptor with a BusySpinWaitStrategy() for 100000 objects on 
Windows, I find:
a) waitForFreeSlotAt() using LockSupport.parkNanos(1L) (unmodified)
       min   0.384 us
      mean  17.272 us
       max 8488.585 us
   std dev 108.792 us

b) waitForFreeSlotAt() using Thread.yield() instead of LockSupport.parkNanos(1L)
       min   0.000 us
      mean   0.420 us
       max 4692.955 us
   std dev  21.881 us

c) waitForFreeSlotAt() using a busy loop (just an empty while loop)
       min   0.000 us
      mean   0.259 us
       max 236.745 us
   std dev   0.921 us

If you can reproduce the delays of LockSupport.parkNanos(1L) in your 
environment, I suggest changing waitForFreeSlotAt() to use Thread.yield() 
instead of LockSupport.parkNanos(1L).
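The three variants measured above differ only in what the producer does on each pass through the wait loop. A minimal sketch, assuming a hypothetical `IdleStrategy` enum of our own (not a Disruptor type) and Java 8 or later for `BooleanSupplier`:

```java
import java.util.function.BooleanSupplier;
import java.util.concurrent.locks.LockSupport;

// Hypothetical enum covering the three variants measured in this issue.
public enum IdleStrategy {
    PARK {      // unmodified: ask the OS to suspend the thread for >= 1 ns
        @Override void idle() { LockSupport.parkNanos(1L); }
    },
    YIELD {     // offer the rest of the time slice to another runnable thread
        @Override void idle() { Thread.yield(); }
    },
    BUSY_SPIN { // empty body: keep the core busy and re-check immediately
        @Override void idle() { }
    };

    abstract void idle();

    // Calls idle() until the condition holds; returns how many times it idled.
    public long awaitUntil(BooleanSupplier condition) {
        long iterations = 0;
        while (!condition.getAsBoolean()) {
            idle();
            iterations++;
        }
        return iterations;
    }
}
```

The trade-off matches the numbers above: busy spinning gives the lowest latency but occupies a core while waiting, yielding depends on whether the scheduler has other work, and parking depends on how finely the OS timer can wake the thread.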

Fantastic work. Thank you.

Original issue reported on code.google.com by thalmann...@gmail.com on 10 Jan 2012 at 10:17

GoogleCodeExporter commented 8 years ago
Hi,

Would you be able to post a link to the code that you are using for the 
performance tests?

Thanks,
Mike

Original comment by mike...@gmail.com on 5 Feb 2012 at 7:32

GoogleCodeExporter commented 8 years ago
Hi Mike,

I noticed that in my post above, the second list labeled a), b) and c) is 
confusing, since it is unrelated to the list above it. 
Sorry about that.

To measure LockSupport.parkNanos(1L), I simply run:

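```java
import java.util.concurrent.locks.LockSupport;

public class ParkLatencyTest { // wrapper class name added for completeness
    private static final int NR_ITER = 10000;
    // Observed delay of each park call, in microseconds
    private final double[] results = new double[NR_ITER];

    public void testUnsafePark() {
        for (int c = 0; c < NR_ITER; c++) {
            long startTime = System.nanoTime();
            LockSupport.parkNanos(1L); // request the shortest possible park
            // Divide by 1000.0 so sub-microsecond fractions are not truncated
            results[c] = (System.nanoTime() - startTime) / 1000.0;
        }
    }
}
```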
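The min/mean/max/std dev lines reported in this thread can be produced from a `results` array with a helper like the one below. This is a sketch: the thread does not show how the attached test computes its statistics, so the class name and the choice of population (rather than sample) standard deviation are assumptions.

```java
// Hypothetical helper; summarizes latencies given in microseconds.
public final class LatencyStats {
    public final double min, mean, max, stdDev;

    public LatencyStats(double[] samples) {
        if (samples.length == 0) throw new IllegalArgumentException("no samples");
        double lo = Double.POSITIVE_INFINITY, hi = Double.NEGATIVE_INFINITY, sum = 0.0;
        for (double s : samples) {
            lo = Math.min(lo, s);
            hi = Math.max(hi, s);
            sum += s;
        }
        double avg = sum / samples.length;
        double squares = 0.0;
        for (double s : samples) {
            squares += (s - avg) * (s - avg);
        }
        min = lo;
        max = hi;
        mean = avg;
        // Population standard deviation: divide by n, not n - 1
        stdDev = Math.sqrt(squares / samples.length);
    }

    @Override
    public String toString() {
        return String.format("min %10.3f us%nmean %10.3f us%nmax %10.3f us%nstd dev %10.3f us",
                min, mean, max, stdDev);
    }
}
```

Because a single 50 ms outlier can dominate the mean, reporting max and std dev alongside it, as the thread does, makes the Windows timer effect much easier to spot.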

After your reply, I rechecked my findings and was surprised to find that my 
machine's behavior had changed. After some digging, I found that the company 
had installed Windows patches.

Findings on Windows 7 Enterprise Service Pack 1 before installing the 
latest Windows updates:
INFO: Java 1.6.0_23 is 32bit.
INFO: parkNanos(1L)       min   560.000 us
INFO: parkNanos(1L)      mean 15539.838 us
INFO: parkNanos(1L)       max 51187.000 us
INFO: parkNanos(1L)   std dev  1028.606 us

Findings on Windows 7 Enterprise Service Pack 1 after installing the 
latest Windows updates:
INFO: Java 1.6.0_23 is 32bit.
INFO: parkNanos(1L)       min   87.000 us
INFO: parkNanos(1L)      mean  998.509 us
INFO: parkNanos(1L)       max 6650.000 us
INFO: parkNanos(1L)   std dev   92.886 us

The Windows updates must have changed the scheduler behaviour!

Re-running the same code I used earlier this year to test the Disruptor, 
I now find:
      min     0.384 us
     mean     6.501 us
      max  8425.161 us
  std dev   134.140 us

Please compare these results with "a) waitForFreeSlotAt() using 
LockSupport.parkNanos(1L) (unmodified)"
from the issue report.

My motivation for these tests is to replace our use of SynchronousQueue() with 
the Disruptor, so I am measuring the delay to pass a single element.
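For a SynchronousQueue baseline to compare against, a sketch like the following times each single-element handoff between two threads. It is an assumption about how such a comparison could be set up, not the attached test code; the class name and method are hypothetical.

```java
import java.util.concurrent.SynchronousQueue;

// Hypothetical baseline: time from put() on the producer to take()
// returning on the consumer, in microseconds.
public class SynchronousQueueHandoff {
    public static double[] measure(int iterations) throws InterruptedException {
        final SynchronousQueue<Long> queue = new SynchronousQueue<>();
        final double[] latenciesUs = new double[iterations];

        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < iterations; i++) {
                    long sentAt = queue.take(); // blocks until the producer hands off
                    latenciesUs[i] = (System.nanoTime() - sentAt) / 1000.0;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        for (int i = 0; i < iterations; i++) {
            queue.put(System.nanoTime()); // blocks until the consumer takes it
        }
        consumer.join(); // join() also makes the consumer's writes visible here
        return latenciesUs;
    }
}
```

The resulting array can be summarized the same way as the park measurements, so both queues are compared on the same min/mean/max/std dev basis.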

As you requested, I have attached the Java code for my test class.

The changes to use a busy loop or Thread.yield() instead of 
LockSupport.parkNanos() 
are made in SingleThreadedClaimStrategy.java, around line 118.

I believe the Windows update explains the mystery, and the delay is not caused by 
your use of LockSupport.parkNanos(), although it would be nice if the ClaimStrategy 
for the buffer-full scenario matched the chosen WaitStrategy.

Thank you very much,
 Michael

Original comment by thalmann...@gmail.com on 7 Feb 2012 at 8:28


GoogleCodeExporter commented 8 years ago
The default timer resolution is 15.6 ms on Windows.

http://download.microsoft.com/download/3/0/2/3027D574-C433-412A-A8B6-5E0A75D5B237/Timer-Resolution.docx
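The 15.6 ms figure corresponds to a clock interrupt rate of 64 ticks per second. One plausible reading, not confirmed in this thread, is that each back-to-back parkNanos(1L) starts just after a tick and cannot wake until the next one, so each park lasts about one full tick. That fits the mean measured on Windows before the updates:

$$
t_{\text{tick}} = \frac{1\,\text{s}}{64} = 15.625\,\text{ms} \approx 15.6\,\text{ms},
\qquad
\bar{t}_{\text{park}} = 15539.838\,\mu\text{s} \approx 15.54\,\text{ms} \approx t_{\text{tick}}
$$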

Original comment by nilskp on 26 Sep 2012 at 12:18