heistp / irtt

Isochronous Round-Trip Tester
GNU General Public License v2.0

RTTs can be highly inaccurate on Windows #14

Closed: heistp closed this issue 6 years ago

heistp commented 6 years ago

The Windows client can show round-trip times that are zero or even negative, regardless of whether the wall or monotonic clock values are used (tested on VMWare Win 10 32-bit, Go 1.10.2).

heistp commented 6 years ago

It seems to be worse at lower send intervals. When locking goroutines to OS threads with --thread, the results are almost all zeroes, which suggests a thread scheduling issue on Windows.
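
For context, that kind of goroutine-to-thread pinning is done in Go with runtime.LockOSThread; here's a simplified, standalone sketch (not irtt's exact code):

package main

import (
	"fmt"
	"runtime"
	"sync"
)

func main() {
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		// Pin this goroutine to a single OS thread for its lifetime, as a
		// --thread style option would do for the send and receive loops.
		runtime.LockOSThread()
		defer runtime.UnlockOSThread()
		fmt.Println("running pinned to one OS thread")
	}()
	wg.Wait()
}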

Negative RTTs should obviously be impossible; they occur because, at this level of clock inaccuracy, the measured server processing time can sometimes exceed the measured round-trip time. I worked around that by not subtracting the server processing time in that case, but the inaccurate RTTs remain.
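
In code, the workaround looks roughly like this (a simplified sketch with made-up names, not irtt's exact code):

package main

import (
	"fmt"
	"time"
)

// adjustedRTT subtracts the server processing time from the measured
// round trip only when doing so can't produce a negative result.
func adjustedRTT(measured, serverProc time.Duration) time.Duration {
	if serverProc < measured {
		return measured - serverProc
	}
	return measured
}

func main() {
	fmt.Println(adjustedRTT(500*time.Microsecond, 50*time.Microsecond))  // 450µs
	fmt.Println(adjustedRTT(100*time.Microsecond, 300*time.Microsecond)) // left at 100µs
}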

heistp commented 6 years ago

The current status for me is that the negative RTTs are taken care of by the new highPrecisionTimer flag. I can still see 0 RTTs when connecting to a server on localhost, which is probably where we hit the limits of the Windows timer.

When 32-bit Win 10 is running on VMWare, I see RTT jumps as follows (it's about a 16ms RTT to the server):

seq=0 rtt=575µs rd=-9.36ms sd=9.93ms ipdv=n/a
seq=1 rtt=17.54ms rd=1.76ms sd=15.78ms ipdv=16.97ms
seq=2 rtt=18.96ms rd=6.55ms sd=12.41ms ipdv=1.42ms
seq=3 rtt=18.92ms rd=4.63ms sd=14.29ms ipdv=36.2µs
seq=4 rtt=2.63ms rd=-8.59ms sd=11.23ms ipdv=16.29ms
seq=5 rtt=1.71ms rd=-11.81ms sd=13.52ms ipdv=924µs
seq=6 rtt=1.84ms rd=-8.59ms sd=10.43ms ipdv=131µs
seq=7 rtt=15.95ms rd=-7.1ms sd=23.05ms ipdv=14.11ms
seq=8 rtt=2.48ms rd=-9.55ms sd=12.03ms ipdv=13.47ms
seq=9 rtt=2.49ms rd=-10.85ms sd=13.34ms ipdv=8.75µs
seq=10 rtt=18.16ms rd=4.95ms sd=13.21ms ipdv=15.67ms

I do not see these same jumps with 64-bit Win 10 on a physical box:

seq=0 rtt=22.85ms rd=798.7ms sd=-775.9ms ipdv=n/a
seq=1 rtt=15.42ms rd=793.2ms sd=-777.8ms ipdv=7.43ms
seq=2 rtt=20.43ms rd=794.3ms sd=-773.9ms ipdv=5.01ms
seq=3 rtt=23.43ms rd=798ms sd=-774.6ms ipdv=3ms
seq=4 rtt=14.42ms rd=793.2ms sd=-778.8ms ipdv=9.01ms
seq=5 rtt=23.45ms rd=800.2ms sd=-776.7ms ipdv=9.03ms
seq=6 rtt=23.89ms rd=795.4ms sd=-771.5ms ipdv=440µs
seq=7 rtt=22.91ms rd=794.7ms sd=-771.8ms ipdv=980µs
seq=8 rtt=26.99ms rd=803.7ms sd=-776.7ms ipdv=4.08ms
seq=9 rtt=17.41ms rd=792.3ms sd=-774.9ms ipdv=9.59ms

sgissi commented 6 years ago

Testing on a low-latency link (64-bit VM, less than 1ms latency) gets 0s pretty much all across. Is there any way to get accurate timings on Windows?

seq=0 rtt=1ms rd=-9.01ms sd=10.01ms ipdv=n/a
seq=1 rtt=0s rd=5.6ms sd=-5.6ms ipdv=1ms
seq=2 rtt=0s rd=3.61ms sd=-3.61ms ipdv=0s
seq=3 rtt=0s rd=2.63ms sd=-2.63ms ipdv=0s
seq=4 rtt=0s rd=-361µs sd=361µs ipdv=0s
seq=5 rtt=0s rd=-1.35ms sd=1.35ms ipdv=0s
seq=6 rtt=0s rd=-3.33ms sd=3.33ms ipdv=0s
seq=7 rtt=0s rd=10.28ms sd=-10.28ms ipdv=0s
seq=8 rtt=0s rd=9.29ms sd=-9.29ms ipdv=0s
seq=9 rtt=0s rd=8.3ms sd=-8.3ms ipdv=0s
seq=10 rtt=0s rd=7.32ms sd=-7.32ms ipdv=0s
seq=11 rtt=0s rd=6.33ms sd=-6.33ms ipdv=0s
seq=12 rtt=0s rd=1.34ms sd=-1.34ms ipdv=0s
seq=13 rtt=0s rd=356µs sd=-356µs ipdv=0s
seq=14 rtt=0s rd=-632µs sd=632µs ipdv=0s
seq=15 rtt=0s rd=-1.62ms sd=1.62ms ipdv=0s
seq=16 rtt=0s rd=10.99ms sd=-10.99ms ipdv=0s
seq=17 rtt=0s rd=9.01ms sd=-9.01ms ipdv=0s
seq=18 rtt=0s rd=8.02ms sd=-8.02ms ipdv=0s
seq=19 rtt=0s rd=5.03ms sd=-5.03ms ipdv=0s

heistp commented 6 years ago

What is ping's mean RTT on this link? I guess on Windows you'd see "<1ms", but Linux or OS X would be needed for anything more precise.

I'm not sure if I can improve on this. Either I can find something else in the Windows timer APIs, or I'll have to change the display to something similar to what Windows ping does (<1ms). I won't be able to take a look until next week, but in the meantime any tips are appreciated...

heistp commented 6 years ago

Ok, I'm optimistic that I can replace the monotonic clock values with calls to QueryPerformanceCounter / QueryPerformanceFrequency from kernel32.dll. If you get a chance to try go run main.go on the attached file, you should see output similar to this (this is from my VM; a simplified sketch of those calls follows the output below):

Z:\Downloads>go run main.go
QueryPerformanceFrequency: 10000000
QueryPerformanceCounter: 290156891124 (700ns elapsed)
QueryPerformanceCounter: 290156897245 (500ns elapsed)
QueryPerformanceCounter: 290156903288 (600ns elapsed)
QueryPerformanceCounter: 290156913864 (600ns elapsed)
QueryPerformanceCounter: 290156919117 (400ns elapsed)
QueryPerformanceCounter: 290156923965 (500ns elapsed)
QueryPerformanceCounter: 290156928897 (600ns elapsed)
QueryPerformanceCounter: 290156933810 (500ns elapsed)
QueryPerformanceCounter: 290156998684 (1.1µs elapsed)
QueryPerformanceCounter: 290157007633 (600ns elapsed)

Calls: 1000000
Zeroes: 0
Min nonzero: 200ns
Max: 384.7µs
Mean: 265ns

I'm hoping not to see any zeroes, and for the mean elapsed time to be in the sub-microsecond range. It will be more expensive to call into the DLL like this, but it looks like it will be far more accurate, provided it tests out OK when called from different goroutines, etc.

main.zip
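
For reference, calling those two APIs from Go via kernel32.dll looks roughly like this (a simplified, Windows-only sketch with illustrative helper names, not the attached main.go):

package main

import (
	"fmt"
	"syscall"
	"time"
	"unsafe"
)

var (
	kernel32 = syscall.NewLazyDLL("kernel32.dll")
	qpcProc  = kernel32.NewProc("QueryPerformanceCounter")
	qpfProc  = kernel32.NewProc("QueryPerformanceFrequency")
)

// qpcFrequency returns the performance counter frequency in counts per second.
func qpcFrequency() int64 {
	var freq int64
	qpfProc.Call(uintptr(unsafe.Pointer(&freq)))
	return freq
}

// qpcCounter returns the current performance counter value.
func qpcCounter() int64 {
	var ctr int64
	qpcProc.Call(uintptr(unsafe.Pointer(&ctr)))
	return ctr
}

func main() {
	freq := qpcFrequency()
	fmt.Println("QueryPerformanceFrequency:", freq)
	prev := qpcCounter()
	for i := 0; i < 10; i++ {
		cur := qpcCounter()
		// Convert counter ticks to a time.Duration using the frequency.
		elapsed := time.Duration((cur - prev) * int64(time.Second) / freq)
		fmt.Printf("QueryPerformanceCounter: %d (%v elapsed)\n", cur, elapsed)
		prev = cur
	}
}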

sgissi commented 6 years ago

Hi Pete, I'll give it a spin today and let you know. Thanks!

heistp commented 6 years ago

Thanks, and an update is attached, so run this one instead. It looks like Go's standard time implementation isn't precise enough for the wall clock either, so I'll have to use GetSystemTimePreciseAsFileTime instead. That's now included in the test, along with a test of Go's standard time, which seems to quantize to 1ms (a sketch of the precise wall-clock call follows the output below).

Z:\Downloads>go run main.go
Standard Go Wall Time:
Time: 1528311859729190500 (0s elapsed)
Time: 1528311859730189900 (999.4µs elapsed)
Time: 1528311859731189100 (999.2µs elapsed)
Time: 1528311859731189100 (0s elapsed)
Time: 1528311859732188800 (999.7µs elapsed)
Iterations: 2000000
Zeroes: 1999905
Min nonzero: 992.3µs
Max: 3.0101ms
Mean: 50ns
Elapsed: 106.9972ms

QueryPerformanceCounter/Frequency:
QueryPerformanceFrequency: 10000000
QueryPerformanceCounter: 339554402978 (600ns elapsed)
QueryPerformanceCounter: 339554407755 (477.7µs elapsed)
QueryPerformanceCounter: 339554412550 (479.5µs elapsed)
QueryPerformanceCounter: 339554417207 (465.7µs elapsed)
QueryPerformanceCounter: 339554433863 (1.6656ms elapsed)
Iterations: 2000000
Zeroes: 0
Min nonzero: 200ns
Max: 1.8156ms
Mean: 280ns
Elapsed: 563.9931ms

GetSystemTimePreciseAsFileTime:
SystemTimePrecise: 1528311860403546400 (400ns elapsed)
SystemTimePrecise: 1528311860403993800 (447.4µs elapsed)
SystemTimePrecise: 1528311860404456200 (462.4µs elapsed)
SystemTimePrecise: 1528311860404916300 (460.1µs elapsed)
SystemTimePrecise: 1528311860405355300 (439µs elapsed)
Iterations: 2000000
Zeroes: 0
Min nonzero: 100ns
Max: 6.5248ms
Mean: 241ns
Elapsed: 485.9941ms

main.zip
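
Similarly, reading the wall clock via GetSystemTimePreciseAsFileTime from Go looks roughly like this (again a simplified, Windows-only sketch with illustrative names, not the attached main.go):

package main

import (
	"fmt"
	"syscall"
	"time"
	"unsafe"
)

var (
	kernel32        = syscall.NewLazyDLL("kernel32.dll")
	preciseTimeProc = kernel32.NewProc("GetSystemTimePreciseAsFileTime")
)

// preciseWallTime reads the system wall clock at 100ns FILETIME resolution.
func preciseWallTime() time.Time {
	var ft syscall.Filetime
	preciseTimeProc.Call(uintptr(unsafe.Pointer(&ft)))
	// Filetime.Nanoseconds converts the 1601-based FILETIME to Unix nanoseconds.
	return time.Unix(0, ft.Nanoseconds())
}

func main() {
	prev := preciseWallTime()
	for i := 0; i < 5; i++ {
		cur := preciseWallTime()
		fmt.Printf("SystemTimePrecise: %d (%v elapsed)\n", cur.UnixNano(), cur.Sub(prev))
		prev = cur
	}
}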

heistp commented 6 years ago

Please give irtt a try on Windows again in your environment, as I just integrated support for QueryPerformanceCounter and GetSystemTimePreciseAsFileTime. I'd be glad to hear whether it works on a low-RTT network with 64-bit Windows, but the results are looking much better for me now (see below). Thanks!

Z:\Downloads>irtt client -i 10ms -d 1s -q localhost
[Connecting] connecting to localhost
[127.0.0.1:2112] [Connected] connection established

                        Min    Mean  Median     Max  Stddev
                        ---    ----  ------     ---  ------
                RTT    84µs   138µs   116µs   223µs  46.6µs
         send delay  41.4µs    81µs  63.2µs   150µs  35.7µs
      receive delay  42.1µs  56.7µs  54.9µs  95.7µs  11.7µs

      IPDV (jitter)      0s  35.5µs  15.5µs   114µs  36.9µs
          send IPDV      0s  27.6µs  11.9µs  97.9µs  29.4µs
       receive IPDV      0s  9.72µs   7.9µs  31.9µs  8.23µs

     send call time  96.8µs   158µs           256µs  52.7µs
        timer error   201µs   475µs          2.09ms   266µs
  server proc. time   3.9µs  7.27µs          31.9µs  3.85µs

                duration: 989.9ms (wait 0s)
   packets sent/received: 100/100 (0.00% loss)
 server packets received: 100/100 (0.00%/0.00% loss up/down)
     bytes sent/received: 6000/6000
       send/receive rate: 48.5 Kbps / 48.5 Kbps
           packet length: 60 bytes
             timer stats: 0/100 (0.00%) missed, 4.75% error

heistp commented 6 years ago

I also tested this on a physical box with 64-bit Windows and the results look good. Note that it's still possible to see a 0 IPDV value, because the timer resolution is 100ns on Windows when it's 1ns on Linux, for example, but that's not a significant problem.