heistp closed this issue 6 years ago.
It seems to be worse at lower intervals. When goroutines are locked to OS threads with --thread, the results are almost all zeroes, which suggests a thread-scheduling-related issue on Windows.
Negative RTTs should obviously be impossible; they occur because, with this level of clock inaccuracy, the measured server processing time can exceed the measured round-trip time. I worked around this by not subtracting the server processing time in that case, but the inaccurate RTTs remain.
The current status for me is that the negative RTTs are taken care of by the new highPrecisionTimer flag. I can still see 0 RTTs when connecting to a server on localhost, which is probably when we reach the limits of the Windows timer.
When 32-bit Win 10 is running on VMware, I see RTT jumps as follows (the RTT to the server is about 16ms):
seq=0 rtt=575µs rd=-9.36ms sd=9.93ms ipdv=n/a
seq=1 rtt=17.54ms rd=1.76ms sd=15.78ms ipdv=16.97ms
seq=2 rtt=18.96ms rd=6.55ms sd=12.41ms ipdv=1.42ms
seq=3 rtt=18.92ms rd=4.63ms sd=14.29ms ipdv=36.2µs
seq=4 rtt=2.63ms rd=-8.59ms sd=11.23ms ipdv=16.29ms
seq=5 rtt=1.71ms rd=-11.81ms sd=13.52ms ipdv=924µs
seq=6 rtt=1.84ms rd=-8.59ms sd=10.43ms ipdv=131µs
seq=7 rtt=15.95ms rd=-7.1ms sd=23.05ms ipdv=14.11ms
seq=8 rtt=2.48ms rd=-9.55ms sd=12.03ms ipdv=13.47ms
seq=9 rtt=2.49ms rd=-10.85ms sd=13.34ms ipdv=8.75µs
seq=10 rtt=18.16ms rd=4.95ms sd=13.21ms ipdv=15.67ms
I do not see these same jumps with 64-bit Win 10 on a physical box:
seq=0 rtt=22.85ms rd=798.7ms sd=-775.9ms ipdv=n/a
seq=1 rtt=15.42ms rd=793.2ms sd=-777.8ms ipdv=7.43ms
seq=2 rtt=20.43ms rd=794.3ms sd=-773.9ms ipdv=5.01ms
seq=3 rtt=23.43ms rd=798ms sd=-774.6ms ipdv=3ms
seq=4 rtt=14.42ms rd=793.2ms sd=-778.8ms ipdv=9.01ms
seq=5 rtt=23.45ms rd=800.2ms sd=-776.7ms ipdv=9.03ms
seq=6 rtt=23.89ms rd=795.4ms sd=-771.5ms ipdv=440µs
seq=7 rtt=22.91ms rd=794.7ms sd=-771.8ms ipdv=980µs
seq=8 rtt=26.99ms rd=803.7ms sd=-776.7ms ipdv=4.08ms
seq=9 rtt=17.41ms rd=792.3ms sd=-774.9ms ipdv=9.59ms
Testing on a low-latency link (64-bit VM, less than 1ms latency) gets 0s pretty much all across. Any way to get accurate timings on Windows?
seq=0 rtt=1ms rd=-9.01ms sd=10.01ms ipdv=n/a
seq=1 rtt=0s rd=5.6ms sd=-5.6ms ipdv=1ms
seq=2 rtt=0s rd=3.61ms sd=-3.61ms ipdv=0s
seq=3 rtt=0s rd=2.63ms sd=-2.63ms ipdv=0s
seq=4 rtt=0s rd=-361µs sd=361µs ipdv=0s
seq=5 rtt=0s rd=-1.35ms sd=1.35ms ipdv=0s
seq=6 rtt=0s rd=-3.33ms sd=3.33ms ipdv=0s
seq=7 rtt=0s rd=10.28ms sd=-10.28ms ipdv=0s
seq=8 rtt=0s rd=9.29ms sd=-9.29ms ipdv=0s
seq=9 rtt=0s rd=8.3ms sd=-8.3ms ipdv=0s
seq=10 rtt=0s rd=7.32ms sd=-7.32ms ipdv=0s
seq=11 rtt=0s rd=6.33ms sd=-6.33ms ipdv=0s
seq=12 rtt=0s rd=1.34ms sd=-1.34ms ipdv=0s
seq=13 rtt=0s rd=356µs sd=-356µs ipdv=0s
seq=14 rtt=0s rd=-632µs sd=632µs ipdv=0s
seq=15 rtt=0s rd=-1.62ms sd=1.62ms ipdv=0s
seq=16 rtt=0s rd=10.99ms sd=-10.99ms ipdv=0s
seq=17 rtt=0s rd=9.01ms sd=-9.01ms ipdv=0s
seq=18 rtt=0s rd=8.02ms sd=-8.02ms ipdv=0s
seq=19 rtt=0s rd=5.03ms sd=-5.03ms ipdv=0s
What is ping's mean RTT on this link? I guess on Windows you'd see "<1ms", but Linux or OS X would be needed for anything more precise.
I'm not sure if I can improve on this. Either I'll find something else in the Windows timer APIs, or I'll have to change the display to something similar to what Windows ping does (<1ms). I won't be able to take a look until next week, but meanwhile any tips are appreciated...
Ok, I'm optimistic that I can replace the monotonic clock values with calls to QueryPerformanceCounter / QueryPerformanceFrequency from kernel32.dll. If you get a chance to run go run main.go on the attached, you should see output similar to this (this is from my VM):
Z:\Downloads>go run main.go
QueryPerformanceFrequency: 10000000
QueryPerformanceCounter: 290156891124 (700ns elapsed)
QueryPerformanceCounter: 290156897245 (500ns elapsed)
QueryPerformanceCounter: 290156903288 (600ns elapsed)
QueryPerformanceCounter: 290156913864 (600ns elapsed)
QueryPerformanceCounter: 290156919117 (400ns elapsed)
QueryPerformanceCounter: 290156923965 (500ns elapsed)
QueryPerformanceCounter: 290156928897 (600ns elapsed)
QueryPerformanceCounter: 290156933810 (500ns elapsed)
QueryPerformanceCounter: 290156998684 (1.1µs elapsed)
QueryPerformanceCounter: 290157007633 (600ns elapsed)
Calls: 1000000
Zeroes: 0
Min nonzero: 200ns
Max: 384.7µs
Mean: 265ns
I'm hoping not to see any zeroes, and for the mean elapsed time to be in the sub-microsecond range. It will be more expensive calling into the dll like this, but looks like it will be far more accurate, if it tests out ok when calling from different goroutines, etc.
Hi Pete, I'll give it a spin today and let you know. Thanks!
Thanks, and an update is attached, so run this one instead. It looks like Go's standard time implementation isn't precise enough for the wall clock either, so I'll have to use GetSystemTimePreciseAsFileTime instead. That's now included in the test, along with a test of Go's standard time, which seems to quantize to 1ms.
Z:\Downloads>go run main.go
Standard Go Wall Time:
Time: 1528311859729190500 (0s elapsed)
Time: 1528311859730189900 (999.4µs elapsed)
Time: 1528311859731189100 (999.2µs elapsed)
Time: 1528311859731189100 (0s elapsed)
Time: 1528311859732188800 (999.7µs elapsed)
Iterations: 2000000
Zeroes: 1999905
Min nonzero: 992.3µs
Max: 3.0101ms
Mean: 50ns
Elapsed: 106.9972ms
QueryPerformanceCounter/Frequency:
QueryPerformanceFrequency: 10000000
QueryPerformanceCounter: 339554402978 (600ns elapsed)
QueryPerformanceCounter: 339554407755 (477.7µs elapsed)
QueryPerformanceCounter: 339554412550 (479.5µs elapsed)
QueryPerformanceCounter: 339554417207 (465.7µs elapsed)
QueryPerformanceCounter: 339554433863 (1.6656ms elapsed)
Iterations: 2000000
Zeroes: 0
Min nonzero: 200ns
Max: 1.8156ms
Mean: 280ns
Elapsed: 563.9931ms
GetSystemTimePreciseAsFileTime:
SystemTimePrecise: 1528311860403546400 (400ns elapsed)
SystemTimePrecise: 1528311860403993800 (447.4µs elapsed)
SystemTimePrecise: 1528311860404456200 (462.4µs elapsed)
SystemTimePrecise: 1528311860404916300 (460.1µs elapsed)
SystemTimePrecise: 1528311860405355300 (439µs elapsed)
Iterations: 2000000
Zeroes: 0
Min nonzero: 100ns
Max: 6.5248ms
Mean: 241ns
Elapsed: 485.9941ms
Please give irtt a try on Windows again in your environment, as I've just integrated support for QueryPerformanceCounter and GetSystemTimePreciseAsFileTime. I'd be glad to hear whether it works on a low-RTT network with 64-bit Windows, but the results are looking much better for me now (see below). Thanks!
Z:\Downloads>irtt client -i 10ms -d 1s -q localhost
[Connecting] connecting to localhost
[127.0.0.1:2112] [Connected] connection established
Min Mean Median Max Stddev
--- ---- ------ --- ------
RTT 84µs 138µs 116µs 223µs 46.6µs
send delay 41.4µs 81µs 63.2µs 150µs 35.7µs
receive delay 42.1µs 56.7µs 54.9µs 95.7µs 11.7µs
IPDV (jitter) 0s 35.5µs 15.5µs 114µs 36.9µs
send IPDV 0s 27.6µs 11.9µs 97.9µs 29.4µs
receive IPDV 0s 9.72µs 7.9µs 31.9µs 8.23µs
send call time 96.8µs 158µs 256µs 52.7µs
timer error 201µs 475µs 2.09ms 266µs
server proc. time 3.9µs 7.27µs 31.9µs 3.85µs
duration: 989.9ms (wait 0s)
packets sent/received: 100/100 (0.00% loss)
server packets received: 100/100 (0.00%/0.00% loss up/down)
bytes sent/received: 6000/6000
send/receive rate: 48.5 Kbps / 48.5 Kbps
packet length: 60 bytes
timer stats: 0/100 (0.00%) missed, 4.75% error
I also tested this on a physical box with 64-bit Windows and the results look good. Note that it's possible to see a 0 IPDV value because the resolution is 100ns when it's 1ns on Linux, for example, but that's not a significant problem.
The Windows client can show round-trip times that are zero or even negative, regardless of whether the wall or monotonic clock values are used (tested on VMWare Win 10 32-bit, Go 1.10.2).