heistp / irtt

Isochronous Round-Trip Tester
GNU General Public License v2.0

"ShortReply" error when running on Windows #13

Closed: sgissi closed this issue 6 years ago

sgissi commented 6 years ago

Not sure whether Windows is a supported platform, but irtt compiles without issues. When running, I always get this from the client:

Terminated due to receive error: [ShortReply] received short reply (44 bytes)

The server doesn't log much other than "NewConn" and "CloseConn". Enabling HMAC increases the packet size reported in the error from 44 to 60 bytes, but the error itself is the same.

A packet capture shows the following (byte counts refer to the UDP payload, not the entire packet):

Client->Server: 25 bytes
Server->Client: 33 bytes
Client->Server: 60 bytes
Server->Client: 44 bytes
Client->Server: 12 bytes
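
For reference, below is a minimal stand-alone Go sketch of the kind of length check that produces a "short reply" style error. This is not irtt's code; the port, request size, and minReplyLen are made-up values for illustration only.

    // Minimal sketch, not irtt code: send a UDP request and flag the reply
    // as "short" when it carries fewer bytes than an assumed minimum length.
    package main

    import (
        "fmt"
        "net"
        "time"
    )

    const minReplyLen = 48 // assumed minimum reply size, for illustration

    func main() {
        conn, err := net.Dial("udp", "localhost:2112")
        if err != nil {
            panic(err)
        }
        defer conn.Close()

        // Send a dummy 60-byte request and wait up to a second for the reply.
        if _, err := conn.Write(make([]byte, 60)); err != nil {
            panic(err)
        }
        conn.SetReadDeadline(time.Now().Add(time.Second))

        buf := make([]byte, 1500)
        n, err := conn.Read(buf)
        if err != nil {
            panic(err)
        }
        if n < minReplyLen {
            fmt.Printf("received short reply (%d bytes)\n", n)
            return
        }
        fmt.Printf("reply OK (%d bytes)\n", n)
    }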

heistp commented 6 years ago

Thanks for the feedback! I'll try to support it, but during development I also saw strange problems on 32-bit Win 10 (the only Windows I currently have, running on VMware) that appeared to be issues with Go itself on Windows. A few questions:

sgissi commented 6 years ago

I compiled on 64-bit Win 10 using Go 1.9.4; I'll try Go 1.10. Adding "--tstamp=midpoint" does work as intended.

heistp commented 6 years ago

Ok, that sounds like the same problem I saw before on 1.9. I'll also try 1.10 on 32-bit, and if it still reproduces, I'll try to work around it somehow. First, though, a Win 10 "feature update" to finish...

heistp commented 6 years ago

The good news is that with Go 1.10.2, I no longer see the strange issue I was seeing in 1.9.

The issue you were seeing was actually because I assumed the system timer is accurate enough to measure the server's processing time. On Windows, it appears not to be. At least irtt handles that case now.

Additionally, the one-way receive delay when client and server are on the same Windows machine often seems to be zero (but not always). This also appears to be due to Windows' timer and process scheduling, and is probably not something I can do anything about, but it should hopefully still be accurate enough for millisecond-scale RTTs.
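
For anyone who wants to check this on their own machine, here is a small stand-alone sketch (not part of irtt) that gauges the granularity of the timer Go sees by waiting for time.Now() to advance and recording the smallest step. If the smallest step is on the order of a millisecond, sub-millisecond delays will naturally come out as zero.

    // Stand-alone sketch: estimate the effective granularity of the clock
    // that Go's time package exposes on this machine. Sub and After use the
    // monotonic reading, which is what short local time measurements rely on.
    package main

    import (
        "fmt"
        "time"
    )

    func main() {
        smallest := time.Hour
        for i := 0; i < 1000; i++ {
            t0 := time.Now()
            t1 := time.Now()
            for !t1.After(t0) { // spin until the clock visibly advances
                t1 = time.Now()
            }
            if d := t1.Sub(t0); d < smallest {
                smallest = d
            }
        }
        fmt.Println("smallest observed clock step:", smallest)
    }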

I also see a new issue on Windows where the first second of round-trip times seems to always be zero, so I'll file an issue for myself and see if I can figure that one out.

Let me know if you see any other problems.

sgissi commented 6 years ago

Indeed, I just upgraded to 1.10.2 and it works as expected. My first round trip is not zero:

[Connecting] connecting to localhost [127.0.0.1:2112]
[Connected] connection established
seq=0 rtt=1.01ms rd=0s sd=1.01ms ipdv=n/a

I haven't looked at the code much yet, but if there is any low-hanging fruit you need help with, please let me know.

heistp commented 6 years ago

Ok, would you mind running the client a few more times in a row? I actually found that the first RTT is zero only about 50% of the time I run it. The fact that I'm running on VMware could affect this.

sgissi commented 6 years ago

Testing locally gives me zero in a few of the attempts, not necessarily the first ones:

seq=0 rtt=502µs rd=0s sd=502µs ipdv=n/a
seq=1 rtt=2.01ms rd=0s sd=2.01ms ipdv=1.5ms
seq=2 rtt=1.97ms rd=0s sd=1.97ms ipdv=37µs
seq=3 rtt=0s rd=0s sd=0s ipdv=1.97ms
seq=4 rtt=0s rd=0s sd=0s ipdv=0s
seq=5 rtt=2.01ms rd=0s sd=2.01ms ipdv=2.01ms
seq=6 rtt=0s rd=0s sd=0s ipdv=2.01ms
seq=7 rtt=2ms rd=1ms sd=1ms ipdv=2ms

The two VMs show zeroes all over; they are in different data centers, so there should be at least 0.5 ms of one-way latency between them.

seq=0 rtt=0s rd=-26.97ms sd=26.97ms ipdv=n/a
seq=1 rtt=0s rd=1.24ms sd=-1.24ms ipdv=0s
seq=2 rtt=0s rd=14.85ms sd=-14.85ms ipdv=0s
seq=3 rtt=0s rd=-1.73ms sd=1.73ms ipdv=0s
seq=4 rtt=0s rd=-2.72ms sd=2.72ms ipdv=0s
seq=5 rtt=0s rd=-3.71ms sd=3.71ms ipdv=0s
seq=6 rtt=0s rd=-5.7ms sd=5.7ms ipdv=0s
seq=7 rtt=0s rd=-6.68ms sd=6.68ms ipdv=0s
seq=8 rtt=0s rd=-7.67ms sd=7.67ms ipdv=0s
seq=9 rtt=0s rd=1.94ms sd=-1.94ms ipdv=0s
seq=10 rtt=0s rd=-4.04ms sd=4.04ms ipdv=0s
seq=11 rtt=0s rd=-1.43ms sd=1.43ms ipdv=0s
seq=12 rtt=0s rd=-2.42ms sd=2.42ms ipdv=0s
seq=13 rtt=0s rd=-5.41ms sd=5.41ms ipdv=0s
seq=14 rtt=0s rd=-6.39ms sd=6.39ms ipdv=0s
seq=15 rtt=0s rd=-7.38ms sd=7.38ms ipdv=0s
seq=16 rtt=0s rd=-9.37ms sd=9.37ms ipdv=0s
seq=17 rtt=0s rd=-13.35ms sd=13.35ms ipdv=0s
seq=18 rtt=0s rd=259µs sd=-259µs ipdv=0s
seq=19 rtt=0s rd=-728µs sd=728µs ipdv=0s
seq=20 rtt=0s rd=-2.72ms sd=2.72ms ipdv=0s
seq=21 rtt=0s rd=-4.7ms sd=4.7ms ipdv=0s
interrupt

                        Min      Mean    Median       Max   Stddev
                        ---      ----    ------       ---   ------
               RTT       0s        0s        0s        0s       0s
        send delay  -14.85ms    4.31ms    3.88ms   26.97ms    7.4ms
     receive delay  -26.97ms   -4.31ms   -3.88ms   14.85ms    7.4ms

     IPDV (jitter)       0s        0s        0s        0s       0s
         send IPDV      1ms     3.4ms       2ms    13.6ms   3.98ms
      receive IPDV      1ms     3.4ms       2ms    13.6ms   3.98ms

    send call time       0s        0s                 0s       0s
       timer error    196µs    7.63ms             24.42ms   5.66ms

                 duration: 21s (wait 0s)
    packets sent/received: 22/22 (0.00% loss)
  server packets received: 22/22 (0.00%/0.00% loss up/down)
      bytes sent/received: 1320/1320
        send/receive rate: 502 bps / 502 bps
            packet length: 60 bytes
              timer stats: 0/22 (0.00%) missed, 0.76% error


heistp commented 6 years ago

Ouch, ok, I can also reproduce this, especially if I lower the interval:

irtt client -i 10ms localhost

I can even see slightly negative RTTs. Thanks for testing. We can continue this over here.

heistp commented 6 years ago

Hi Silvio, would you mind trying again on Windows with the latest code and reporting any results in issue #14?

The negative RTTs are solved, and I suppose the fix should help with artificially low times on a LAN as well (the server processing time can't be subtracted on Windows because its timer isn't accurate enough). What you're seeing between your data center VMs looks different, though: send delay and receive delay are equal but opposite, and all RTTs are zero. I haven't seen that.
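
Roughly, the idea behind the fix looks like the sketch below. This is a simplified illustration, not irtt's actual code; the reply fields, function names, and the 100µs threshold are assumptions for the example. The RTT is taken from the client's monotonic clock, and the server's self-reported processing time is only subtracted when the server's timer is fine-grained enough to trust.

    // Simplified sketch of the approach (names and threshold are made up):
    // compute RTT from the client's monotonic clock alone, and subtract the
    // server's self-reported processing time only when the server's timer
    // granularity can actually resolve it.
    package main

    import (
        "fmt"
        "time"
    )

    // reply holds fields a server might echo back (hypothetical).
    type reply struct {
        processingTime   time.Duration // time the request spent in the server
        timerGranularity time.Duration // smallest step of the server's timer
    }

    func roundTripTime(sent, received time.Time, r reply) time.Duration {
        rtt := received.Sub(sent) // monotonic difference on the client
        // Subtracting a processing time that the server's timer can't
        // resolve just adds noise, or pushes small RTTs to zero or below.
        if r.timerGranularity > 0 && r.timerGranularity <= 100*time.Microsecond {
            rtt -= r.processingTime
        }
        return rtt
    }

    func main() {
        sent := time.Now()
        time.Sleep(2 * time.Millisecond) // stand-in for the network round trip
        received := time.Now()

        r := reply{processingTime: 50 * time.Microsecond, timerGranularity: time.Millisecond}
        fmt.Println("rtt:", roundTripTime(sent, received, r))
    }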

I still see RTT jumps on VMware, but those go away on a physical Windows box at the office. Zero RTTs are also still possible when the measured time is below the precision of the Windows timer, but I haven't decided how to handle that yet.

A re-test on your LAN would be good, and if you want to try a server on the Internet:

irtt client --hmac=flentusers irtt.eventide.io

Thanks...