m-lab / ndt7-client-go

ndt7 reference client implementation in Go
https://www.measurementlab.net/
Apache License 2.0

Single number summaries can be misleading #83

Open dtaht opened 2 years ago

dtaht commented 2 years ago

I am (happily!) testing the new upload-only test and patch on my starlink terminal. And thank you for that, sorry to be a pest. It appears to report an average...

time ./ndt7-client --download=false
upload in progress with ndt-mlab3-lax05.mlab-oti.measurement-lab.org
Avg. speed  :     2.1 Mbit/s
upload: complete

Test results

    Server: ndt-mlab3-lax05.mlab-oti.measurement-lab.org
    Client: 98.97.58.24

              Upload
     Throughput:     2.1 Mbit/s
        Latency:    38.5 ms

Over the course of this test run, however, latency varied by quite a lot.

--- ndt-mlab1-lax05.mlab-oti.measurement-lab.org ping statistics ---
19 packets transmitted, 19 received, 0% packet loss, time 18020ms
rtt min/avg/max/mdev = 26.910/96.673/162.339/40.269 ms

As you might imagine, this much jitter is pure hell on many interactive applications. I am under the impression that the backend samples TCP_INFO once every 10ms?

https://github.com/m-lab/ndt7-client-go/issues/81

robertodauria commented 2 years ago

> I am (happily!) testing the new upload-only test and patch on my starlink terminal. And thank you for that, sorry to be a pest. It appears to report an average...

Latency is MinRTT from the last server-side measurement; it should be the lowest RTT recorded during the whole test.

The tcpinfo daemon which runs alongside the server on the M-Lab platform reads and saves TCP_INFO for each connection every 10ms, but this fine-grained data is only available later in BigQuery. ndt-server itself only takes a sample every ~250ms and sends it to the client via JSON messages during the test.

Running the client with -format=json will show more information, including all the messages received from the server.
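For illustration, here is a rough sketch of how a client could fold the per-message TCPInfo into a min/mean/max latency summary. The field names and units (RTT as smoothed RTT in microseconds) are assumed from the ndt7 measurement message format; this is not the client's current code:

```go
// Sketch: aggregate the RTT samples carried in the server's ~250ms
// measurement messages into a min/mean/max summary.
// Assumes TCPInfo.RTT is the smoothed RTT in microseconds.
package main

import (
	"encoding/json"
	"fmt"
)

type tcpInfo struct {
	RTT    float64 `json:"RTT"`    // smoothed RTT, microseconds (assumed)
	MinRTT float64 `json:"MinRTT"` // minimum RTT, microseconds (assumed)
}

type measurement struct {
	TCPInfo *tcpInfo `json:"TCPInfo"`
}

type rttSummary struct {
	min, max, sum float64
	n             int
}

func (s *rttSummary) add(rttMicros float64) {
	if s.n == 0 || rttMicros < s.min {
		s.min = rttMicros
	}
	if rttMicros > s.max {
		s.max = rttMicros
	}
	s.sum += rttMicros
	s.n++
}

func (s *rttSummary) String() string {
	if s.n == 0 {
		return "no samples"
	}
	return fmt.Sprintf("rtt min/mean/max = %.1f/%.1f/%.1f ms",
		s.min/1000, s.sum/float64(s.n)/1000, s.max/1000)
}

func main() {
	// In the real client these would arrive over the websocket during the test.
	raw := [][]byte{
		[]byte(`{"TCPInfo":{"RTT":27000,"MinRTT":26900}}`),
		[]byte(`{"TCPInfo":{"RTT":96000,"MinRTT":26900}}`),
		[]byte(`{"TCPInfo":{"RTT":162000,"MinRTT":26900}}`),
	}
	var s rttSummary
	for _, msg := range raw {
		var m measurement
		if err := json.Unmarshal(msg, &m); err != nil || m.TCPInfo == nil {
			continue // ignore messages without TCPInfo
		}
		s.add(m.TCPInfo.RTT)
	}
	fmt.Println(s.String()) // rtt min/mean/max = 27.0/95.0/162.0 ms
}
```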

Which TCP_INFO field (or other information) would you like to see in the summary?

cc @mattmathis

dtaht commented 2 years ago

I think Matt is at Burning Man this week. :)

Most of our other tools like ping report min/mean/max.

I am primarily interested in 98th percentile latency and above: 99th and 99.9th.

For inspiration, see: https://www.ookla.com/articles/introducing-loaded-latency

I would prefer the tool report the 99.9th percentile (this will discard a few outliers like an ARP lookup) for the "latency" figure, or report min/mean/98/99/max.
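To make concrete what I mean, here's a small sketch of the summary line I'd like to see (made-up sample values, nearest-rank percentiles; nothing like this exists in the client today). With only a handful of samples the high percentiles collapse onto the max, which is exactly why the sampling rate matters:

```go
// Sketch: given sampled RTTs (milliseconds here), report
// min/mean/p98/p99/p99.9/max using nearest-rank percentiles.
package main

import (
	"fmt"
	"math"
	"sort"
)

// percentile returns the nearest-rank p-th percentile of sorted samples.
func percentile(sorted []float64, p float64) float64 {
	if len(sorted) == 0 {
		return math.NaN()
	}
	rank := int(math.Ceil(p/100*float64(len(sorted)))) - 1
	if rank < 0 {
		rank = 0
	}
	if rank >= len(sorted) {
		rank = len(sorted) - 1
	}
	return sorted[rank]
}

func main() {
	// Hypothetical per-sample RTTs collected during an upload test.
	samples := []float64{26.9, 30.1, 38.5, 41.2, 55.0, 80.3, 96.7, 120.4, 150.2, 162.3}
	sort.Float64s(samples)

	sum := 0.0
	for _, v := range samples {
		sum += v
	}
	fmt.Printf("latency min/mean/p98/p99/p99.9/max = %.1f/%.1f/%.1f/%.1f/%.1f/%.1f ms\n",
		samples[0], sum/float64(len(samples)),
		percentile(samples, 98), percentile(samples, 99), percentile(samples, 99.9),
		samples[len(samples)-1])
}
```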

dtaht commented 2 years ago

There are a LOT of fields in TCP_INFO! I want all of 'em! All of 'em! For the whole test, every 10ms... Can I dream that big? :)

robertodauria commented 2 years ago

> Most of our other tools like ping report min/mean/max.

I can definitely see the value in that. We could do it; the problem is the polling frequency. I looked at how Ookla reports loaded latency, and they seem to update the UI every 0.5s or so and show the instantaneous value (it does not look like a running average; there were some huge swings).

ndt-server sends a measurement message that includes TCPInfo every 250ms. If the client just shows TCPInfo.RTT when it receives it from the server, I think we get something similar to Ookla's loaded latency, but a 4Hz polling rate still doesn't sound good enough to give meaningful mean/max/percentiles at the end.

We can experiment a bit with adding TCPInfo polling to the client (which would be a Linux-only feature) every 10ms, but I suspect a 100Hz polling rate would not come cheap in terms of processing power: this code must be able to run on a variety of devices, including embedded devices with (very) limited CPUs (routers, Raspberry Pis, etc.). I may be wrong about the performance impact, though. Or, we could make it configurable with a safe default. It's not trivial to detect when the client becomes the bottleneck, so we try to make clients as "dumb" and optimized as possible.
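To gauge whether 100Hz polling is actually too expensive, something like the following Linux-only sketch could be benchmarked on a router or Raspberry Pi. It reads TCP_INFO via golang.org/x/sys/unix at a fixed interval; the connection, interval, and field choice are placeholders, and none of this is in ndt7-client-go today:

```go
//go:build linux

// Sketch: poll tcpi_rtt (smoothed RTT, microseconds) for a TCP connection
// at a fixed interval. Error handling is minimal; this is not client code.
package main

import (
	"fmt"
	"net"
	"time"

	"golang.org/x/sys/unix"
)

// pollRTT samples TCP_INFO from conn every interval until stop is closed.
func pollRTT(conn *net.TCPConn, interval time.Duration, stop <-chan struct{}) ([]uint32, error) {
	raw, err := conn.SyscallConn()
	if err != nil {
		return nil, err
	}
	var samples []uint32
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-stop:
			return samples, nil
		case <-ticker.C:
			var info *unix.TCPInfo
			var serr error
			if err := raw.Control(func(fd uintptr) {
				info, serr = unix.GetsockoptTCPInfo(int(fd), unix.IPPROTO_TCP, unix.TCP_INFO)
			}); err != nil {
				return samples, err
			}
			if serr != nil {
				return samples, serr
			}
			samples = append(samples, info.Rtt) // smoothed RTT in microseconds
		}
	}
}

func main() {
	// Placeholder connection; in the real client this would be the upload socket.
	conn, err := net.Dial("tcp", "ndt-mlab3-lax05.mlab-oti.measurement-lab.org:443")
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	stop := make(chan struct{})
	go func() { time.Sleep(2 * time.Second); close(stop) }()

	samples, err := pollRTT(conn.(*net.TCPConn), 10*time.Millisecond, stop)
	fmt.Printf("collected %d RTT samples (err: %v)\n", len(samples), err)
}
```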

dtaht commented 2 years ago

Can you just send a dump of TCP_INFO from the server at the end of the test? This is what we do with "irtt", but for UDP measurement flows.

In my own tool (flent.org) we can sample these stats every 50ms. There are a few dozen plot types and 110 tests. Have you ever played with that? Example:

T=starlink
i=1
S1=fremont.starlink.taht.net
flent --socket-stats -x --step-size=.05 -t $T-$i -H $S1 --test-param=upload_streams=$i tcp_nup

[attached plot: flent tcp_nup run on Starlink, single upload stream]

Two items of note: folk game the fixed-length, speedtest.net-style tests a LOT; after 20 seconds they chop your bandwidth. Not much ndt can do about that, given that humans can't stand waiting longer than that (though the plausible threat of a researcher running longer tests might help with getting un-gamed).

Starlink's long-term behaviors are "interesting". Every 15s they re-optimize the network, which you can see above. However, if the link comes up from an idle period, it can take those 15 seconds to get more bandwidth... or at least it used to! Starlink are doing MUCH better lately.