m-lab / murakami

Run automated internet measurement tests in a Docker container.
Apache License 2.0
41 stars 11 forks source link

ndt5 custom runner not working #108

Closed 11gingerbread closed 2 years ago

11gingerbread commented 2 years ago

Hello,

I am running Customer NDT Server in my environment which is running for both ndt5 and ndt7. I am able to test the same with the Web URL for ndt5 http://MY_CUSTOM_NDT_SERVER:3010/widget.html

I am using this ndt client on the system where I want to measure speed and it is working for ndt7 but not for ndt5. I have separate config files for both ndt7 and ndt5 i.e ndt5-custom-config.json and ndt7-custom-config.json

Below is what I have for my tests in "murakami.toml"

[tests] [tests.ndt5custom] enabled = true config = "/murakami/configs/ndt5-custom-config.json"

[tests.ndt7custom] enabled = true config = "/murakami/configs/ndt7-custom-config.json"

I am getting the data from ndt7 but not in ndt5 custom runner.

NDT5 test output - {"TestName": "ndt5", "TestStartTime": "2022-04-12T18:38:48.809943", "TestEndTime": "2022-04-12T18:39:33.862906", "MurakamiLocation": "Ohio", "MurakamiConnectionType": "wired", "MurakamiNetworkType": "Corp", "MurakamiDeviceID": "", "ServerName": null, "ServerIP": null, "ClientIP": null, "DownloadUUID": null}

but ndt7 is giving me the expected output. I am able to telnet to the Custom ndt server on Port 3010 and 8080 from the server running ndt client.

Below is the log from ndt client "measurementlab/murakami) docker container execution -

2022-04-12 19:06:34,632 server.py:195 INFO Starting Murakami services. 2022-04-12 19:06:34,632 server.py:127 DEBUG Loading test runner dash 2022-04-12 19:06:34,939 server.py:127 DEBUG Loading test runner ndt5 2022-04-12 19:06:34,948 server.py:127 DEBUG Loading test runner ndt5custom 2022-04-12 19:06:34,975 server.py:127 DEBUG Loading test runner ndt7 2022-04-12 19:06:34,984 server.py:127 DEBUG Loading test runner ndt7custom 2022-04-12 19:06:35,010 server.py:127 DEBUG Loading test runner ooniprobe 2022-04-12 19:06:35,020 server.py:127 DEBUG Loading test runner speedtestmulti 2022-04-12 19:06:35,029 server.py:127 DEBUG Loading test runner speedtestsingle 2022-04-12 19:06:35,038 selector_events.py:58 DEBUG Using selector: EpollSelector 2022-04-12 19:06:35,043 base.py:445 INFO Adding job tentatively -- it will be properly scheduled when the scheduler starts 2022-04-12 19:06:35,043 server.py:167 DEBUG Loading exporter local 2022-04-12 19:06:35,053 local.py:29 DEBUG {'type': 'local', 'enabled': True, 'path': '/data/'} 2022-04-12 19:06:35,053 server.py:200 INFO Starting the job scheduler. 2022-04-12 19:06:35,053 server.py:41 DEBUG Not previously fired before 2022-04-12 19:06:35,054 base.py:886 INFO Added job "Test Runners" to job store "default" 2022-04-12 19:06:35,054 base.py:171 INFO Scheduler started 2022-04-12 19:06:35,054 base.py:945 DEBUG Looking for jobs to run 2022-04-12 19:06:35,054 base.py:123 INFO Running job "Test Runners (trigger: <murakami.server.RandomTrigger object at 0x7ff13b08ed50>, next run at: 2022-04-12 19:06:35 UTC)" (scheduled at 2022-04-12 19:06:35.053863+00:00) 2022-04-12 19:06:35,055 base.py:1025 DEBUG Next wakeup is due at 2022-04-12 19:09:08.335054+00:00 (in 153.280691 seconds) 2022-04-12 19:06:35,055 server.py:106 INFO Running test: DASH 2022-04-12 19:06:35,056 runner.py:53 INFO Test runner DASH disabled, skipping. 2022-04-12 19:06:35,056 server.py:106 INFO Running test: ndt5 2022-04-12 19:06:35,056 runner.py:53 INFO Test runner ndt5 disabled, skipping. 2022-04-12 19:06:35,056 server.py:106 INFO Running test: ndt5custom 2022-04-12 19:06:35,056 ndt5custom.py:117 INFO Starting ndt5custom test... 2022-04-12 19:06:35,057 ndt5custom.py:167 INFO Running ndt5custom measurement (server): MY_CUSTOM_NDT_SERVER:3010 2022-04-12 19:07:20,110 ndt5custom.py:68 INFO ndt5 test completed successfully. 2022-04-12 19:07:20,111 server.py:114 INFO Running exporter local for test ndt5custom 2022-04-12 19:07:20,111 local.py:38 INFO Copying data to /data/ndt5custom-Ohio-Corp-wired-0-2022-04-12T19:06:35.056641.jsonl 2022-04-12 19:07:20,112 server.py:106 INFO Running test: ndt7 2022-04-12 19:07:20,112 runner.py:53 INFO Test runner ndt7 disabled, skipping. 2022-04-12 19:07:20,112 server.py:106 INFO Running test: ndt7custom 2022-04-12 19:07:20,112 runner.py:53 INFO Test runner ndt7custom disabled, skipping. 2022-04-12 19:07:20,112 ndt7custom.py:133 INFO Starting ndt7 test... 2022-04-12 19:07:20,112 ndt7custom.py:188 INFO Running ndt7custom measurement (server): MY_CUSTOM_NDT_SERVER:8080 2022-04-12 19:07:20,112 server.py:106 INFO Running test: ooniprobe 2022-04-12 19:07:20,112 runner.py:53 INFO Test runner ooniprobe disabled, skipping. 2022-04-12 19:07:20,112 server.py:106 INFO Running test: Speedtest-cli-multi-stream 2022-04-12 19:07:20,112 runner.py:53 INFO Test runner Speedtest-cli-multi-stream disabled, skipping. 2022-04-12 19:07:20,112 server.py:106 INFO Running test: Speedtest-cli-single-stream 2022-04-12 19:07:20,112 runner.py:53 INFO Test runner Speedtest-cli-single-stream disabled, skipping. 2022-04-12 19:07:20,113 base.py:144 INFO Job "Test Runners (trigger: <murakami.server.RandomTrigger object at 0x7ff13b08ed50>, next run at: 2022-04-12 19:09:08 UTC)" executed successfully

robertodauria commented 2 years ago

@11gingerbread Could you please try again without specifying the port (:3010) in the custom config JSON? e.g.

{
    "serverGroups": [
        {
            "selection": "random",
            "servers": [
                "NDT_SERVER_HOSTNAME"
            ]
        }
    ]
}

The ndt5 client seems to have a bug leading to some strange output when the -server flag contains the port, and this output is interpreted incorrectly by Murakami:

$ ndt5-client -quiet -server=localhost:3010 -format=json
{"Key":"error","Value":"cannot receive kickoff message: read tcp 127.0.0.1:42856-\u003e127.0.0.1:3010: i/o timeout"}
11gingerbread commented 2 years ago

Thank you for looking into it.

I tried as suggested and the output file is exactly the same. Below is the docker logs -

2022-04-14 13:45:11,256 base.py:123 INFO Running job "Test Runners (trigger: <murakami.server.RandomTrigger object at 0x7fca6b782cd0>, next run at: 2022-04-14 13:45:11 UTC)" (scheduled at 2022-04-14 13:45:11.255545+00:00) 2022-04-14 13:45:11,257 base.py:1025 DEBUG Next wakeup is due at 2022-04-14 13:45:43.072037+00:00 (in 31.816012 seconds) 2022-04-14 13:45:11,257 server.py:106 INFO Running test: DASH 2022-04-14 13:45:11,257 runner.py:53 INFO Test runner DASH disabled, skipping. 2022-04-14 13:45:11,257 server.py:106 INFO Running test: ndt5 2022-04-14 13:45:11,258 runner.py:53 INFO Test runner ndt5 disabled, skipping. 2022-04-14 13:45:11,258 server.py:106 INFO Running test: ndt5custom 2022-04-14 13:45:11,258 ndt5custom.py:117 INFO Starting ndt5custom test... 2022-04-14 13:45:11,258 ndt5custom.py:167 INFO Running ndt5custom measurement (server): NDT_SERVER_HOSTNAME <----- 2022-04-14 13:45:31,880 server.py:110 ERROR Failed to run test ndt5custom: ndt5-client did not return a valid JSON summary. <----- 2022-04-14 13:45:31,880 server.py:106 INFO Running test: ndt7 2022-04-14 13:45:31,880 runner.py:53 INFO Test runner ndt7 disabled, skipping. 2022-04-14 13:45:31,880 server.py:106 INFO Running test: ndt7custom 2022-04-14 13:45:31,880 runner.py:53 INFO Test runner ndt7custom disabled, skipping. 2022-04-14 13:45:31,880 server.py:106 INFO Running test: ooniprobe 2022-04-14 13:45:31,881 runner.py:53 INFO Test runner ooniprobe disabled, skipping. 2022-04-14 13:45:31,881 server.py:106 INFO Running test: Speedtest-cli-multi-stream 2022-04-14 13:45:31,881 runner.py:53 INFO Test runner Speedtest-cli-multi-stream disabled, skipping. 2022-04-14 13:45:31,881 server.py:106 INFO Running test: Speedtest-cli-single-stream 2022-04-14 13:45:31,881 runner.py:53 INFO Test runner Speedtest-cli-single-stream disabled, skipping. 2022-04-14 13:45:31,881 base.py:144 INFO Job "Test Runners (trigger: <murakami.server.RandomTrigger object at 0x7fca6b782cd0>, next run at: 2022-04-14 13:45:43 UTC)" executed successfully

robertodauria commented 2 years ago

The ERROR Failed to run test ndt5custom: ndt5-client did not return a valid JSON summary. is surprising. Could you please try running the ndt5-client binary manually in the Docker container, so we can see what output Murakami is receiving?

You can get the container ID of the running murakami container with docker ps, then run ndt5-client inside the container with

docker exec $CONTAINER_ID ndt5-client -protocol=ndt5 -quiet -format=json -server=$NDT_SERVER_HOSTNAME

Does it output an error?

Also, please note the port the client connects to for ndt5 measurements is 3001 (corresponding to the -ndt5_addr flag of ndt-server), which must also be reachable.

11gingerbread commented 2 years ago

This did not result in error. I was able to get the output too when I removed "-quiet" so that's good.

Port 3001 is reachable from the client I am trying to connect to NDT Server.

robertodauria commented 2 years ago

@11gingerbread So it seems the client is running correctly but the output isn't a valid JSON. Unfortunately I haven't been able to replicate it here so far. Would you be willing to test a slightly modified version that adds debug information?

Set loglevel to DEBUG in murakami.toml (or the MURAKAMI_SETTINGS_LOGLEVEL environment variable, if you are using env variables to configure Murakami), and change the Docker image to measurementlab/murakami:testing:

docker run measurementlab/murakami:testing

This modified version will output the command's arguments, standard output/error and the exit code from ndt5-client, and will hopefully help us debug the issue. Thanks!

11gingerbread commented 2 years ago

Yes, absolutely. I executed it and below is the log.

2022-04-19 19:24:01,658 base.py:945 DEBUG Looking for jobs to run 2022-04-19 19:24:01,659 base.py:1025 DEBUG Next wakeup is due at 2022-04-19 19:26:45.137414+00:00 (in 163.478922 seconds) 2022-04-19 19:24:01,659 base.py:123 INFO Running job "Test Runners (trigger: <murakami.server.RandomTrigger object at 0x7fd8b07e6ed0>, next run at: 2022-04-19 19:26:45 UTC)" (scheduled at 2022-04-19 19:24:01.643587+00:00) 2022-04-19 19:24:01,660 server.py:106 INFO Running test: DASH 2022-04-19 19:24:01,660 runner.py:53 INFO Test runner DASH disabled, skipping. 2022-04-19 19:24:01,660 server.py:106 INFO Running test: ndt5 2022-04-19 19:24:01,660 runner.py:53 INFO Test runner ndt5 disabled, skipping. 2022-04-19 19:24:01,660 server.py:106 INFO Running test: ndt5custom 2022-04-19 19:24:01,660 ndt5custom.py:121 INFO Starting ndt5custom test... 2022-04-19 19:24:01,661 ndt5custom.py:161 DEBUG Servers: {'NDT_SERVER_HOSTNAME'} 2022-04-19 19:24:01,661 ndt5custom.py:172 DEBUG Arguments: ['ndt5-client', '-protocol=ndt5', '-format=json', '-quiet', '-server=NDT_SERVER_HOSTNAME'] 2022-04-19 19:24:01,661 ndt5custom.py:173 INFO Running ndt5custom measurement (server): NDT_SERVER_HOSTNAME 2022-04-19 19:24:21,820 ndt5custom.py:58 DEBUG stdout: 2022-04-19 19:24:21,820 ndt5custom.py:59 DEBUG stderr: 2022-04-19 19:24:21,821 ndt5custom.py:60 DEBUG exit code:0 2022-04-19 19:24:21,821 server.py:110 ERROR Failed to run test ndt5custom: ndt5-client did not return a valid JSON summary. 2022-04-19 19:24:21,821 base.py:131 ERROR Job "Test Runners (trigger: <murakami.server.RandomTrigger object at 0x7fd8b07e6ed0>, next run at: 2022-04-19 19:26:45 UTC)" raised an exception Traceback (most recent call last): File "/murakami/murakami/runners/ndt5custom.py", line 66, in _run_client summary = json.loads(output.stdout) File "/usr/local/lib/python3.7/json/init.py", line 348, in loads return _default_decoder.decode(s) File "/usr/local/lib/python3.7/json/decoder.py", line 337, in decode obj, end = self.raw_decode(s, idx=_w(s, 0).end()) File "/usr/local/lib/python3.7/json/decoder.py", line 355, in raw_decode raise JSONDecodeError("Expecting value", s, err.value) from None json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "/murakami/murakami/server.py", line 108, in _call_runners r.start_test() File "/murakami/murakami/runner.py", line 47, in start_test data = self._start_test() File "/murakami/murakami/runners/ndt5custom.py", line 174, in _start_test results.append(self._run_client(cmdargs)) File "/murakami/murakami/runners/ndt5custom.py", line 70, in _run_client 'ndt5-client did not return a valid JSON summary.') murakami.errors.RunnerError: ndt5-client did not return a valid JSON summary.

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "/usr/local/lib/python3.7/site-packages/apscheduler/executors/base.py", line 125, in run_job retval = job.func(*job.args, **job.kwargs) File "/murakami/murakami/server.py", line 111, in _call_runners _logger.error(exc.with_traceback()) TypeError: with_traceback() takes exactly one argument (0 given)

robertodauria commented 2 years ago

@11gingerbread When you tested running the client manually in the container, did it return anything at all with the -quiet flag? It should provide just a single JSON with the summary, but it looks like there was no output at all when run from Murakami.

I've pushed an update to measurementlab/murakami:testing that removes -quiet. It's expected to fail after the test, but it should tell us the whole output from the ndt5-client command.

To try it (with DEBUG logging still enabled):

docker pull measurementlab/murakami:testing
docker run measurementlab/murakami:testing

Thank you for helping in debugging this.

robertodauria commented 2 years ago

Also, please see below the configuration I'm using for testing (in case you can spot any difference):

murakami.toml

[settings]
port = 80
loglevel = "DEBUG"
immediate = 1

[tests]
[tests.ndt5custom]
enabled = true
config = "custom.json"

[exporters.local]
type = "local"
enabled = true
path = "/tmp/data"

custom.json

{
    "serverGroups": [
        {
            "selection": "random",
            "servers": [
                "localhost"
            ]
        }
    ]
}

which results in:

2022-04-20 12:22:08,538 ndt5custom.py:121 INFO Starting ndt5custom test...
2022-04-20 12:22:08,539 ndt5custom.py:161 DEBUG Servers: {'localhost'}
2022-04-20 12:22:08,539 ndt5custom.py:171 DEBUG Arguments: ['ndt5-client', '-protocol=ndt5', '-format=json', '-server=localhost']
2022-04-20 12:22:08,539 ndt5custom.py:172 INFO Running ndt5custom measurement (server): localhost
2022-04-20 12:22:28,580 ndt5custom.py:58 DEBUG stdout:{"Key":"info","Value":"using localhost"}
{"Key":"info","Value":"sent login message"}

(snip)

{"Key":"info","Value":"finished successfully"}
{"ServerFQDN":"localhost","ServerIP":"127.0.0.1","ClientIP":"127.0.0.1","DownloadUUID":"asimov_1650284251_unsafe_0000000000000003","Download":{"Value":17513.85955614541,"Unit":"Mbit/s"},"Upload":{"Value":37361.035,"Unit":"Mbit/s"},"DownloadRetrans":{"Value":0.016653956744116082,"Unit":"%"},"MinRTT":{"Value":0.002,"Unit":"ms"}}

2022-04-20 12:22:28,580 ndt5custom.py:59 DEBUG stderr:
2022-04-20 12:22:28,580 ndt5custom.py:60 DEBUG exit code:0

The corresponding log from ndt-server:

2022/04/20 12:22:08 ndt5.go:124: Handling connection 127.0.0.1:3001<=PLAIN,Unknown=>127.0.0.1:52598
2022/04/20 12:22:18 c2s.go:165: Timed out
2022/04/20 12:22:18 c2s.go:100: Ended C2S test on 127.0.0.1:37105<=PLAIN,Unknown=>127.0.0.1:35830 asimov_1650284251_unsafe_0000000000000002
2022/04/20 12:22:18 c2s.go:120: 127.0.0.1:3001<=PLAIN,TLV=>127.0.0.1:52598 sent us 3.73610353216273e+07 Kbps
2022/04/20 12:22:28 s2c.go:149: We measured 1.733395599714863e+07 and the client sent us 17513887.691487
2022/04/20 12:22:28 messager.go:167: Unhandled case in SendMetrics: float64
2022/04/20 12:22:28 messager.go:167: Unhandled case in SendMetrics: float64
2022/04/20 12:22:28 messager.go:167: Unhandled case in SendMetrics: float64
2022/04/20 12:22:28 messager.go:167: Unhandled case in SendMetrics: ptr
2022/04/20 12:22:28 ndt5.go:231: You uploaded at 37361035.3216 and downloaded at 17333955.9971
robertodauria commented 2 years ago

Another idea: I was able to replicate something similar by running ndt-server with --network=bridge and exposing only the control ports (in range 3001-3010). The actual measurement, though, happens on a random port between 32768 and 65535. If these ports are unreachable, the ndt5-client outputs an error (only visible without the -quiet flag) and exits with exit code 0, without emitting any summary.

Are you by any chance running ndt-server with docker and --network=bridge? If so, could you restart the server with --network=host? I think the recommendation to run it with bridge and exposing certain ports only works with ndt7, while ndt5 requires --network=host.

11gingerbread commented 2 years ago

When you tested running the client manually in the container, did it return anything at all with the -quiet flag? - No, Didn't return any output.

Executed below with new version of "measurementlab/murakami:testing" -- -- -- docker exec 3fd9a305ee4b ndt5-client -protocol=ndt5 -format=json -server=NDT_SERVER {"Key":"info","Value":"using NDT_SERVER"} {"Key":"info","Value":"sent login message"} {"Key":"info","Value":"received the kickoff message"} {"Key":"info","Value":"cleared to run the tests"} {"Key":"info","Value":"got remote server version: v5.0-NDTinGO"} {"Key":"info","Value":"got list of test IDs: [2 4]"} {"Key":"info","Value":"running the upload test"} {"Key":"info","Value":"got TestPrepare message"} {"Key":"info","Value":"created measurement connection"} {"Key":"info","Value":"got TestStart message"} {"Key":"info","Value":"uploader goroutine forked off"} {"Key":"speed","Value":"upload: 1026.0487 Mbit/s"} {"Key":"speed","Value":"upload: 983.6243 Mbit/s"} {"Key":"speed","Value":"upload: 969.3404 Mbit/s"} {"Key":"speed","Value":"upload: 963.3781 Mbit/s"} {"Key":"speed","Value":"upload: 958.2777 Mbit/s"} {"Key":"speed","Value":"upload: 955.7979 Mbit/s"} {"Key":"speed","Value":"upload: 953.7173 Mbit/s"} {"Key":"speed","Value":"upload: 952.2509 Mbit/s"} {"Key":"speed","Value":"upload: 951.0110 Mbit/s"} {"Key":"speed","Value":"upload: 949.9688 Mbit/s"} {"Key":"speed","Value":"upload: 949.1658 Mbit/s"} {"Key":"speed","Value":"upload: 948.4877 Mbit/s"} {"Key":"speed","Value":"upload: 947.9291 Mbit/s"} {"Key":"speed","Value":"upload: 947.5487 Mbit/s"} {"Key":"speed","Value":"upload: 947.2418 Mbit/s"} {"Key":"speed","Value":"upload: 946.7913 Mbit/s"} {"Key":"speed","Value":"upload: 946.4737 Mbit/s"} {"Key":"speed","Value":"upload: 946.0896 Mbit/s"} {"Key":"speed","Value":"upload: 945.8401 Mbit/s"} {"Key":"speed","Value":"upload: 945.6446 Mbit/s"} {"Key":"speed","Value":"upload: 945.4442 Mbit/s"} {"Key":"speed","Value":"upload: 945.2644 Mbit/s"} {"Key":"speed","Value":"upload: 945.2175 Mbit/s"} {"Key":"speed","Value":"upload: 945.0365 Mbit/s"} {"Key":"speed","Value":"upload: 944.8417 Mbit/s"} {"Key":"speed","Value":"upload: 944.7951 Mbit/s"} {"Key":"speed","Value":"upload: 944.6415 Mbit/s"} {"Key":"speed","Value":"upload: 944.4945 Mbit/s"} {"Key":"speed","Value":"upload: 944.3211 Mbit/s"} {"Key":"speed","Value":"upload: 944.3105 Mbit/s"} {"Key":"speed","Value":"upload: 944.1505 Mbit/s"} {"Key":"speed","Value":"upload: 944.1584 Mbit/s"} {"Key":"speed","Value":"upload: 944.0158 Mbit/s"} {"Key":"speed","Value":"upload: 943.9208 Mbit/s"} {"Key":"speed","Value":"upload: 943.8802 Mbit/s"} {"Key":"speed","Value":"upload: 943.7854 Mbit/s"} {"Key":"speed","Value":"upload: 943.6948 Mbit/s"} {"Key":"speed","Value":"upload: 943.6319 Mbit/s"} {"Key":"speed","Value":"upload: 943.6016 Mbit/s"} {"Key":"info","Value":"uploader goroutine terminated"} {"Key":"info","Value":"server-measured speed: 931876"} {"Key":"info","Value":"test terminated"} {"Key":"info","Value":"running the download test"} {"Key":"info","Value":"got test prepare message"} {"Key":"info","Value":"created measurement connection"} {"Key":"info","Value":"got test start message"} {"Key":"info","Value":"downloader goroutine forked off"} {"Key":"speed","Value":"download: 938.1837 Mbit/s"} {"Key":"speed","Value":"download: 939.7710 Mbit/s"} {"Key":"speed","Value":"download: 940.2107 Mbit/s"} {"Key":"speed","Value":"download: 940.4489 Mbit/s"} {"Key":"speed","Value":"download: 940.5944 Mbit/s"} {"Key":"speed","Value":"download: 940.6601 Mbit/s"} {"Key":"speed","Value":"download: 940.6636 Mbit/s"} {"Key":"speed","Value":"download: 940.7126 Mbit/s"} {"Key":"speed","Value":"download: 940.7586 Mbit/s"} {"Key":"speed","Value":"download: 940.8000 Mbit/s"} {"Key":"speed","Value":"download: 940.8240 Mbit/s"} {"Key":"speed","Value":"download: 940.8512 Mbit/s"} {"Key":"speed","Value":"download: 940.8748 Mbit/s"} {"Key":"speed","Value":"download: 940.8943 Mbit/s"} {"Key":"speed","Value":"download: 940.9063 Mbit/s"} {"Key":"speed","Value":"download: 940.9186 Mbit/s"} {"Key":"speed","Value":"download: 940.9271 Mbit/s"} {"Key":"speed","Value":"download: 940.9260 Mbit/s"} {"Key":"speed","Value":"download: 940.9297 Mbit/s"} {"Key":"speed","Value":"download: 940.9323 Mbit/s"} {"Key":"speed","Value":"download: 940.9134 Mbit/s"} {"Key":"speed","Value":"download: 940.8874 Mbit/s"} {"Key":"speed","Value":"download: 940.8628 Mbit/s"} {"Key":"speed","Value":"download: 940.8432 Mbit/s"} {"Key":"speed","Value":"download: 940.8254 Mbit/s"} {"Key":"speed","Value":"download: 940.8045 Mbit/s"} {"Key":"speed","Value":"download: 940.7756 Mbit/s"} {"Key":"speed","Value":"download: 940.7586 Mbit/s"} {"Key":"speed","Value":"download: 940.7467 Mbit/s"} {"Key":"speed","Value":"download: 940.7336 Mbit/s"} {"Key":"speed","Value":"download: 940.7220 Mbit/s"} {"Key":"speed","Value":"download: 940.7097 Mbit/s"} {"Key":"speed","Value":"download: 940.7017 Mbit/s"} {"Key":"speed","Value":"download: 940.6907 Mbit/s"} {"Key":"speed","Value":"download: 940.6779 Mbit/s"} {"Key":"speed","Value":"download: 940.6692 Mbit/s"} {"Key":"speed","Value":"download: 940.6553 Mbit/s"} {"Key":"speed","Value":"download: 940.6491 Mbit/s"} {"Key":"speed","Value":"download: 940.6399 Mbit/s"} {"Key":"speed","Value":"download: 940.6331 Mbit/s"} {"Key":"info","Value":"downloader goroutine terminated"} {"Key":"info","Value":"server-measured speed: 940580 0 1175878728 kbit/s"} {"Key":"info","Value":"client-measured speed: 940642.841600 kbit/s"} {"Key":"info","Value":"web100: MaxRTT: 26"} {"Key":"info","Value":"web100: MinRTT: 3"} {"Key":"info","Value":"web100: SumRTT: 2444"} {"Key":"info","Value":"web100: CurRTO: 0"} {"Key":"info","Value":"web100: SndLimTimeCwnd: 0"} {"Key":"info","Value":"web100: SndLimTimeRwin: 0"} {"Key":"info","Value":"web100: SndLimTimeSender: 0"} {"Key":"info","Value":"web100: DataBytesOut: 0"} {"Key":"info","Value":"web100: DupAcksIn: 0"} {"Key":"info","Value":"web100: PktsOut: 823488"} {"Key":"info","Value":"web100: PktsRetrans: 0"} {"Key":"info","Value":"web100: Timeouts: 0"} {"Key":"info","Value":"web100: CountRTT: 101"} {"Key":"info","Value":"web100: CongestionSignals: 0"} {"Key":"info","Value":"web100: AckPktsIn: 0"} {"Key":"info","Value":"web100: MaxCwnd: 0"} {"Key":"info","Value":"web100: MaxRwinRcvd: 0"} {"Key":"info","Value":"web100: CurMSS: 1448"} {"Key":"info","Value":"web100: Sndbuf: 0"} {"Key":"info","Value":"web100: RcvWinScale: 0"} {"Key":"info","Value":"web100: SndWinScale: 0"} {"Key":"info","Value":"web100: TCPInfo.State: 1"} {"Key":"info","Value":"web100: TCPInfo.CAState: 0"} {"Key":"info","Value":"web100: TCPInfo.Retransmits: 0"} {"Key":"info","Value":"web100: TCPInfo.Probes: 0"} {"Key":"info","Value":"web100: TCPInfo.Backoff: 0"} {"Key":"info","Value":"web100: TCPInfo.Options: 7"} {"Key":"info","Value":"web100: TCPInfo.WScale: 119"} {"Key":"info","Value":"web100: TCPInfo.AppLimited: 0"} {"Key":"info","Value":"web100: TCPInfo.RTO: 225000"} {"Key":"info","Value":"web100: TCPInfo.ATO: 0"} {"Key":"info","Value":"web100: TCPInfo.SndMSS: 1448"} {"Key":"info","Value":"web100: TCPInfo.RcvMSS: 536"} {"Key":"info","Value":"web100: TCPInfo.Unacked: 1911"} {"Key":"info","Value":"web100: TCPInfo.Sacked: 0"} {"Key":"info","Value":"web100: TCPInfo.Lost: 0"} {"Key":"info","Value":"web100: TCPInfo.Retrans: 0"} {"Key":"info","Value":"web100: TCPInfo.Fackets: 0"} {"Key":"info","Value":"web100: TCPInfo.LastDataSent: 3"} {"Key":"info","Value":"web100: TCPInfo.LastAckSent: 0"} {"Key":"info","Value":"web100: TCPInfo.LastDataRecv: 2893497304"} {"Key":"info","Value":"web100: TCPInfo.LastAckRecv: 0"} {"Key":"info","Value":"web100: TCPInfo.PMTU: 1500"} {"Key":"info","Value":"web100: TCPInfo.RcvSsThresh: 28960"} {"Key":"info","Value":"web100: TCPInfo.RTT: 24619"} {"Key":"info","Value":"web100: TCPInfo.RTTVar: 359"} {"Key":"info","Value":"web100: TCPInfo.SndSsThresh: 16"} {"Key":"info","Value":"web100: TCPInfo.SndCwnd: 2216"} {"Key":"info","Value":"web100: TCPInfo.AdvMSS: 1448"} {"Key":"info","Value":"web100: TCPInfo.Reordering: 3"} {"Key":"info","Value":"web100: TCPInfo.RcvRTT: 0"} {"Key":"info","Value":"web100: TCPInfo.RcvSpace: 28960"} {"Key":"info","Value":"web100: TCPInfo.TotalRetrans: 0"} {"Key":"info","Value":"web100: TCPInfo.PacingRate: 260674113"} {"Key":"info","Value":"web100: TCPInfo.MaxPacingRate: -1"} {"Key":"info","Value":"web100: TCPInfo.BytesAcked: 1175878728"} {"Key":"info","Value":"web100: TCPInfo.BytesReceived: 0"} {"Key":"info","Value":"web100: TCPInfo.SegsOut: 823488"} {"Key":"info","Value":"web100: TCPInfo.SegsIn: 164322"} {"Key":"info","Value":"web100: TCPInfo.NotsentBytes: 0"} {"Key":"info","Value":"web100: TCPInfo.MinRTT: 0"} {"Key":"info","Value":"web100: TCPInfo.DataSegsIn: 0"} {"Key":"info","Value":"web100: TCPInfo.DataSegsOut: 0"} {"Key":"info","Value":"web100: TCPInfo.DeliveryRate: 0"} {"Key":"info","Value":"web100: TCPInfo.BusyTime: 0"} {"Key":"info","Value":"web100: TCPInfo.RWndLimited: 0"} {"Key":"info","Value":"web100: TCPInfo.SndBufLimited: 0"} {"Key":"info","Value":"web100: TCPInfo.Delivered: 0"} {"Key":"info","Value":"web100: TCPInfo.DeliveredCE: 0"} {"Key":"info","Value":"web100: TCPInfo.BytesSent: 0"} {"Key":"info","Value":"web100: TCPInfo.BytesRetrans: 0"} {"Key":"info","Value":"web100: TCPInfo.DSackDups: 0"} {"Key":"info","Value":"web100: TCPInfo.ReordSeen: 0"} {"Key":"info","Value":"web100: TCPInfo.RcvOooPack: 0"} {"Key":"info","Value":"web100: TCPInfo.SndWnd: 0"} {"Key":"info","Value":"web100: NDTResult.S2C.UUID: cf09041b-c0f7-11ec-ba92-52540021f59b"} {"Key":"info","Value":"web100: NDTResult.S2C.ServerIP: NDT_SERVER"} {"Key":"info","Value":"web100: NDTResult.S2C.ServerPort: 44036"} {"Key":"info","Value":"web100: NDTResult.S2C.ClientIP: NDT_CLIENT"} {"Key":"info","Value":"web100: NDTResult.S2C.ClientPort: 35692"} {"Key":"info","Value":"web100: NDTResult.S2C.StartTime: 2022-04-20 22:18:30.457297171 +0000 UTC m=+710365.396246034"} {"Key":"info","Value":"web100: NDTResult.S2C.EndTime: 2022-04-20 22:18:40.458600773 +0000 UTC m=+710375.397549599"} {"Key":"info","Value":"web100: NDTResult.S2C.MinRTT: 3ms"} {"Key":"info","Value":"web100: NDTResult.S2C.MaxRTT: 26ms"} {"Key":"info","Value":"web100: NDTResult.S2C.SumRTT: 2.444s"} {"Key":"info","Value":"web100: NDTResult.S2C.CountRTT: 101"} {"Key":"info","Value":"web100: NDTResult.S2C.Error:"} {"Key":"info","Value":"test terminated"} {"Key":"info","Value":"receiving the results"} {"Key":"info","Value":"server: You uploaded at 931876.1062 and downloaded at 940580.3716"} {"Key":"info","Value":"finished successfully"}

--------- Below is what I received in Docker logs - 2022-04-20 22:23:21,109 base.py:945 DEBUG Looking for jobs to run 2022-04-20 22:23:21,110 base.py:1025 DEBUG Next wakeup is due at 2022-04-20 22:35:43.438867+00:00 (in 742.329297 seconds) 2022-04-20 22:23:21,110 base.py:123 INFO Running job "Test Runners (trigger: <murakami.server.RandomTrigger object at 0x7f510aec18d0>, next run at: 2022-04-20 22:35:43 UTC)" (scheduled at 2022-04-20 22:23:21.008004+00:00) 2022-04-20 22:23:21,110 server.py:106 INFO Running test: DASH 2022-04-20 22:23:21,111 runner.py:53 INFO Test runner DASH disabled, skipping. 2022-04-20 22:23:21,111 server.py:106 INFO Running test: ndt5 2022-04-20 22:23:21,111 runner.py:53 INFO Test runner ndt5 disabled, skipping. 2022-04-20 22:23:21,111 server.py:106 INFO Running test: ndt5custom 2022-04-20 22:23:21,111 ndt5custom.py:121 INFO Starting ndt5custom test... 2022-04-20 22:23:21,112 ndt5custom.py:161 DEBUG Servers: {'NDT_SERVER'} 2022-04-20 22:23:21,112 ndt5custom.py:171 DEBUG Arguments: ['ndt5-client', '-protocol=ndt5', '-format=json', '-server=NDT_SERVER'] 2022-04-20 22:23:21,112 ndt5custom.py:172 INFO Running ndt5custom measurement (server): NDT_SERVER 2022-04-20 22:23:41,265 ndt5custom.py:58 DEBUG stdout:{"Key":"info","Value":"using NDT_SERVER"} {"Key":"info","Value":"sent login message"} {"Key":"info","Value":"received the kickoff message"} {"Key":"info","Value":"cleared to run the tests"}

-- snip -- same output as above docker exec execution

-- {"Key":"info","Value":"web100: NDTResult.S2C.Error:"} {"Key":"info","Value":"test terminated"} {"Key":"info","Value":"receiving the results"} {"Key":"info","Value":"server: You uploaded at 931389.5512 and downloaded at 940429.9379"} {"Key":"info","Value":"finished successfully"}

2022-04-20 22:23:41,265 ndt5custom.py:59 DEBUG stderr: 2022-04-20 22:23:41,265 ndt5custom.py:60 DEBUG exit code:0 2022-04-20 22:23:41,266 server.py:110 ERROR Failed to run test ndt5custom: ndt5-client did not return a valid JSON summary. 2022-04-20 22:23:41,266 base.py:131 ERROR Job "Test Runners (trigger: <murakami.server.RandomTrigger object at 0x7f510aec18d0>, next run at: 2022-04-20 22:35:43 UTC)" raised an exception Traceback (most recent call last): File "/murakami/murakami/runners/ndt5custom.py", line 66, in _run_client summary = json.loads(output.stdout) File "/usr/local/lib/python3.7/json/init.py", line 348, in loads return _default_decoder.decode(s) File "/usr/local/lib/python3.7/json/decoder.py", line 340, in decode raise JSONDecodeError("Extra data", s, end) json.decoder.JSONDecodeError: Extra data: line 2 column 1 (char 44)

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "/murakami/murakami/server.py", line 108, in _call_runners r.start_test() File "/murakami/murakami/runner.py", line 47, in start_test data = self._start_test() File "/murakami/murakami/runners/ndt5custom.py", line 173, in _start_test results.append(self._run_client(cmdargs)) File "/murakami/murakami/runners/ndt5custom.py", line 70, in _run_client 'ndt5-client did not return a valid JSON summary.') murakami.errors.RunnerError: ndt5-client did not return a valid JSON summary.

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "/usr/local/lib/python3.7/site-packages/apscheduler/executors/base.py", line 125, in run_job retval = job.func(*job.args, **job.kwargs) File "/murakami/murakami/server.py", line 111, in _call_runners _logger.error(exc.with_traceback()) TypeError: with_traceback() takes exactly one argument (0 given)

---- ---- Current json file - Matched it to yours, didn't make any difference ndt5-custom-config.json { "regions": [ "CA-ON", "CA-BC", "CA-AB" ], "countries": [ "GB", "KE", "MG" ], "serverGroups": [ { "selection": "all", "servers": [ "NDT_SERVER" ] }, { "selection": "random", "servers": [ "NDT_SERVER" ] } ] }

murakami.toml was exactly the same except "Port = 80" was commented out, I uncommented it. didn't make any difference.

For NDT Server, I am already running the container with "--network=host ". Read that on Project's GIT page that NDT5 required "host" network.

Thank you for your continued support.

robertodauria commented 2 years ago

@11gingerbread It's a bug in ndt5-client due to TCPInfo.BytesRetrans and TCPInfo.BytesSent being zero, which makes it fail to print the summary.

I opened a PR to make sure the summary is emitted anyway without the retransmission (since it cannot be computed). I'll update this thread once it's merged and I'm also going to update the Murakami container with the updated ndt5-client binary.

As to why BytesRetrans and BytesSent are zero in TCPInfo data in the first place, I don't know. NDT5 is a deprecated protocol at this point and no much development happens, so chances to get this investigated and fixed upstream are quite slim.

Users are encouraged to move to ndt7 where possible, which also has the advantage of not requiring host networking since it only uses two standard ports.

robertodauria commented 2 years ago

I have updated the ndt5 client in the latest murakami Docker image. To test it, docker pull measurementlab/murakami:latest and run it as usual. You should definitely get a valid summary in the output file, even if without Retransmission. Let me know! :)

P.S. Out of curiosity, which kernel are you running on the server? It might be helpful to determine why some TCPInfo values are zero.

11gingerbread commented 2 years ago

Roberto,

Thank you so much. Latest version of this container works as expected and I am receiving speedtest data in json format.

I am using CentOS Kernel 3.10.0-1160.42.2.el7.x86_64 and I tried the same on couple other CentOS Kernel versions, older docker image version was failing on them too.

One more question, How can I change the RTT Unit to microseconds, Default is milliseconds i.e MinRTTUnit": "ms". Apparently the value for this is "MinRTT":{"Value":0.002,"Unit":"ms"} and it is rounding it off to '0'.

When we were running ndt client with "testing" tag, we were getting all these "RTT" values in the debug output -

{"Key":"info","Value":"web100: MaxRTT: 26"} {"Key":"info","Value":"web100: MinRTT: 3"} {"Key":"info","Value":"web100: SumRTT: 2444"} {"Key":"info","Value":"web100: TCPInfo.MinRTT: 0"} {"Key":"info","Value":"web100: NDTResult.S2C.MinRTT: 3ms"} {"Key":"info","Value":"web100: NDTResult.S2C.MaxRTT: 26ms"} {"Key":"info","Value":"web100: NDTResult.S2C.SumRTT: 2.444s"} {"Key":"info","Value":"web100: NDTResult.S2C.CountRTT: 101"}

how can I get MaxRTT parameter ( or any other additional parameter) in the json output, Currently I am getting only "MinRTTValue" as below ?

The current output we are getting is - {"TestName": "ndt5", "TestStartTime": "2022-04-25T18:22:28.243813", "TestEndTime": "2022-04-25T18:22:48.398566", "MurakamiLocation": "LOCATION", "MurakamiConnectionType": "wired", "MurakamiNetworkType": "Corp", "MurakamiDeviceID": "", "ServerName": "A.B.C.D", "ServerIP": "A.B.C.D", "ClientIP": "A.B.C.D", "DownloadUUID": "afc069d7-c4c4-11ec-ba92-52540021f59b", "DownloadValue": 940.9146763315323, "DownloadUnit": "Mbit/s", "UploadValue": 931.297, "UploadUnit": "Mbit/s", "DownloadRetransValue": 0, "DownloadRetransUnit": "", "MinRTTValue": 0, "MinRTTUnit": "ms"}

robertodauria commented 2 years ago

The 3.10 kernel appears to be very old (2013?) -- I suspect that's why some data is missing from TCPInfo, including the fields the ndt5-client uses to compute the retransmission rate and MinRTT, too. You likely won't get good results with that kernel.

To clarify: all the metrics you see at the end of a test in the verbose output come from the server. If the server's kernel does not provide these metrics, they will have value "0" in the output. The TCPInfo.MinRTT value in your example isn't truncated, it just came like that from the server.

web100: MinRTT comes from a different source and I think it is provided as a fallback, but I don't know ndt-server that much in detail. (cc @stephen-soltesz in case he remembers :slightly_smiling_face:)

Murakami just reads the output of the summary that comes from from NDT5 Go client (https://github.com/m-lab/ndt5-client-go) -- if something is not available there, it won't be available in murakami.

stephen-soltesz commented 2 years ago

The web100: * values returned by the ndt-server for the ndt5 protocol are simulated measurements (See implementation here) to provide backward compatibility for earlier ndt clients. While TCPInfo reports MinRTT, there is no MaxRTT maintained by the Linux kernel.

11gingerbread commented 2 years ago

@robertodauria The kernel on the Client server and NDT server is not old. It is from Sept 2021.

rxa-docker01# rpm -qi kernel-3.10.0-1160.42.2.el7.x86_64 Name : kernel Version : 3.10.0 Release : 1160.42.2.el7 Architecture: x86_64 Install Date: Wed Sep 29 11:12:07 2021 Group : System Environment/Kernel Size : 67392115 License : GPLv2 Signature : RSA/SHA256, Wed Sep 8 10:30:24 2021, Key ID 24c6a8a7f4a80eb5 Source RPM : kernel-3.10.0-1160.42.2.el7.src.rpm Build Date : Tue Sep 7 11:07:03 2021

When we were testing with "testing" tag of the murakami client with command "-- -- -- docker exec 3fd9a305ee4b ndt5-client -protocol=ndt5 -format=json -server=NDT_SERVER", we received a number of additional measurements

Output is available @ https://github.com/m-lab/murakami/issues/108#issuecomment-1104530494

I would want someone these parameters available in the JSON Summary.

Is it possible to get that ?

@stephen-soltesz Thanks for letting me know about MaxRTT

robertodauria commented 2 years ago

@11gingerbread I can see where the confusion comes from. Sep 7 2021 is the package's build date, but Linux kernel 3.10 was released on Jun 30 2013 (https://en.wikipedia.org/wiki/Linux_kernel_version_history#Releases_3.x.y). RHEL provides security patches for a very long time, but in terms of features, that's a 9-year-old kernel.

Unfortunately, we currently don't know exactly at which point in kernel development history the TCPInfo fields that ndt-server returns all became available, since they were added at different times. The ndt7 specification mentions a minimum kernel version >= 4.19 (link), and that's a safe lower bound for NDT5 too (since the TCPInfo fields are the same for both protocols).

As per your question regarding adding TCPInfo diagnostics to Murakami's output: they would need to be added to ndt5-client's JSON summary first, then read by the Murakami ndt5 runner. Since ndt5-client's summary is only meant to include the most common metrics (the same a user can find on the official speedtest website), I would recommend adding a separate command-line flag (-detailed?) that adds TCPInfo metrics to the measurement's summary.

However, we are a small team and our capacity to work on the ndt5 client at this point is very limited since our efforts are dedicated to supporting the new ndt7 protocol and the platform as a whole. We would welcome PRs to both ndt5-client and murakami that:

If that's something you or your team are interested in working on, I'm happy to provide pointers on where exactly these changes should be made in the code.

11gingerbread commented 2 years ago

@robertodauria Richard, Sorry it took me long to get back but we were making some design decisions. We have decided to go ahead with NDT7. While using web-interface with an on-prem NDT7 Server, we get Speed, Network Latency (value missing though, attached screenshot below) and Jitter, we need these in the JSON output file when running with murakami client. Below is the JSON output from a client server that we currently get -

{"TestName": "ndt7", "TestStartTime": "2022-05-11T20:13:41.693590", "TestEndTime": "2022-05-11T20:14:01.767283", "MurakamiLocation": "XXXXX", "MurakamiConnectionType": "wired", "MurakamiNetworkType": "Corp", "MurakamiDeviceID": "", "ServerName": "A.B.C.D", "ServerIP": "XX.XX.XX.XX", "ClientIP": "YY.YY.YY.YY", "DownloadUUID": "da1149-d456-13ec-cddc-52540921d59b", "DownloadValue": 876.9174199590606, "DownloadUnit": "Mbit/s", "DownloadError": null, "UploadValue": 896.1241019107644, "UploadUnit": "Mbit/s", "UploadError": null, "DownloadRetransValue": 0, "DownloadRetransUnit": "", "MinRTTValue": 0, "MinRTTUnit": "ms"}

Also, in the web-interface, we are not getting any value for Network Latency, are we missing any configuration for that ?

NDT7 - WebInterfaceOutput

Let me know if we would need to implement a flag to receive "Latency and Jitter" in JSON output, we are ready to work on that.

P.S : Do you know if NDT Server (NDT5 or NDT7) stores these values every time a client queries it ? I was thinking if we are storing these values on the server, we can query the server directory for these values and we would need to purge the old data too.

Thank you for your time.

robertodauria commented 2 years ago

@11gingerbread You won't be able to get MinRTT unless ndt-server runs on a machine with kernel >= 4.19. That field in the tcp_info struct just wasn't provided by the kernel until then. Just by running the server on a newer kernel you should get MinRTTValue populated correctly in the JSON.

Please note that that web UI is for NDT5, not ndt7. I think you're not seeing anything after Network latency due to the same reason explained above.

Regarding your second question: yes, ndt-server saves all the collected data from each measurement in the data folder. If you run the server with -datadir <some_folder> you should see archives being created after each measurement. That applies to both NDT5 and ndt7 measurements.

e.g. after a single download+upload measurement, ndt-server running with -datadir datadir:

$ tree datadir
datadir
└── ndt7
    └── 2022
        └── 05
            └── 19
                ├── ndt7-download-20220519T135419.930065415Z.asimov_1652787550_unsafe_0000000000000001.json.gz
                └── ndt7-upload-20220519T135429.932062058Z.asimov_1652787550_unsafe_0000000000001001.json.gz

$ zcat datadir/ndt7/2022/05/19/ndt7-download-20220519T135419.930065415Z.asimov_1652787550_unsafe_0000000000000001.json.gz  | jq
{
  "GitShortCommit": "No commit specified",
  "Version": "",
  "ServerIP": "127.0.0.1",
  "ServerPort": 80,
  "ClientIP": "127.0.0.1",
  "ClientPort": 46100,
  "StartTime": "2022-05-19T13:54:09.926929479Z",
  "EndTime": "2022-05-19T13:54:19.930065143Z",
  "Download": {
[...]

You can definitely collect the results from the server side (in fact, these .json.gz are what M-Lab uses to populate the public BigQuery), but TCPInfo.MinRTT will still be missing.

robertodauria commented 2 years ago

Regarding jitter, please see https://support.measurementlab.net/help/en-us/4-ndt-network-diagnostic-tool/13-how-do-the-different-ndt-protocols-report-jitter

The web100 NDT client reported “jitter”, which was actually an estimate calculated on the client side as “MaxRTT - MinRTT”.

ndt5 does not report jitter. We removed it to avoid confusion.

ndt7 will provide the Round Trip Time Variation (RTTVar), [as defined in the Linux kernel](https://github.com/torvalds/linux/blob/master/include/uapi/linux/tcp.h#L245). This measures round trip time variation, which is frequently what people mean when they ask for jitter.

You should find RTTVar in the server-side json.gz files above. If you're willing to implement the -detailed flag on ndt7-client-go that adds TCPInfo to the summary JSON, it'll be available to the clients as well, since it's a TCPInfo field.

It might be easier to just get it from the server-side files, if that fits your use case.

11gingerbread commented 2 years ago

@robertodauria Roberto, I built a new NDT Server with Kernel 4.18 and I am now getting RTT (Latency) and RTTvar (Jitter) in the json file on the NDT Server. I will be using that data right from the central server as you suggested. It looks like we are very close to finishing line but I have few more questions though - 1) In the NDT7 data file on the NDT server for a particular client, I see 42 ServerMeasurements were attempted in "upload" file and 15 ServerMeasurements were attempted in "download" file. Are we configuring that somewhere or it is the default ? File names are "ndt7-download-20220527T201752.144681381Z.NDT_SERVER_1653057597_unsafe_0000000000001453.json.gz" and "ndt7-upload-20220527T201802.188577303Z.NDT_SERVER_1653057597_unsafe_00000000000052D9.json.gz".

Both files for a client have "RTT" and RTTvar" values in them for every test, how do we calculate final RTT and RTTvar value using data from all these attempts i.e average out ?

2) Where do we have Upload and Download speed in these files ?

3) We have 10,000 Nodes / clients which will be attempting speedtest 4 times a day. We won't obviously want them to start the test at the same time, Can we configure when a group of servers start the test ?

4) Can the Client IP Address / Hostname be configured to be part of name of the data collection file on the NDT server as will have thousands of files. Currently file name is like "ndt7-upload-20220527T190036.127352551Z.NDT_SERVER_1653057597_unsafe_00000000000001F1.json.gz"

Thank you. Have a nice weekend !!

robertodauria commented 2 years ago
1. In the NDT7 data file on the NDT server for a particular client, I see 42 ServerMeasurements were attempted in "upload" file and 15 ServerMeasurements were attempted in "download" file. Are we configuring that somewhere or it is the default ?
   File names are "ndt7-download-20220527T201752.144681381Z.NDT_SERVER_1653057597_unsafe_0000000000001453.json.gz" and "ndt7-upload-20220527T201802.188577303Z.NDT_SERVER_1653057597_unsafe_00000000000052D9.json.gz".

The server sends one measurement to the client and takes a TCPInfo snapshot at intervals following a Poisson distribution, on average every 250ms, so 42 is reasonable for a 10s + connection closing time and 15 is possible, too. This is configured in ndt-server via AveragePoissonSamplingInterval, MinPoissonSamplingInterval and MaxPoissonSamplingInterval (https://github.com/m-lab/ndt-server/blob/3fc9f3d218c98e919798f58738734eff67b4aed3/ndt7/spec/spec.go#L40) so you if you need more frequent or more regular snapshots you could tweak those values and rebuild ndt-server.

Both files for a client have "RTT" and RTTvar" values in them for every test, how do we calculate final RTT and RTTvar value using data from all these attempts i.e average out ?

As far as I know, the RTT should be continuously updated by the kernel for a given connection and the TCPInfo.RTT value should be a running average already, but this is a bit outside my domain so please take this information with a pinch of salt. The ndt7 Go client uses MinRTT from the last server-side TCPInfo snapshot after the download to show round-trip time at the end of the measurement, perhaps you could use TCPInfo.RTT from the last server-side snapshot as well?

2. Where do we have Upload and Download speed in these files ?

Upload: TCPInfo.BytesReceived / TCPInfo.ElapsedTime (* 8 in case you need bit/s) Download: TCPInfo.BytesAcked / TCPInfo.ElapsedTime (* 8 in case you need bit/s)

Both from the last ServerMeasurement snapshot.

3. We have 10,000 Nodes / clients which will be attempting speedtest 4 times a day. We won't obviously want them to start the test at the same time, Can we configure when a group of servers start the test ?

It's not currently configurable in Murakami, but they won't start at the same time since scheduling is randomized. The randomization function is random.expovariate (https://github.com/m-lab/murakami/blob/master/murakami/server.py#L37). Randomized intervals were used specifically to solve this problem of distributing the load from thousands of clients.

4. Can the Client IP Address / Hostname be configured to be part of name of the data collection file on the NDT server as  will have thousands of files. Currently file name is like "ndt7-upload-20220527T190036.127352551Z.NDT_SERVER_1653057597_unsafe_00000000000001F1.json.gz"

Not that I know (at least not without modifying ndt-server) but you can find ClientIP as a top-level field in each file. If you really need to have it in the filename, the client's IP needs to be plumbed through to this function in ndt-server and concatenated.

11gingerbread commented 2 years ago

@robertodauria Roberto, We attempted ad-hoc network test with command "docker exec CONTAINER_ID ndt5-client -protocol=ndt5 -quiet -format=json -server=DTR_SERVER" when we were trying NDT5 client. I was trying to attempt the same thing with ndt7 protocol but it is not working.

root@murakami:/murakami# /murakami/bin/ndt7-client -quiet -format=json -server=NDT_SERVER:8080 {"Key":"error","Value":{"Test":"download","Failure":"tls: first record does not look like a TLS handshake"}} {"Key":"error","Value":{"Test":"upload","Failure":"tls: first record does not look like a TLS handshake"}}

root@murakami:/murakami# /murakami/bin/ndt7-client -quiet -format=json -server=NDT_SERVER {"Key":"error","Value":{"Test":"download","Failure":"dial tcp NDT_SERVER:443: connect: no route to host"}} {"Key":"error","Value":{"Test":"upload","Failure":"dial tcp NDT_SERVER:443: connect: no route to host"}}

root@murakami:/murakami# /murakami/bin/ndt7-client -quiet -format=json -server=NDT_SERVER -no-verify {"Key":"error","Value":{"Test":"download","Failure":"dial tcp NDT_SERVER:443: connect: no route to host"}} {"Key":"error","Value":{"Test":"upload","Failure":"dial tcp NDT_SERVER:443: connect: no route to host"}}

root@murakami01:/murakami# /murakami/bin/ndt7-client -quiet -format=json -server=NDT_SERVER:8080 -no-verify {"Key":"error","Value":{"Test":"download","Failure":"tls: first record does not look like a TLS handshake"}} {"Key":"error","Value":{"Test":"upload","Failure":"tls: first record does not look like a TLS handshake"}}

Can you please advice.

robertodauria commented 2 years ago

I assume your instance of ndt-server is serving unenecrypted WebSocket (ws) on port 8080. By default ndt7-client uses encrypted websocket (wss) and thus is looking for a TLS handshake that isn't there.

You just need to add -scheme=ws to disable TLS and run unencrypted measurements. e.g.

$ ndt7-client -quiet -format=json -server=NDT_SERVER:8080 -scheme=ws

The -no-verify option is only useful if you are using TLS (-scheme=wss) and the certificate is a self-signed one, to skip validation.

11gingerbread commented 2 years ago

Thank you Roberto. This worked !!

robertodauria commented 2 years ago

Nice! :)

I'm going to close this issue as the original problem was solved. Feel free to open a new one if needed.