Closed by lberkheiser.
test_mode = "multi"
Per our own docs, this reaches out to multiple servers, so it is not a one-to-one comparison.
I don't find any logs in /var/log/telegraf
If you are running as a service, then look at journalctl --no-pager --unit telegraf. If you can't find them, then run this locally via the CLI and provide the results.
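Telegraf writes its log to stderr unless a log file is configured, so an empty /var/log/telegraf is expected with the default configuration. Below is a minimal sketch of the [agent] options that turn on debug output and write it to a file. The log path is only an example, and inside a Docker container leaving logfile empty keeps the output visible through docker logs.

```toml
[agent]
  ## Emit debug-level ("D!") messages in addition to info and error messages.
  debug = true
  ## Write the log to this file instead of stderr.
  ## Example path only: it must be writable by the Telegraf process.
  ## Leave it empty ("") to keep logging to stderr (journald or docker logs).
  logfile = "/var/log/telegraf/telegraf.log"
```

For a one-off run from the CLI, telegraf --config telegraf.conf --input-filter internet_speed --test gathers only that input once and prints the resulting metrics to stdout (the config path is a placeholder).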
It is very unlikely that Telegraf itself can do anything here. More importantly, given the information provided, nothing points to a problem in Telegraf itself. Even with the logs, my suggestion is going to be to reach out to the upstream library (speedtest-go), file an issue, and see what they say or ask you to do.
As mentioned, I also tried test_mode = "single", with similar results. Updated telegraf.conf extract:
[[inputs.internet_speed]]
interval = "15m"
test_mode = "single"
server_id_include = ["42082"]
I will open an issue on the speedtest-go library as well, but I don't think it is the only thing to blame here. I downloaded the latest release of speedtest-go (version 1.6.9) and ran it from WSL. The result is pretty close to that of the official Ookla Speedtest CLI, and much higher than the result obtained when running from Telegraf. Download speeds:
- Speedtest CLI: 893 Mbps
- speedtest-go: 918 Mbps
- Speedtest from Telegraf: 400 Mbps
See the screenshots below:
I enabled debug in the telegraf.conf file, and I see these logs in Docker:
2023-12-05 10:14:58 2023-12-05T09:14:58Z D! [outputs.influxdb_v2] Wrote batch of 28 metrics in 33.4753ms
2023-12-05 10:14:58 2023-12-05T09:14:58Z D! [outputs.influxdb_v2] Buffer fullness: 0 / 10000 metrics
2023-12-05 10:15:00 2023-12-05T09:15:00Z E! [inputs.exec] Error in plugin: exec: fork/exec /usr/bin/speedtest: no such file or directory for command "/usr/bin/speedtest -f json-pretty --accept-license":
2023-12-05 10:15:00 2023-12-05T09:15:00Z D! [inputs.system] Reading users: open /var/run/utmp: no such file or directory
2023-12-05 10:15:04 2023-12-05T09:15:04Z D! [inputs.internet_speed] using server 42082 in Geneva (speedtest2.infomaniak.com.prod.hosts.ooklaserver.net:8080)
2023-12-05 10:15:08 2023-12-05T09:15:08Z D! [outputs.influxdb_v2] Wrote batch of 28 metrics in 51.4985ms
2023-12-05 10:15:08 2023-12-05T09:15:08Z D! [outputs.influxdb_v2] Buffer fullness: 0 / 10000 metrics
2023-12-05 10:15:10 2023-12-05T09:15:10Z D! [inputs.system] Reading users: open /var/run/utmp: no such file or directory
2023-12-05 10:15:18 2023-12-05T09:15:18Z D! [outputs.influxdb_v2] Wrote batch of 28 metrics in 31.9457ms
2023-12-05 10:15:18 2023-12-05T09:15:18Z D! [outputs.influxdb_v2] Buffer fullness: 0 / 10000 metrics
This confirms it is using the same server ID (42082) that I use with the CLI.
What other plugins do you have running? Your logs show you attempting to run speedtest via exec, which could run a test in parallel with internet_speed. That would explain why your speed is about half of what you expect.
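A sketch of that isolation step, assuming the exec block invoked the Ookla CLI with the command shown in the error line of the debug log; the block's remaining settings aren't known from the thread and are left out. Commenting out the whole [[inputs.exec]] section leaves internet_speed as the only input generating speed-test traffic.

```toml
## Disabled: the earlier exec-based speed test. The command is copied from the
## error line in the debug log; any other settings in that block are omitted.
# [[inputs.exec]]
#   commands = ["/usr/bin/speedtest -f json-pretty --accept-license"]

## Keep only the internet_speed input running the test.
[[inputs.internet_speed]]
  interval = "15m"
  test_mode = "single"
  server_id_include = ["42082"]
```

If the download figure stays around 400 Mbps with exec disabled, a parallel test is unlikely to be the cause.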
To be blunt, performance "issues" are a pain to track down without exact steps to reproduce or access to the environment. I would also expect many other reports of problems with the speed test if this were a regression.
Locally, my results continue to be what I expect:
# multi
internet_speed,server_id=1442,source=idf-speedtest.syringanetworks.net.prod.hosts.ooklaserver.net:8080,test_mode=multi download=677.4897,upload=34.68,latency=32.968142,jitter=1.939252,location="Idaho Falls, ID" 1701786927000000000
# single
internet_speed,server_id=1442,source=idf-speedtest.syringanetworks.net.prod.hosts.ooklaserver.net:8080,test_mode=single jitter=2.663372,location="Idaho Falls, ID",download=679.8876,upload=33.6216,latency=30.901142 1701786983000000000
❯ speedtest --server-id 1442
==============================================================================
You may only use this Speedtest software and information generated
from it for personal, non-commercial use, through a command line
interface on a personal computer. Your use of this software is subject
to the End User License Agreement, Terms of Use and Privacy Policy at
these URLs:
https://www.speedtest.net/about/eula
https://www.speedtest.net/about/terms
https://www.speedtest.net/about/privacy
==============================================================================
Do you accept the license? [type YES to accept]: yes
License acceptance recorded. Continuing.
Speedtest by Ookla
Server: Syringa Networks - Idaho Falls, ID (id: 1442)
ISP: Sparklight
Idle Latency: 21.89 ms (jitter: 1.54ms, low: 19.80ms, high: 25.24ms)
Download: 689.68 Mbps (data used: 856.9 MB)
262.50 ms (jitter: 64.81ms, low: 19.92ms, high: 613.36ms)
Upload: 31.43 Mbps (data used: 22.0 MB)
26.87 ms (jitter: 12.38ms, low: 15.18ms, high: 178.15ms)
Packet Loss: 0.0%
I only have one output (influxdb_v2) and only the following inputs configured:
2023-12-05T12:48:12Z I! Loaded inputs: cpu disk diskio exec internet_speed kernel mem ping processes snmp swap syslog system
This is very little, in my opinion.
I've tried commenting out all the config for the snmp and exec inputs. I still get the same results: 389 Mbps download when I should be getting about 900 Mbps.
PS: the exec input was left over from my first attempt at configuring a speed test.
If you run natively on Windows and remove Docker and its networking from the equation, what do you get?
I ran a test natively on Windows and on WSL (Ubuntu), and the results look good. I'll set it up as a Windows service to see if it's stable. Weird that Docker is causing such a big drop.
Weird that Docker is causing such a big drop.
It is quite surprising that the impact is so large. Glad you got it working without Docker!
Relevant telegraf.conf
Logs from Telegraf
System info
Docker 4.24.2 on Windows
Docker
No response
Steps to reproduce
Expected behavior
Speed test results should be similar between Telegraf, WSL and Windows, and should reflect the actual connection speed.
Actual behavior
Speed test results from Telegraf are much lower, especially for download: about 400 Mbps from Telegraf versus about 900 Mbps from WSL or Windows. Upload speed is about 700 Mbps from Telegraf versus about 900 Mbps from WSL or Windows.
Screenshots: Telegraf, WSL, Windows.
Additional info
Similar experience to issue #11449. I have a Windows machine with WSL and Docker Desktop (v4.24.2). Three Docker containers are running: Telegraf, InfluxDB and Grafana. The results from Telegraf's "internet_speed" input plugin show much slower download (and upload) speeds than running the speedtest CLI from either WSL or natively on Windows. I get approximately 900 Mbps download and upload when running from WSL or Windows, whereas tests run by Telegraf only give about 400 Mbps download and 700 Mbps upload.
I've tried setting different intervals (1h or 15m), I've tried changing the test_mode (from single to multi), I've tried to use server_id_include to use only a single server ID or a few server IDs. No change. Attached is my latest config.