influxdata / telegraf

Agent for collecting, processing, aggregating, and writing metrics, logs, and other arbitrary data.
https://influxdata.com/telegraf
MIT License

internet_speed download results very low #14381

Closed: lberkheiser closed this issue 10 months ago

lberkheiser commented 11 months ago

Relevant telegraf.conf

[[inputs.internet_speed]]
 interval = "15m"
 test_mode = "multi"
 server_id_include = ["3228", "42082", "46512", "59072", "59081"]

Logs from Telegraf

N/A
I don't find any logs in /var/log/telegraf

System info

Docker Desktop 4.24.2 on Windows

Docker

No response

Steps to reproduce

  1. Configure Telegraf "internet_speed" Input Plugin in telegraf.conf
  2. Restart docker container "telegraf"
  3. Monitor test results in Grafana
  4. Run the speedtest CLI from WSL: speedtest --server-id=42082
  5. Run the speedtest CLI natively on Windows: speedtest.exe --server-id=42082
  6. Compare test results

Expected behavior

Speed test results should be similar across Telegraf, WSL, and Windows, and closer to the actual line speed.

Actual behavior

Speed test results from Telegraf are much lower, especially for download: about 400 Mbps from Telegraf versus about 900 Mbps from WSL or Windows. Upload is about 700 Mbps from Telegraf versus about 900 Mbps from WSL or Windows. (Screenshots from 2023-12-04 attached: speedtest_results_telegraf, speedtest_results_wsl, speedtest_results_windows.)

Additional info

Similar experience to issue #11449. I have a Windows machine with WSL and Docker Desktop (v4.24.2). Three Docker containers are running: Telegraf, InfluxDB, and Grafana. The results from Telegraf's "internet_speed" input plugin show a much slower download (and upload) speed than when running the speedtest CLI from either WSL or Windows natively. I get approximately 900 Mbps download and upload when running from WSL or Windows, whereas I only get about 400 Mbps download and 700 Mbps upload in tests run by Telegraf.

I've tried different intervals (1h and 15m), changing test_mode (from single to multi), and using server_id_include to restrict testing to a single server ID or a few server IDs. None of this changed the results. Attached is my latest config.

powersj commented 11 months ago

test_mode = "multi"

Per our own docs, this mode reaches out to multiple servers, so it is not a one-to-one comparison.

I don't find any logs in /var/log/telegraf

If you are running as a service, then look at journalctl --no-pager --unit telegraf. If you can't find them, then run this locally via the CLI and provide the results.
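As an illustration, a minimal standalone config along these lines isolates the internet_speed input, turns on debug logging, and prints metrics to the terminal. This is a sketch, not a config from the thread: the file name minimal.conf is hypothetical, and the input settings are copied from the reporter's single-server config.

```toml
# Sketch of a minimal, isolated config for reproducing the issue.
# Only the internet_speed input is loaded, so no other plugin
# (for example an exec-driven speedtest) can compete for bandwidth.
[agent]
  # Emit debug-level log messages.
  debug = true
  # An empty logfile sends logs to stderr; inside a container that is
  # what "docker logs" shows, rather than a file under /var/log/telegraf.
  logfile = ""

[[inputs.internet_speed]]
  interval = "15m"
  test_mode = "single"
  server_id_include = ["42082"]

[[outputs.file]]
  # Write gathered metrics to the terminal instead of a database.
  files = ["stdout"]
```

Running `telegraf --config minimal.conf --once` should gather and write the metrics a single time and then exit, which keeps the comparison with a one-off CLI run simple.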

It is very unlikely that Telegraf itself can do anything here. More importantly, given the information provided, nothing points to a problem in Telegraf itself. Even with the logs, my suggestion would be to file an issue with the upstream library and see what they say or ask you to do.

lberkheiser commented 11 months ago

As mentioned, I also tried test_mode = "single", with similar results. Updated telegraf.conf extract:

[[inputs.internet_speed]]
 interval = "15m"
 test_mode = "single"
 server_id_include = ["42082"]

I will also open an issue on the speedtest-go library, but I don't think it is the only thing to blame here. I downloaded the latest release of speedtest-go (version 1.6.9) and ran it from WSL. The result is close to that of Ookla's official speedtest CLI, and much higher than the result obtained from Telegraf. Download speeds:

  - Speedtest CLI: 893 Mbps
  - speedtest-go: 918 Mbps
  - Speedtest from Telegraf: 400 Mbps

(Screenshots from 2023-12-05 attached: speedtest_results_cli, speedtest_results_telegraf.)

lberkheiser commented 11 months ago

I enabled debug in the telegraf.conf file, and I see these logs in Docker:

2023-12-05 10:14:58 2023-12-05T09:14:58Z D! [outputs.influxdb_v2] Wrote batch of 28 metrics in 33.4753ms
2023-12-05 10:14:58 2023-12-05T09:14:58Z D! [outputs.influxdb_v2] Buffer fullness: 0 / 10000 metrics
2023-12-05 10:15:00 2023-12-05T09:15:00Z E! [inputs.exec] Error in plugin: exec: fork/exec /usr/bin/speedtest: no such file or directory for command "/usr/bin/speedtest -f json-pretty --accept-license": 
2023-12-05 10:15:00 2023-12-05T09:15:00Z D! [inputs.system] Reading users: open /var/run/utmp: no such file or directory
2023-12-05 10:15:04 2023-12-05T09:15:04Z D! [inputs.internet_speed] using server 42082 in Geneva (speedtest2.infomaniak.com.prod.hosts.ooklaserver.net:8080)
2023-12-05 10:15:08 2023-12-05T09:15:08Z D! [outputs.influxdb_v2] Wrote batch of 28 metrics in 51.4985ms
2023-12-05 10:15:08 2023-12-05T09:15:08Z D! [outputs.influxdb_v2] Buffer fullness: 0 / 10000 metrics
2023-12-05 10:15:10 2023-12-05T09:15:10Z D! [inputs.system] Reading users: open /var/run/utmp: no such file or directory
2023-12-05 10:15:18 2023-12-05T09:15:18Z D! [outputs.influxdb_v2] Wrote batch of 28 metrics in 31.9457ms
2023-12-05 10:15:18 2023-12-05T09:15:18Z D! [outputs.influxdb_v2] Buffer fullness: 0 / 10000 metrics

This confirms it is using the same server ID (42082) that I use with the CLI.

powersj commented 11 months ago

What other plugins do you have running? Your logs show you are also trying to run speedtest via the exec input, which could run tests in parallel; that would explain why your speed is about half of what you expect.

To be blunt, performance "issues" are a pain to track down without exact steps to reproduce or access to the environment. I would also expect many other reports of problems with the speed test if this were a regression.

Locally my results continue to be what I expect:

# multi
internet_speed,server_id=1442,source=idf-speedtest.syringanetworks.net.prod.hosts.ooklaserver.net:8080,test_mode=multi download=677.4897,upload=34.68,latency=32.968142,jitter=1.939252,location="Idaho Falls, ID" 1701786927000000000
# single
internet_speed,server_id=1442,source=idf-speedtest.syringanetworks.net.prod.hosts.ooklaserver.net:8080,test_mode=single jitter=2.663372,location="Idaho Falls, ID",download=679.8876,upload=33.6216,latency=30.901142 1701786983000000000
❯ speedtest --server-id 1442
==============================================================================

You may only use this Speedtest software and information generated
from it for personal, non-commercial use, through a command line
interface on a personal computer. Your use of this software is subject
to the End User License Agreement, Terms of Use and Privacy Policy at
these URLs:

    https://www.speedtest.net/about/eula
    https://www.speedtest.net/about/terms
    https://www.speedtest.net/about/privacy

==============================================================================

Do you accept the license? [type YES to accept]: yes
License acceptance recorded. Continuing.

   Speedtest by Ookla

      Server: Syringa Networks - Idaho Falls, ID (id: 1442)
         ISP: Sparklight
Idle Latency:    21.89 ms   (jitter: 1.54ms, low: 19.80ms, high: 25.24ms)
    Download:   689.68 Mbps (data used: 856.9 MB)                                                   
                262.50 ms   (jitter: 64.81ms, low: 19.92ms, high: 613.36ms)
      Upload:    31.43 Mbps (data used: 22.0 MB)                                                   
                 26.87 ms   (jitter: 12.38ms, low: 15.18ms, high: 178.15ms)
 Packet Loss:     0.0%

lberkheiser commented 11 months ago

I only have one output (influxdb_v2), and only the following inputs configured, which is very little in my opinion:

2023-12-05T12:48:12Z I! Loaded inputs: cpu disk diskio exec internet_speed kernel mem ping processes snmp swap syslog system

I've tried commenting out all the configuration for the snmp and exec inputs. I still get the same results: 389 Mbps download when I should be getting about 900 Mbps.

PS: the exec input contained the configuration from my first attempt at setting up the speed test.
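In telegraf.conf, commenting out an input means prefixing every line of its table with #. A sketch of what the disabled exec block could look like, assuming it wrapped the command shown in the earlier exec error log; the data_format value is a guess and is not taken from the thread.

```toml
# Disabled so that it cannot run a second speed test alongside
# inputs.internet_speed. The command is copied from the exec error
# in the debug log; data_format = "json" is an assumption.
# [[inputs.exec]]
#   commands = ["/usr/bin/speedtest -f json-pretty --accept-license"]
#   data_format = "json"
```

After restarting Telegraf, the "Loaded inputs" startup line should no longer list exec if every input of that type is commented out.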

powersj commented 11 months ago

If you run Telegraf natively on Windows, removing Docker and its networking from the equation, what do you get?

lberkheiser commented 11 months ago

I ran a test natively on Windows and on WSL (Ubuntu), and the results look good. I'll set it up as a Windows service to see if it stays stable. Weird that Docker is causing such a big drop.

powersj commented 10 months ago

Weird that Docker is causing such a big drop.

It is quite surprising that the impact is so large. Glad you got it working without Docker!