Open yaroz opened 6 months ago
I've thought about this and might add it to a future version. My main concern was that if you're running a speedtest on an interval all day every day, it would use a non-trivial amount of bandwidth, so if this is deployed on a metered connection it would use up data. Maybe it could be an optional feature you can turn on.
I agree. I learned that the hard way when I had a pi doing a speedtest every 30 minutes on a metered connection. We were 30GB over.. wasn't fun explaining that to management.
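To put rough numbers on that concern, here is a back-of-the-envelope sketch of how much data scheduled tests can consume. The per-test size is a hypothetical placeholder, not a measurement: real usage depends on the line speed and the test tool.

```python
def monthly_usage_gb(interval_minutes: float, mb_per_test: float, days: int = 30) -> float:
    """Estimate the data consumed by speedtests run on a fixed interval."""
    tests_per_day = (24 * 60) / interval_minutes
    total_mb = tests_per_day * days * mb_per_test
    return total_mb / 1000  # decimal gigabytes


# Assumption: 100 MB per test is an illustrative figure only.
print(monthly_usage_gb(interval_minutes=30, mb_per_test=100))
```

With those assumed inputs, a test every 30 minutes is 48 tests a day and 144 GB over 30 days, which is why an opt-in setting makes sense on metered links.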
I don't know much about grafana or any of the systems that are being used, but what about adding something where you can run a manual speed test or something? Not something that gets run automatically just something that every now and then when you look at the graphs you can click run.
I'll think about it some more for an upcoming release
Part of the issue as well is establishing a reliable external download source to test, without having to deploy my own speedtest server and pay for it. But I'm sure something can be done using free files available
Maybe you can look at this? https://github.com/maxandersen/internet-monitoring
or maybe this one: https://www.speedtest.net/apps/cli
I like the idea of having the ability to also get the speedtest data. I think having it as another metric would be of value especially when paying for synchronous connectivity. Another project that I have been using for speedtest data on its own is Speedtest Tracker (https://hub.docker.com/r/henrywhitaker3/speedtest-tracker) and have the test running every 5 minutes...since I do not have a data cap or limit, I am not as concerned with that aspect but want to have consistency in what I am paying for...
I currently have ATT Fiber and it seems like they do fine for a while then something happens, and getting the data in a central dashboard would be great to provide to them. This is what I have been able to track with your tool for the last 24 hours...and truly appreciate the work...very easy to spin up in a Proxmox VM...
So @philipnordmann is playing with this right now in case anyone wants to test:
https://github.com/philipnordmann/netprobe_lite/tree/speed_test
I need to spend some time this weekend testing before incorporating
I've been partial to https://speed.cloudflare.com/ as of late and have found reported speeds to be more reliable/closer to the "real feel" of my internet performance at a given point in time... if it's an option that is (I know speedtest is/has been the staple for some time)
Pulled this in today for testing and some tweaks:
https://github.com/plaintextpackets/netprobe_lite/tree/speedtest
Note: the docker image in Dockerhub is not yet updated. If you want to test the speedtest then build the docker image yourself and modify the compose file to point to the image you built
There's an issue with this approach as if the speedtest runs longer than 30s it will stop all other metric collection for that interval, so you get gaps. Need to asynchronously do speedtests and other probes. Will think on it.
That would for sure work to run the speedtest in the background, but if it takes more than 30s then we will probably see (as you already said) some spikes in latency, jitter, etc. Another option from my perspective would be to just send out the "old" values from the other metrics when the speedtest is not finished yet. That won't represent the full picture either, but at least the gaps will be gone.
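A minimal sketch of the background idea, assuming a plain worker thread and a cached "last known" value. The class name, `fake_speedtest`, and the timings are hypothetical stand-ins, not the project's actual code:

```python
import threading
import time
from typing import Callable, Optional


class BackgroundSpeedtest:
    """Runs a slow measurement in a worker thread and caches the last result."""

    def __init__(self, run_test: Callable[[], float]) -> None:
        self._run_test = run_test
        self._lock = threading.Lock()
        self._last: Optional[float] = None
        self._worker: Optional[threading.Thread] = None

    def _work(self) -> None:
        result = self._run_test()
        with self._lock:
            self._last = result

    def trigger(self) -> bool:
        """Start a test unless one is already running. Returns True if started."""
        if self._worker is not None and self._worker.is_alive():
            return False
        self._worker = threading.Thread(target=self._work, daemon=True)
        self._worker.start()
        return True

    def latest(self) -> Optional[float]:
        """Return the most recent completed result (None before the first one)."""
        with self._lock:
            return self._last

    def wait(self, timeout: Optional[float] = None) -> None:
        """Block until the running test (if any) finishes."""
        if self._worker is not None:
            self._worker.join(timeout)


def fake_speedtest() -> float:
    # Hypothetical stand-in for a real speedtest call; sleeps to simulate a long run.
    time.sleep(0.2)
    return 59.77


if __name__ == "__main__":
    st = BackgroundSpeedtest(fake_speedtest)
    st.trigger()
    for _ in range(3):
        # The probe loop keeps ticking while the speedtest is still in flight.
        print("probe tick, last speedtest:", st.latest())
        time.sleep(0.1)
    st.wait()
    print("final:", st.latest())
```

The probe loop never blocks on the speedtest, so there are no gaps. The trade-off raised above still applies: probes taken while the test saturates the link may show inflated latency and jitter.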
I can have a look next week to check. Unfortunately my speedtest never took more than 20s, I think, so that's why I never noticed, sorry!
I could also have a look into @willstocks' suggestion, but from what I can see the Python library is not really official and is a reverse engineering of Cloudflare's speedtest. But they have the option to define the amount of data that is transferred, which might help a bit.
So I was working on this yesterday and I'll post a branch soon. Basically I split off the Speedtest into its own container to allow for asynchronous writes to redis and to ensure the two tests don't impact one another.
It works but I am seeing much slower speeds reported when running inside the container vs running the same code on the host. I am troubleshooting some more tonight.
Ok so I'm consistently hitting an issue with the speedtest-cli module where it's only reporting my download as like 2.5 Mbps. On inspection of the PCAP, for some reason the application when run within Docker abruptly terminates the download test early, resulting in skewed results. This happens when running on various base images, but when running it from the host in Linux it works just fine.
I am troubleshooting to see if I can figure it out but at this point I may pivot to another module or possibly write something myself
From inside docker:
Retrieving speedtest.net server list...
Selecting best server based on ping...
Testing download speed................................................................................
Download: 2.13 Mbit/s
Testing upload speed......................................................................................................
Upload: 2.59 Mbit/s
root@be01c0b63d02:/#
From the host:
Retrieving speedtest.net server list...
Selecting best server based on ping...
Testing download speed................................................................................
Download: 59.77 Mbit/s
Testing upload speed......................................................................................................
Upload: 11.72 Mbit/s
chocolate@chocolate:/scripts/speedtest$
Ok figured it out, this is actually the Pihole issue
On the system which I'm testing on, I am running both Pihole as well as this test for netprobe. When I run the speedtest module and capture the traffic I was seeing DNS timeouts like this:
You'll notice the app queries the primary DNS (10.0.10.150 - also my docker host), then gets ICMP unreachable messages, then 5s later it queries again to the secondary DNS IP (10.0.10.1) and immediately gets a response. I noticed these DNS delays throughout the capture. When I manually run the container setting it to Google DNS, it works great:
chocolate@chocolate:/scripts/speedtest$ docker run -it --dns 8.8.8.8 -v .:/code speedtest:latest /bin/bash
Retrieving speedtest.net server list...
Selecting best server based on ping...
Testing download speed................................................................................
Download: 58.84 Mbit/s
Testing upload speed......................................................................................................
Upload: 12.43 Mbit/s
root@8b2e7f017d71:/#
My thought is that when querying my own host IP for DNS I run into the weird routing issue we experience in issue #33 (https://github.com/plaintextpackets/netprobe_lite/issues/33)
Solution to this would be to run the speedtest container with Google DNS manually set to avoid issues
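For anyone wanting to check whether their own container is hitting the same resolver delays, one option is to time lookups from inside it. This is only a sketch: the function names and the 1-second threshold are arbitrary assumptions, and it uses whatever resolver the system is configured with.

```python
import socket
import time


def time_dns_lookup(hostname: str) -> float:
    """Return the seconds taken to resolve hostname via the system resolver."""
    start = time.perf_counter()
    socket.getaddrinfo(hostname, None)
    return time.perf_counter() - start


def is_slow(elapsed: float, threshold: float = 1.0) -> bool:
    # Assumption: anything over 1 second is treated as suspicious. A fallback to
    # a secondary resolver, like the one described above, took about 5 seconds.
    return elapsed > threshold


if __name__ == "__main__":
    # "localhost" keeps the demo offline; swap in a real hostname when testing.
    elapsed = time_dns_lookup("localhost")
    print(f"lookup took {elapsed:.3f}s, slow={is_slow(elapsed)}")
```

Running it once with the default resolver and once in a container started with `--dns 8.8.8.8` should make the difference visible if the primary resolver is the problem.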
Got it working :)
Anyone up for testing this branch? https://github.com/plaintextpackets/netprobe_lite/tree/speedtest
nice work I'll report back
That works as advertised so far, we could add some ENVs for expected speeds maybe. I'll let that run for a bit. Love your work!
The speed limitation is the network this is running on.
Awesome work @plaintextpackets! Everything seems to work as expected. I am seeing a huge packet loss and latency from amazon which significantly reduces my average score. Trying to ping amazon from the terminal, it looks like this is really the case and not caused by the updated code. So, the app serves its purpose :) Will continue to monitor though.
@nzsambo thank you for testing! Yeah, I left the thresholds out since people have different speeds, but I can put that in as an option. The only issue is that you'd have to modify the Grafana dash template and not the .env
@suzunov Glad it caught the amazon issue, you're welcome to change the target to another website which is reliable in your area
Nice thing @plaintextpackets, thanks! Will also let it run now for a few days but looks good so far on my side as well
v1.4.0 is now released which contains the speedtest!
Please note it is disabled by default, and that you will need to remove your old containers and docker volumes to upgrade (docker compose down -v). This will unfortunately result in losing your old data.
So running the latest release now..seems good, but do you know if there are any bandwidth issues when the container is in a VM? I have a multigig synchronous fiber link but seems it is tapping out at 1Gbps... I can start a pcap to see if there is anything at the host or VM level but did not know if you had any thoughts before doing that...
@secdoc what kind of VM? Might be some kind of virtual limit
It is a Proxmox VM running on debian 12.
The other speedtest Docker app referenced in my previous post is on the same Proxmox server but in a different VM.
Here is an updated Netprobe 12-Hr view
I will be doing some pcaps this weekend to determine if there is a bottleneck within my network or VM and then post any findings...
I realize if I'm going to test this at full speed I need faster internet LOL :(
Understand...For reference, both VMs (192.168.2.185 and 192.168.2.10 respectively) are running on the same Proxmox Host that is connected to an aggregation switch via 2 x 10Gbps that are configured for link aggregation with a 10Gbps uplink to the router with the 2.5Gbps Fiber WAN to the ISP. The Proxmox host has the following specs:
CPU(s): 96 x AMD EPYC 7642 48-Core Processor (1 Socket)
Kernel Version: Linux 6.8.4-3-pve (2024-05-02T11:55Z)
The Netprobe Host has the following Specs:
OS: Debian GNU/Linux 12 (bookworm) x86_64
Host: KVM/QEMU (Standard PC (i440FX + PIIX, 1996) pc-i440fx-8.1)
Kernel: 6.1.0-20-amd64
CPU: AMD EPYC 7642 (2) @ 2.299GHz
Memory: 643MiB / 3915MiB
Disk (/): 2.1G / 6.2G (35%)
The Speedtest Tracker Host has the following Specs:
OS: Ubuntu 20.04.6 LTS x86_64
Host: KVM/QEMU (Standard PC (Q35 + ICH9, 2009) pc-q35-8.1)
Kernel: 5.4.0-182-generic
CPU: AMD EPYC 7642 (6) @ 2.299GHz
Memory: 1284MiB / 7946MiB
Disk (/): 13G / 15G (87%)
With this in mind, I have not seen anything that is potentially obvious from the PCAPS but here is data so far from tcpdump of the host...
I filtered out all my SSH and all subsequent DNS/ARP/ICMP and other UDP traffic from the capture...
for reference this was my wireshark filter of the tcpdump capture (!(udp) && !(icmp) && !(arp) && !(ssh) && !(tcp.port==22 || tcp.port==3001))
I let the tcpdump run for roughly 20+ minutes. When you look at the conversation, most of the traffic from a packet and data volume perspective is linked to the IP 161.188.169.250, and it is not resolving to a particular host or domain when name resolution is enabled, but after doing a whois you get the following:
# start
NetRange: 161.188.0.0 - 161.188.199.255
CIDR: 161.188.192.0/21, 161.188.128.0/18, 161.188.0.0/17
NetName: AT-88-Z
NetHandle: NET-161-188-0-0-1
Parent: NET161 (NET-161-0-0-0-0)
NetType: Direct Allocation
OriginAS:
Organization: Amazon Technologies Inc. (AT-88-Z)
RegDate: 2020-12-29
Updated: 2024-05-31
Ref: https://rdap.arin.net/registry/ip/161.188.0.0
OrgName: Amazon Technologies Inc.
OrgId: AT-88-Z
Address: 410 Terry Ave N.
City: Seattle
StateProv: WA
PostalCode: 98109
Country: US
RegDate: 2011-12-08
Updated: 2024-01-24
Comment: All abuse reports MUST include:
Comment: * src IP
Comment: * dest IP (your IP)
Comment: * dest port
Comment: * Accurate date/timestamp and timezone of activity
Comment: * Intensity/frequency (short log extracts)
Comment: * Your contact details (phone and email) Without these we will be unable to identify the correct owner of the IP address at that point in time.
Ref: https://rdap.arin.net/registry/entity/AT-88-Z
OrgRoutingHandle: IPROU3-ARIN
OrgRoutingName: IP Routing
OrgRoutingPhone: +1-206-555-0000
OrgRoutingEmail: aws-routing-poc@amazon.com
OrgRoutingRef: https://rdap.arin.net/registry/entity/IPROU3-ARIN
OrgAbuseHandle: AEA8-ARIN
OrgAbuseName: Amazon EC2 Abuse
OrgAbusePhone: +1-206-555-0000
OrgAbuseEmail: abuse@amazonaws.com
OrgAbuseRef: https://rdap.arin.net/registry/entity/AEA8-ARIN
OrgRoutingHandle: ARMP-ARIN
OrgRoutingName: AWS RPKI Management POC
OrgRoutingPhone: +1-206-555-0000
OrgRoutingEmail: aws-rpki-routing-poc@amazon.com
OrgRoutingRef: https://rdap.arin.net/registry/entity/ARMP-ARIN
OrgTechHandle: ANO24-ARIN
OrgTechName: Amazon EC2 Network Operations
OrgTechPhone: +1-206-555-0000
OrgTechEmail: amzn-noc-contact@amazon.com
OrgTechRef: https://rdap.arin.net/registry/entity/ANO24-ARIN
OrgNOCHandle: AANO1-ARIN
OrgNOCName: Amazon AWS Network Operations
OrgNOCPhone: +1-206-555-0000
OrgNOCEmail: amzn-noc-contact@amazon.com
OrgNOCRef: https://rdap.arin.net/registry/entity/AANO1-ARIN
# end
# start
NetRange: 161.188.128.0 - 161.188.191.255
CIDR: 161.188.128.0/18
NetName: AWS-DISH
NetHandle: NET-161-188-128-0-1
Parent: AT-88-Z (NET-161-188-0-0-1)
NetType: Reallocated
OriginAS:
Organization: DISH Wireless L.L.C. (DWL-61)
RegDate: 2024-01-09
Updated: 2024-01-09
Ref: https://rdap.arin.net/registry/ip/161.188.128.0
OrgName: DISH Wireless L.L.C.
OrgId: DWL-61
Address: 5701 S Santa Fe Drive
City: Littleton
StateProv: CO
PostalCode: 80120
Country: US
RegDate: 2020-04-08
Updated: 2023-09-28
Ref: https://rdap.arin.net/registry/entity/DWL-61
OrgTechHandle: SMITH6080-ARIN
OrgTechName: Smith, Brian
OrgTechPhone: +1-480-558-2496
OrgTechEmail: brianc.smith@dish.com
OrgTechRef: https://rdap.arin.net/registry/entity/SMITH6080-ARIN
OrgTechHandle: JEMO-ARIN
OrgTechName: Marquez Osuna, Jorge Edmundo
OrgTechPhone: +1-303-706-5175
OrgTechEmail: jorge1.marquez@dish.com
OrgTechRef: https://rdap.arin.net/registry/entity/JEMO-ARIN
OrgNOCHandle: DWN4-ARIN
OrgNOCName: Dish Wireless NOC
OrgNOCPhone: +1-833-347-4602
OrgNOCEmail: dishwirelessnoc@dish.com
OrgNOCRef: https://rdap.arin.net/registry/entity/DWN4-ARIN
OrgAbuseHandle: DWN4-ARIN
OrgAbuseName: Dish Wireless NOC
OrgAbusePhone: +1-833-347-4602
OrgAbuseEmail: dishwirelessnoc@dish.com
OrgAbuseRef: https://rdap.arin.net/registry/entity/DWN4-ARIN
# end
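As a quick cross-check of which whois record applies, the address can be tested against the listed CIDR blocks with Python's standard `ipaddress` module. This is just a sketch using the values from the two records above:

```python
import ipaddress

ip = ipaddress.ip_address("161.188.169.250")

# CIDR blocks copied from the two whois records above.
parent_blocks = ["161.188.192.0/21", "161.188.128.0/18", "161.188.0.0/17"]
reallocated_block = "161.188.128.0/18"

# Which of the parent allocation's blocks contain the address?
matches = [cidr for cidr in parent_blocks if ip in ipaddress.ip_network(cidr)]
print(matches)

# Is it inside the reallocated (AWS-DISH) range?
print(ip in ipaddress.ip_network(reallocated_block))
```

The address falls only in 161.188.128.0/18, so the more specific reallocated record is the relevant one.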
I would attach the PCAP, but compressed it is at a little over 340MB.
So I am not sure if this is a speedtest server that is being pinned for the connection or if this is something else...
I also did a speedtest and capture from my other system that is running the Speedtest Tracker container, filtering out all traffic except the speedtests (that particular system is running multiple Docker containers), using the following Wireshark filter: !(udp) && !(icmp) && !(arp) && !(ssh) && !(tcp.port==22 || tcp.port==443 || tcp.port==80 || tcp.port==1514 || tcp.port==46548 || tcp.port==30018 || tcp.port==30021 || tcp.port==30006 || tcp.port==30002). After filtering the packets and resaving the pcap, this file is still over 2GB in size for roughly a 1-2 min capture.
I get completely different speedtest servers, but the IP 161.188.169.250 is seen in the mix and as a resolved host:
But for the set of speedtests run on this system, the primary IP being pinned was 23.128.56.22.
That IP is associated with starlight fiber.
Great info here.
Any chance to have a speed test added into the graph as well?
External IP addresses would be nice too, to know if the router failed over for a period of time. Maybe show the current IP, and a link to the history? You can get that from ifconfig.me/ip
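A rough sketch of how that could look, assuming the endpoint returns the address as plain text. The `IpHistory` class and its method names are hypothetical, and the history is kept in memory only for illustration:

```python
import urllib.request
from typing import List, Optional, Tuple


def fetch_external_ip(url: str = "https://ifconfig.me/ip", timeout: float = 5.0) -> str:
    """Fetch the public IP; assumes the service replies with the bare address."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode().strip()


class IpHistory:
    """Keeps a list of (timestamp, ip) entries, one per observed change."""

    def __init__(self) -> None:
        self.changes: List[Tuple[float, str]] = []

    def record(self, ip: str, timestamp: float) -> bool:
        """Store ip if it differs from the last one seen; return True on change."""
        if self.changes and self.changes[-1][1] == ip:
            return False
        self.changes.append((timestamp, ip))
        return True

    @property
    def current(self) -> Optional[str]:
        return self.changes[-1][1] if self.changes else None
```

Calling `history.record(fetch_external_ip(), time.time())` on each probe interval would give the current address plus a change log, which is enough to spot a failover window.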