plaintextpackets / netprobe_lite

Simple internet performance tester written in Python
620 stars, 124 forks

Speed test request / speed test not showing full speed #3

Open yaroz opened 4 months ago

yaroz commented 4 months ago

Great info here.

Any chance to have a speed test added into the graph as well?

External IP addresses would be nice too, to know if the router failed over for a period of time. Maybe show the current IP, and a link to the history? You can get that from ifconfig.me/ip
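A minimal sketch of how the external IP idea could work, assuming the ifconfig.me/ip endpoint mentioned above returns the address as plain text. The function names and the history format are hypothetical and not part of netprobe_lite:

```python
import urllib.request
from datetime import datetime, timezone

IP_SERVICE_URL = "https://ifconfig.me/ip"  # endpoint suggested in the comment above


def fetch_external_ip(url=IP_SERVICE_URL, timeout=5):
    """Return the external IP reported by the service (assumed plain text)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8").strip()


def record_ip_change(history, current_ip, now=None):
    """Append (timestamp, ip) to history only when the IP differs from the last entry.

    Returns True when a change (for example a router failover) was recorded.
    """
    now = now or datetime.now(timezone.utc)
    if not history or history[-1][1] != current_ip:
        history.append((now, current_ip))
        return True
    return False
```

Calling `record_ip_change(history, fetch_external_ip())` on each polling interval would keep the current IP as the last entry, with the earlier entries serving as the failover history.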

plaintextpackets commented 4 months ago

I've thought about this and might add it to a future version. My main concern was if you're running a speedtest on an interval all day every day, it would use a non-trivial amount of bandwidth, so if this is deployed on a metered connection it would take up data. Maybe an optional feature you can turn on

yaroz commented 4 months ago

I agree. I learned that the hard way when I had a pi doing a speedtest every 30 minutes on a metered connection. We were 30GB over.. wasn't fun explaining that to management.
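To put rough numbers on the bandwidth concern, here is a small worked estimate. The 100 MB transferred per test is a hypothetical figure; the real amount depends on the link speed and the test server:

```python
def monthly_speedtest_usage_gb(interval_minutes, mb_per_test, days=30):
    """Estimate data used by periodic speedtests, in decimal gigabytes."""
    tests_per_day = (24 * 60) / interval_minutes
    total_mb = tests_per_day * mb_per_test * days
    return total_mb / 1000


# One test every 30 minutes at an assumed 100 MB per test:
# 48 tests/day * 100 MB * 30 days = 144,000 MB
print(monthly_speedtest_usage_gb(30, 100))  # 144.0
```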

Kelsch commented 4 months ago

I don't know much about grafana or any of the systems that are being used, but what about adding something where you can run a manual speed test or something? Not something that gets run automatically just something that every now and then when you look at the graphs you can click run.

plaintextpackets commented 4 months ago

I'll think about it some more for an upcoming release

Part of the issue as well is establishing a reliable external download source to test, without having to deploy my own speedtest server and pay for it. But I'm sure something can be done using free files available

ImCodingCat commented 4 months ago

Maybe you can look at this? https://github.com/maxandersen/internet-monitoring

Unclerojelio commented 4 months ago

or this: https://github.com/sivel/speedtest-cli

suzunov commented 4 months ago

or maybe this one: https://www.speedtest.net/apps/cli

secdoc commented 4 months ago

I like the idea of also being able to get the speedtest data. I think having it as another metric would be of value, especially when paying for synchronous connectivity. Another project that I have been using for speedtest data on its own is Speedtest Tracker (https://hub.docker.com/r/henrywhitaker3/speedtest-tracker), and I have the test running every 5 minutes. Since I do not have a data cap or limit, I am not as concerned with that aspect, but I want to have consistency in what I am paying for.

[image: 2024-05-02_20-56_2]

I currently have ATT Fiber, and it seems like they do fine for a while and then something happens. Getting the data in a central dashboard would be great to provide to them. This is what I have been able to track with your tool for the last 24 hours. I truly appreciate the work; it was very easy to spin up in a Proxmox VM.

[image: 2024-05-02_21-05]

plaintextpackets commented 4 months ago

So @philipnordmann is playing with this right now in case anyone wants to test:

https://github.com/philipnordmann/netprobe_lite/tree/speed_test

I need to spend some time this weekend testing before incorporating

willstocks commented 4 months ago

I've been partial to https://speed.cloudflare.com/ as of late and have found reported speeds to be more reliable/closer to the "real feel" of my internet performance at a given point in time... if it's an option that is (I know speedtest is/has been the staple for some time)

plaintextpackets commented 3 months ago

Pulled this in today for testing and some tweaks:

https://github.com/plaintextpackets/netprobe_lite/tree/speedtest

Note: the docker image in Dockerhub is not yet updated. If you want to test the speedtest then build the docker image yourself and modify the compose file to point to the image you built

plaintextpackets commented 3 months ago

There's an issue with this approach as if the speedtest runs longer than 30s it will stop all other metric collection for that interval, so you get gaps. Need to asynchronously do speedtests and other probes. Will think on it.
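One way the asynchronous approach could look is sketched below. This is an illustration only, not the project's code: `probe` and `speedtest` are hypothetical callables, and a background thread is just one of several ways to keep a slow speedtest from blocking the other probes.

```python
import threading
import time


def run_probe_loop(probe, speedtest, cycles, speedtest_every, probe_interval=0.0):
    """Call `probe` every cycle; start `speedtest` in a background thread
    every `speedtest_every` cycles, skipping a start if one is still running."""
    results = {"probes": [], "speedtests": []}
    worker = None

    def speedtest_job():
        # Runs off the main thread, so a slow test cannot delay the probes.
        results["speedtests"].append(speedtest())

    for cycle in range(cycles):
        due = cycle % speedtest_every == 0
        busy = worker is not None and worker.is_alive()
        if due and not busy:
            worker = threading.Thread(target=speedtest_job, daemon=True)
            worker.start()
        results["probes"].append(probe())
        time.sleep(probe_interval)

    if worker is not None:
        worker.join()  # wait for the last speedtest before returning
    return results
```

With this shape, a speedtest that takes longer than one probe interval delays only its own result, and the probe samples keep arriving on schedule.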

philipnordmann commented 3 months ago

Running the speedtest in the background would certainly work, but if it takes more than 30s then we will probably see (as you already said) some spikes in latency, jitter, etc. Another option from my perspective would be to just send out the "old" values from the other metrics while the speedtest has not finished yet. That won't represent the full picture either, but at least the gaps will be gone.
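The "send out the old values" option could be sketched roughly like this. The metric names are hypothetical, and `None` is used here as an assumed marker for "not collected this interval":

```python
def report_metrics(fresh, last_known):
    """Overlay fresh readings on the last known ones.

    A value of None in `fresh` means the metric was not collected this
    interval, so the previous value is reported again instead of a gap.
    """
    merged = dict(last_known)
    merged.update({name: value for name, value in fresh.items() if value is not None})
    return merged


last_known = {"latency_ms": 20.0, "jitter_ms": 2.0}
fresh = {"latency_ms": None, "jitter_ms": 3.5}  # latency probe was skipped
print(report_metrics(fresh, last_known))  # {'latency_ms': 20.0, 'jitter_ms': 3.5}
```

As noted above, the repeated value hides the gap rather than measuring anything, so a dashboard built on it would show stale data during a long speedtest.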

I can have a look next week to check. Unfortunately my speedtest never took more than 20s, I think, so that's why I never noticed, sorry!

I could also look into @willstocks's suggestion, but from what I can see the Python library is not really official and is a reverse engineering of Cloudflare's speedtest. It does have the option to define the amount of data that is transferred, which might help a bit.

plaintextpackets commented 3 months ago

So I was working on this yesterday and I'll post a branch soon. Basically I split off the Speedtest into its own container to allow for asynchronous writes to redis and to ensure the two tests don't impact one another.
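As a rough illustration of the separate-container idea, the speedtest process could write its result to Redis on its own schedule while the probe container reads it whenever it likes. The key name, field names, and host below are hypothetical; the only assumption about the client is a redis-py style `set(name, value)` method:

```python
import json
import time


def publish_speedtest(client, download_mbps, upload_mbps, key="speedtest:latest"):
    """Store the latest speedtest result as JSON under `key`.

    `client` is anything with a redis-py style `set(name, value)` method,
    e.g. `redis.Redis(host="redis", port=6379)`. The key name and host
    are hypothetical and not taken from the project.
    """
    payload = json.dumps({
        "download_mbps": download_mbps,
        "upload_mbps": upload_mbps,
        "timestamp": time.time(),
    })
    client.set(key, payload)
    return payload
```

Because the write is a single `set`, the other container never waits on a running speedtest; it only ever sees the most recent completed result.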

It works but I am seeing much slower speeds reported when running inside the container vs running the same code on the host. I am troubleshooting some more tonight.

plaintextpackets commented 3 months ago

Ok so I'm consistently hitting an issue with the speedtest-cli module where it's only reporting my download as around 2.5 Mbps. On inspection of the PCAP, for some reason the application, when run within Docker, abruptly terminates the download test early, resulting in skewed results. This happens on various base images, but when running it from the host in Linux it works just fine.

I am troubleshooting to see if I can figure it out but at this point I may pivot to another module or possibly write something myself

From inside docker:

Retrieving speedtest.net server list...
Selecting best server based on ping...
Testing download speed................................................................................
Download: 2.13 Mbit/s
Testing upload speed......................................................................................................
Upload: 2.59 Mbit/s
root@be01c0b63d02:/#

From the host:

Retrieving speedtest.net server list...
Selecting best server based on ping...
Testing download speed................................................................................
Download: 59.77 Mbit/s
Testing upload speed......................................................................................................
Upload: 11.72 Mbit/s
chocolate@chocolate:/scripts/speedtest$

plaintextpackets commented 3 months ago

Ok figured it out, this is actually the Pihole issue

On the system which I'm testing on, I am running both Pihole as well as this test for netprobe. When I run the speedtest module and capture the traffic I was seeing DNS timeouts like this:

image

You'll notice the app queries the primary DNS (10.0.10.150 - also my docker host), then gets ICMP unreachable messages, then 5s later it queries again to the secondary DNS IP (10.0.10.1) and immediately gets a response. I noticed these DNS delays throughout the capture. When I manually run the container setting it to Google DNS, it works great:

chocolate@chocolate:/scripts/speedtest$ docker run -it --dns 8.8.8.8 -v .:/code speedtest:latest /bin/bash

Retrieving speedtest.net server list...
Selecting best server based on ping...
Testing download speed................................................................................
Download: 58.84 Mbit/s
Testing upload speed......................................................................................................
Upload: 12.43 Mbit/s
root@8b2e7f017d71:/#

My thought is that when querying my own host IP for DNS I run into the weird routing issue we experience in issue #33 (https://github.com/plaintextpackets/netprobe_lite/issues/33)

Solution to this would be to run the speedtest container with Google DNS manually set to avoid issues
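A quick way to spot this kind of DNS delay from inside a container, without a packet capture, is to time a lookup. This is a minimal sketch using the standard library; it measures whichever resolver the container is configured with, and the 1 second threshold in the usage note is an arbitrary assumption:

```python
import socket
import time


def time_dns_lookup(hostname, resolve=socket.getaddrinfo):
    """Return (succeeded, elapsed_seconds) for a single name lookup.

    `resolve` defaults to the system resolver; it is a parameter only so
    the function can be exercised without network access.
    """
    start = time.perf_counter()
    try:
        resolve(hostname, None)
        succeeded = True
    except OSError:  # socket.gaierror is a subclass of OSError
        succeeded = False
    return succeeded, time.perf_counter() - start
```

For example, `ok, seconds = time_dns_lookup("www.speedtest.net")` followed by a check such as `seconds > 1.0` would flag the multi-second fallback to the secondary DNS server described above.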

plaintextpackets commented 3 months ago

Got it working :)

Anyone up for testing this branch? https://github.com/plaintextpackets/netprobe_lite/tree/speedtest

nzsambo commented 3 months ago

nice work I'll report back

nzsambo commented 3 months ago

That works as advertised so far, we could add some ENVs for expected speeds maybe. I'll let that run for a bit. Love your work!

The speed limitation is the network this is running on.

image

suzunov commented 3 months ago

Awesome work @plaintextpackets! Everything seems to work as expected. I am seeing a huge packet loss and latency from amazon which significantly reduces my average score. Trying to ping amazon from the terminal, it looks like this is really the case and not caused by the updated code. So, the app serves its purpose :) Will continue to monitor though.

plaintextpackets commented 3 months ago

@nzsambo thank you for testing! Yeah, I left the thresholds out since people have different speeds, but I can put that in as an option. The only issue is that you'd have to modify the Grafana dashboard template rather than the .env

@suzunov Glad it caught the amazon issue, you're welcome to change the target to another website which is reliable in your area

philipnordmann commented 3 months ago

Nice thing @plaintextpackets, thanks! Will also let it run now for a few days but looks good so far on my side as well

image

plaintextpackets commented 3 months ago

v1.4.0 is now released which contains the speedtest!

Please note it is disabled by default, and that you will need to remove your old containers and docker volumes to upgrade (docker compose down -v). This will unfortunately result in losing your old data.

secdoc commented 3 months ago

So I'm running the latest release now. It seems good, but do you know if there are any bandwidth issues when the container is in a VM? I have a multigig synchronous fiber link but it seems to be tapping out at 1Gbps... I can start a pcap to see if there is anything at the host or VM level, but I did not know if you had any thoughts before doing that...

image

plaintextpackets commented 3 months ago

@secdoc what kind of VM? Might be some kind of virtual limit

secdoc commented 3 months ago

It is a Proxmox VM running Debian 12.

image

The other speedtest Docker app I referenced in my previous post is on the same Proxmox server but in a different VM.

Here is an updated Netprobe 12-Hr view

image

secdoc commented 3 months ago

I will be doing some pcaps this weekend to determine if there is a bottleneck within my network or VM and then post any findings...

plaintextpackets commented 3 months ago

I realize if I'm going to test this at full speed I need faster internet LOL :(

secdoc commented 3 months ago

Understand...For reference, both VMs (192.168.2.185 and 192.168.2.10 respectively) are running on the same Proxmox Host that is connected to an aggregation switch via 2 x 10Gbps that are configured for link aggregation with a 10Gbps uplink to the router with the 2.5Gbps Fiber WAN to the ISP. The Proxmox host has the following specs:

CPU(s): 96 x AMD EPYC 7642 48-Core Processor (1 Socket)
Kernel Version: Linux 6.8.4-3-pve (2024-05-02T11:55Z)

The Netprobe Host has the following Specs:

OS: Debian GNU/Linux 12 (bookworm) x86_64 
Host: KVM/QEMU (Standard PC (i440FX + PIIX, 1996) pc-i440fx-8.1) 
Kernel: 6.1.0-20-amd64 
CPU: AMD EPYC 7642 (2) @ 2.299GHz 
Memory: 643MiB / 3915MiB 
Disk (/): 2.1G / 6.2G (35%) 

The Speedtest Tracker Host has the following Specs:

OS: Ubuntu 20.04.6 LTS x86_64 
Host: KVM/QEMU (Standard PC (Q35 + ICH9, 2009) pc-q35-8.1) 
Kernel: 5.4.0-182-generic 
CPU: AMD EPYC 7642 (6) @ 2.299GHz 
Memory: 1284MiB / 7946MiB 
Disk (/): 13G / 15G (87%) 

With this in mind, I have not seen anything obvious in the PCAPs, but here is the data so far from tcpdump on the host:

[images: 2024-06-02_11-52, 2024-06-02_12-52]

I filtered all my SSH and all subsequent DNS/ARP/ICMP and other UDP traffic from the capture.

For reference, this was my Wireshark filter for the tcpdump capture: (!(udp) && !(icmp) && !(arp) && !(ssh) && !(tcp.port==22 || tcp.port==3001))

I let the tcpdump run for roughly 20+ minutes. When you look at the conversation, most of the traffic from a packet and data volume perspective is linked to the IP 161.188.169.250 and it is not resolving to a particular host or domain when Name resolution is enabled, but after doing a whois you get the following:

# start

NetRange:       161.188.0.0 - 161.188.199.255
CIDR:           161.188.192.0/21, 161.188.128.0/18, 161.188.0.0/17
NetName:        AT-88-Z
NetHandle:      NET-161-188-0-0-1
Parent:         NET161 (NET-161-0-0-0-0)
NetType:        Direct Allocation
OriginAS:       
Organization:   Amazon Technologies Inc. (AT-88-Z)
RegDate:        2020-12-29
Updated:        2024-05-31
Ref:            https://rdap.arin.net/registry/ip/161.188.0.0

OrgName:        Amazon Technologies Inc.
OrgId:          AT-88-Z
Address:        410 Terry Ave N.
City:           Seattle
StateProv:      WA
PostalCode:     98109
Country:        US
RegDate:        2011-12-08
Updated:        2024-01-24
Comment:        All abuse reports MUST include:
Comment:        * src IP
Comment:        * dest IP (your IP)
Comment:        * dest port
Comment:        * Accurate date/timestamp and timezone of activity
Comment:        * Intensity/frequency (short log extracts)
Comment:        * Your contact details (phone and email) Without these we will be unable to identify the correct owner of the IP address at that point in time.
Ref:            https://rdap.arin.net/registry/entity/AT-88-Z

OrgRoutingHandle: IPROU3-ARIN
OrgRoutingName:   IP Routing
OrgRoutingPhone:  +1-206-555-0000 
OrgRoutingEmail:  aws-routing-poc@amazon.com
OrgRoutingRef:    https://rdap.arin.net/registry/entity/IPROU3-ARIN

OrgAbuseHandle: AEA8-ARIN
OrgAbuseName:   Amazon EC2 Abuse
OrgAbusePhone:  +1-206-555-0000 
OrgAbuseEmail:  abuse@amazonaws.com
OrgAbuseRef:    https://rdap.arin.net/registry/entity/AEA8-ARIN

OrgRoutingHandle: ARMP-ARIN
OrgRoutingName:   AWS RPKI Management POC
OrgRoutingPhone:  +1-206-555-0000 
OrgRoutingEmail:  aws-rpki-routing-poc@amazon.com
OrgRoutingRef:    https://rdap.arin.net/registry/entity/ARMP-ARIN

OrgTechHandle: ANO24-ARIN
OrgTechName:   Amazon EC2 Network Operations
OrgTechPhone:  +1-206-555-0000 
OrgTechEmail:  amzn-noc-contact@amazon.com
OrgTechRef:    https://rdap.arin.net/registry/entity/ANO24-ARIN

OrgNOCHandle: AANO1-ARIN
OrgNOCName:   Amazon AWS Network Operations
OrgNOCPhone:  +1-206-555-0000 
OrgNOCEmail:  amzn-noc-contact@amazon.com
OrgNOCRef:    https://rdap.arin.net/registry/entity/AANO1-ARIN

# end

# start

NetRange:       161.188.128.0 - 161.188.191.255
CIDR:           161.188.128.0/18
NetName:        AWS-DISH
NetHandle:      NET-161-188-128-0-1
Parent:         AT-88-Z (NET-161-188-0-0-1)
NetType:        Reallocated
OriginAS:       
Organization:   DISH Wireless L.L.C. (DWL-61)
RegDate:        2024-01-09
Updated:        2024-01-09
Ref:            https://rdap.arin.net/registry/ip/161.188.128.0

OrgName:        DISH Wireless L.L.C.
OrgId:          DWL-61
Address:        5701 S Santa Fe Drive
City:           Littleton
StateProv:      CO
PostalCode:     80120
Country:        US
RegDate:        2020-04-08
Updated:        2023-09-28
Ref:            https://rdap.arin.net/registry/entity/DWL-61

OrgTechHandle: SMITH6080-ARIN
OrgTechName:   Smith, Brian 
OrgTechPhone:  +1-480-558-2496 
OrgTechEmail:  brianc.smith@dish.com
OrgTechRef:    https://rdap.arin.net/registry/entity/SMITH6080-ARIN

OrgTechHandle: JEMO-ARIN
OrgTechName:   Marquez Osuna, Jorge Edmundo
OrgTechPhone:  +1-303-706-5175 
OrgTechEmail:  jorge1.marquez@dish.com
OrgTechRef:    https://rdap.arin.net/registry/entity/JEMO-ARIN

OrgNOCHandle: DWN4-ARIN
OrgNOCName:   Dish Wireless NOC
OrgNOCPhone:  +1-833-347-4602 
OrgNOCEmail:  dishwirelessnoc@dish.com
OrgNOCRef:    https://rdap.arin.net/registry/entity/DWN4-ARIN

OrgAbuseHandle: DWN4-ARIN
OrgAbuseName:   Dish Wireless NOC
OrgAbusePhone:  +1-833-347-4602 
OrgAbuseEmail:  dishwirelessnoc@dish.com
OrgAbuseRef:    https://rdap.arin.net/registry/entity/DWN4-ARIN

# end
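To see which of the two whois records actually covers 161.188.169.250, the CIDR blocks listed above can be checked with Python's standard `ipaddress` module. This is only a quick sanity check on the whois output, not part of the capture analysis itself:

```python
import ipaddress


def containing_networks(ip, cidrs):
    """Return the CIDR blocks from `cidrs` that contain `ip`."""
    address = ipaddress.ip_address(ip)
    return [cidr for cidr in cidrs if address in ipaddress.ip_network(cidr)]


# CIDR blocks from the first (Amazon Technologies) whois record above.
amazon_cidrs = ["161.188.192.0/21", "161.188.128.0/18", "161.188.0.0/17"]
print(containing_networks("161.188.169.250", amazon_cidrs))  # ['161.188.128.0/18']
```

The address falls in 161.188.128.0/18, which is the block the second record lists as reallocated to DISH Wireless (NetName AWS-DISH).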

I would attach the PCAP, but compressed it is at a little over 340MB.

So I am not sure if this is a speedtest server that is being pinned for the connection or if this is something else...

I also did a speedtest and capture from my other system that is running the Speedtest Tracker container filtering out all traffic but that for the speedtests (that particular system is running multiple docker containers) using the following Wireshark filter !(udp) && !(icmp) && !(arp) && !(ssh) && !(tcp.port==22 || tcp.port==443 || tcp.port==80 || tcp.port==1514 || tcp.port==46548 || tcp.port==30018 || tcp.port==30021 || tcp.port==30006 || tcp.port==30002). After filtering the packets and resaving the pcap, this file is still over 2GB in size for roughly a 1-2 min capture

[image: 2024-06-02_12-51]

I get completely different speedtest servers, but the IP 161.188.169.250 is seen in the mix, and as a resolved host:

[image: 2024-06-02_12-36]

But for the set of speedtests run on this system, the primary IP being pinned was 23.128.56.22:

[image: 2024-06-02_12-50]

That IP is associated with starlight fiber.