openwrt / packages

Community maintained packages for OpenWrt. Documentation for submitting pull requests is in CONTRIBUTING.md
GNU General Public License v2.0

speedtest-netperf not working with 'default setup' #13511

Open mtrtm opened 4 years ago

mtrtm commented 4 years ago

Maintainer: @guidosarducci
Environment: R7800, OpenWrt 19.07.3

Description: I have OpenWrt 19.07.3 running with SQM cake/layer cake running.

Using the readme here: https://github.com/openwrt/packages/tree/master/net/speedtest-netperf/files it looks like I should be able to run speedtest from the CLI, but I always get errors:

root@OpenWrt:~# speedtest-netperf.sh
2020-09-28 18:17:38 Starting speedtest for 60 seconds per transfer session.
Measure speed to netperf.bufferbloat.net (IPv4) while pinging gstatic.com.
Download and upload sessions are sequential, each with 5 simultaneous streams.
.
WARNING: netperf returned errors. Results may be inaccurate!

 Download:   0.00 Mbps
  Latency: [in msec, 1 pings, 0.00% packet loss]
      Min:  12.728
    10pct:   0.000
   Median:   0.000
      Avg:  12.728
    90pct:   0.000
      Max:  12.728
 CPU Load: [in % busy (avg +/- std dev), 0 samples]
 Overhead: [in % used of total CPU available]
  netperf:   0.0
.
WARNING: netperf returned errors. Results may be inaccurate!

   Upload:   0.00 Mbps
  Latency: [in msec, 1 pings, 0.00% packet loss]
      Min:  11.754
    10pct:   0.000
   Median:   0.000
      Avg:  11.754
    90pct:   0.000
      Max:  11.754
 CPU Load: [in % busy (avg +/- std dev), 0 samples]
 Overhead: [in % used of total CPU available]
  netperf:   0.0

root@OpenWrt:~# speedtest-netperf.sh --concurrent
2020-09-28 18:17:52 Starting speedtest for 60 seconds per transfer session.
Measure speed to netperf.bufferbloat.net (IPv4) while pinging gstatic.com.
Download and upload sessions are concurrent, each with 5 simultaneous streams.
...................................................................................................................................
WARNING: netperf returned errors. Results may be inaccurate!

 Download:   0.00 Mbps
   Upload:   0.00 Mbps
  Latency: [in msec, 131 pings, 0.00% packet loss]
      Min:   9.440
    10pct:   9.747
   Median:  10.353
      Avg:  10.622
    90pct:  11.828
      Max:  13.539
 CPU Load: [in % busy (avg +/- std dev) @ avg frequency, 126 samples]
     cpu0:   5.2 +/-  2.0  @  894 MHz
     cpu1:   2.4 +/-  1.4  @  571 MHz
 Overhead: [in % used of total CPU available]
  netperf:   1.7
guidosarducci commented 4 years ago

I just noticed that as well. The problem isn't with the script but rather the upstream netperf service being unavailable.

The most likely explanation (which has happened a few times in the past) is that some users have been abusing the netperf servers to do automated, high-frequency speed tests for "tuning" their personal networks. The upstream hosted servers have an allotted monthly bandwidth which can quickly become exhausted, and result in disabling the netperf services. Bandwidth isn't free so this is understandable.

@richb-hanover Hi Rich! I noticed a while back on the bloat mailing lists that you were investigating the bandwidth usage and reworking some of the iptables policies related to netperf. How did that go? Any interesting findings about who is using the bandwidth, or how? Insights into how to manage things more fairly?

If this is still an ongoing struggle and I can help with looking into it or updating netfilter rules, please let me know.

I also noticed bufferbloat.net seems to be using Cloudflare DNS, so it might be helpful to use some of their geo-enabled services to better load-balance among the netperf servers (at least east/west US). I've also been looking at some updates for speedtest-netperf, including dropping the default test time to 30 seconds (usually more than sufficient), which should help lower bandwidth usage.

richb-hanover commented 4 years ago

@guidosarducci Thanks for this note. I have been up to my elbows in other projects, so have not had time to pay attention to the netperf server. I just turned the netperf server on netperf.bufferbloat.net back on. Let's see how quickly the fuse blows (that is, how quickly it runs over the 4TB/month limit...)

Thanks, too, for the offer to help. I have a few thoughts:

  1. I am not certain that there's a way to tweak the iptables rules to detect/distinguish between these two cases:

    • Abuse: people running a five-stream bufferbloat test every five minutes 24x7 (that's 5 x 12 = 60 netperf sessions/hour, but sustained over many hours/days)
    • Legitimate tuning or research: running a bufferbloat test (say, betterspeedtest.sh) multiple times in a row to optimize your router. That, too, might be five or ten streams (download and upload) x a dozen tests over the course of 15-20 minutes (again, 60 to 120 netperf sessions/hour)
  2. In any event, you can see the iptables rules that I currently employ, and the tools that I use to scan log files to find candidates to blacklist: https://github.com/richb-hanover/netperfclean

  3. I like your thought about using Cloudflare DNS to distribute load. I will speak to a couple people to see if there might be some resources available to take a share of the load.
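
For reference, the kind of per-source connection threshold discussed in point 1 can be sketched with the iptables hashlimit module. This is a hypothetical illustration, not the actual netperfclean ruleset; 12865 is netperf's default control port, and the 60-connections-per-hour figure just mirrors the example above:

```shell
# Hypothetical sketch (NOT the netperfclean rules): drop new netperf control
# connections from any source exceeding ~60 new connections per hour.
# Note that a threshold alone cannot distinguish the "abuse" and "legitimate
# tuning" cases above, since both can reach 60+ sessions/hour.
iptables -A INPUT -p tcp --dport 12865 -m conntrack --ctstate NEW \
    -m hashlimit --hashlimit-name netperf \
    --hashlimit-mode srcip --hashlimit-above 60/hour \
    --hashlimit-burst 10 -j DROP
```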

Thanks again.

richb-hanover commented 4 years ago

Two more comments:

  1. You can see the traffic on the main netperf server at http://netperf.bufferbloat.net (All data is from when I turned on the server a couple hours ago.)

  2. I wrote up a possible strategy for blocking abusers at: https://github.com/richb-hanover/netperfclean/issues/1#issue-711949399

richb-hanover commented 4 years ago

Update: I implemented part of the strategy for blocking abusers. The netperf server at netperf.bufferbloat.net is back on the air - please let me know if you see problems. Thanks.

sashasimkin commented 3 years ago

netperf.bufferbloat.net is returning 503 and doesn't seem to be working right now. I'm getting the same output as in the first post.

I'm going to set up a private server for the tests, but maybe it makes sense to add a note and link the server project in the README, so that people who stumble upon the unavailability have pointers on how to proceed.
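
For anyone else going this route, a minimal private setup might look like the following. This is a hedged sketch: it assumes the netperf package (which ships the netserver daemon) is installed on some reachable host, and 192.0.2.10 is a placeholder address:

```shell
# On the server host: start the netperf control daemon
# (12865 is netserver's default control port).
netserver -p 12865

# On the OpenWrt router: point the script at that server
# (replace 192.0.2.10 with your server's address).
speedtest-netperf.sh --host 192.0.2.10
```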

Also, thank you for the work being done here! I'm trying to catch a nasty issue with wireguard performance on my installation, and hope this project helps me determine the bottleneck.

guidosarducci commented 3 years ago

@sashasimkin I'm guessing the abuse may have picked up again, even after the last server changes Rich put in. Updating the README with some troubleshooting steps is probably a good idea too.

@richb-hanover Hi Rich, I have some time and will revisit your updates. I did have some ideas as well but it seemed your changes had the problematic users well in hand. One issue I did have was never being able to see the traffic logs you posted. Are those available anywhere to see? Are you aware if the current problems are due to excessive usage abuse?

As far as accounting goes, I'm also curious how the 4TB breaks down. Upload, download, or both? Does the hosting provider give you access to a bandwidth API of some sort? Just trying to dredge up some of the old ideas I had...

Stay safe all.

richb-hanover commented 3 years ago

Thanks for the note - here's an update. I upgraded my server and managed to break the logging of netperf connection attempts. Consequently, my connection-counting algorithms had no data, I blew past most of my January allocation in two days, and I turned off the netperf server.

I am now back to debugging this. I have posted a request for help at StackExchange. When I find the answer, I'll turn the server back on. Best regards to all...

guidosarducci commented 3 years ago

Yikes, it's very telling that the budget was blown in two days! Suggests to me that some users never bothered to throttle back automated tests despite your mitigations being in place for a while now.

Thanks for the links, I'll look through those as well. Could you confirm anything about how uploads vs. downloads count against the 4TB?

richb-hanover commented 3 years ago

Yes, there are lots of automated testing bots that just grind away every five minutes. They simply get dropped when the iptables rule kicks in, so there's no incentive (nor any feedback) for the owner to throttle. (I imagine they just figure the test server is broken again.)

I briefly speculated on some rules that would severely bandwidth-limit connections (say to 3 kbps) if they trip the threshold limit. That would definitely provide some feedback.
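
Such a penalty limit could be sketched with tc, for example. This is purely illustrative (eth0 and 198.51.100.7 are placeholders, and the rates would need tuning):

```shell
# Hypothetical sketch: steer traffic to a known offender into a tiny HTB
# class instead of dropping it, so their tests crawl rather than time out.
tc qdisc add dev eth0 root handle 1: htb default 10
tc class add dev eth0 parent 1: classid 1:10 htb rate 100mbit  # normal traffic
tc class add dev eth0 parent 1: classid 1:99 htb rate 3kbit    # penalty box
tc filter add dev eth0 parent 1: protocol ip prio 1 \
    u32 match ip dst 198.51.100.7/32 flowid 1:99
```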

No, I don't have any information about the balance between uploads vs downloads. (The netperf utility doesn't keep those stats, nor does my hosting company provide that kind of info.) What would you do with the information if it were available? Thanks.

guidosarducci commented 3 years ago

> Yes, there are lots of automated testing bots that just grind away every five minutes. They simply get dropped when the iptables rule kicks in, so there's no incentive (nor any feedback) for the owner to throttle. (I imagine they just figure the test server is broken again.)

Interesting, I didn't realize it could be so bad you might need to create and manage long-term client blacklists.

> I briefly speculated on some rules that would severely bandwidth-limit connections (say to 3 kbps) if they trip the threshold limit. That would definitely provide some feedback.

Might it be better to use iptables REJECT rather than DROP as you say above? There's more chance on the client side of it being recognized as an explicit error instead of a timeout that could just be retried later automatically.
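
As a sketch (198.51.100.7 is a placeholder for a blacklisted address), the REJECT variant would look something like:

```shell
# Hypothetical sketch: answer a blacklisted client with a TCP reset instead
# of silently dropping, so its connection attempt fails immediately and
# visibly rather than hanging until a timeout.
iptables -A INPUT -p tcp --dport 12865 -s 198.51.100.7 \
    -j REJECT --reject-with tcp-reset
```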

> No, I don't have any information about the balance between uploads vs downloads. (The netperf utility doesn't keep those stats, nor does my hosting company provide that kind of info.) What would you do with the information if it were available? Thanks.

I wasn't thinking of us tracking statistics, but rather of the accounting on the hosting company's side: when do they turn off the taps? Can the server download all it wants while uploads (i.e. "serving") are metered? Or does all traffic draw down the 4TB budget?

richb-hanover commented 3 years ago

> > Yes, there are lots of automated testing bots that just grind away every five minutes. They simply get dropped when the iptables rule kicks in, so there's no incentive (nor any feedback) for the owner to throttle. (I imagine they just figure the test server is broken again.)

> Interesting, I didn't realize it could be so bad you might need to create and manage long-term client blacklists.

I haven't done a "longitudinal study", where I go back to see whether an address that was blacklisted a year (or a month) ago has quieted down. So I'm happy to host a growing blacklist. (If the algorithm got good enough, I might consider purging the full blacklist and letting it rebuild...)

> > I briefly speculated on some rules that would severely bandwidth-limit connections (say to 3 kbps) if they trip the threshold limit. That would definitely provide some feedback.

> Might it be better to use iptables REJECT rather than DROP as you say above? There's more chance on the client side of it being recognized as an explicit error instead of a timeout that could just be retried later automatically.

Interesting thought. Does a netperf client show any useful indication if it receives a REJECT?

> > No, I don't have any information about the balance between uploads vs downloads. (The netperf utility doesn't keep those stats, nor does my hosting company provide that kind of info.) What would you do with the information if it were available? Thanks.

> I wasn't thinking of us tracking statistics, but rather of the accounting on the hosting company's side: when do they turn off the taps? Can the server download all it wants while uploads (i.e. "serving") are metered? Or does all traffic draw down the 4TB budget?

My hosting company counts all traffic toward the cap. If I can get this logging problem solved, I'll stand up the main netperf server and we can all go back to more productive discussions :-) Thanks.

toby-griffiths commented 2 years ago

Thanks for your support with this issue. I'm experiencing this today, so guessing the issue of bandwidth hasn't been resolved. I don't suppose you have details of the server setup needed for tests, do you? Perhaps we could make a Docker image available, so people could spin up their own test servers when they need them?

richb-hanover commented 2 years ago

I have turned back on the default netperf server (new month, new bandwidth limit). Let me know if you can't test against it.

That's an intriguing idea to create a Docker image. It looks as if someone has already beaten you to the idea:

toby-griffiths commented 2 years ago

Ah ha! Awesome. I didn't realise it was just a standard netperf tool server (I'm still new to all this networking stuff). But I love the great information available out there. I'm off to bed now, but it's good to know I can fire up some Docker containers should the bandwidth run out by the time I get round to testing again. Thank you for this, and your open source work! 🙏🏼

gioreva commented 2 years ago

Hi, speedtest-netperf.sh worked once; after that it always returns the error "WARNING: netperf returned errors. Results may be inaccurate!"

I tried 4 hosts, all with the same result: netperf.bufferbloat.net, netperf-east, netperf-west.

This page says the host is working: http://netperf.bufferbloat.net/

Can anyone help?

DakotaCardillo commented 1 year ago

Any update on this? Running into the same warning and zeroed results.

richb-hanover commented 1 year ago

The default netperf server (netperf-east.bufferbloat.net) is down because its bandwidth allocation got used up. You can try netperf-west.bufferbloat.net, which is up now.

ValentinDrean commented 1 year ago

> The default netperf server (netperf-east.bufferbloat.net) is down because its bandwidth allocation got used up. You can try netperf-west.bufferbloat.net, which is up now.

I can confirm this also works from a European location:

speedtest-netperf.sh --host "netperf-eu.bufferbloat.net"

AzimsTech commented 1 year ago

> speedtest-netperf.sh --host "netperf-eu.bufferbloat.net"

this did the trick for me