tohojo / flent

The FLExible Network Tester.
https://flent.org

rrul tests are not being able to saturate link #218

Closed olg33 closed 3 years ago

olg33 commented 3 years ago

Hi,

We are installing a new satellite link and we are using flent to run bandwidth testing. The idea is to saturate the link with marked and unmarked packets.

However, initial results show that the test could not reach the link's maximum downstream bandwidth, which is about 60 Mbps.

For unmarked traffic we ran the following test: flent rrul -H 10.111.40.252

For marked traffic we used the following test: flent rrul_var -H 10.111.40.252 --test-parameter bidir_streams=5 --test-parameter markings=11,13,15,19,21

In both cases we reached no more than 50% of the link capacity.

To rule out any problem with the link itself, we ran the following test: flent tcp_12down -H 10.111.40.252

This time, we were able to reach saturation levels (60 Mbps) of downstream traffic. We also ran iperf3 and obtained the same results.

Do you see any reason why the rrul and rrul_var tests aren't able to generate enough traffic to saturate the link? Maybe there is a parameter I'm missing? Or maybe that's normal. Either way, I'd appreciate any comment on this matter.

Thanks.

tohojo commented 3 years ago

Hmm, when you say satellite link, I assume this has really high RTT, right? This usually makes it really difficult for TCP to saturate the connection; I believe providers try to improve on this with various kinds of "accelerators" that mess with the TCP connection to try to alleviate this problem. That could be failing when there's bidirectional traffic?
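To put a rough number on the high-RTT point, here is a minimal sketch of the bandwidth-delay product. The 60 Mbps figure comes from this thread; the 600 ms round-trip time is only an assumed, typical geostationary value and was not measured here, and the function names are illustrative.

```python
def bdp_bytes(rate_mbps: float, rtt_ms: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to keep the link full."""
    return rate_mbps * 1e6 / 8 * (rtt_ms / 1e3)


def max_rate_mbps(window_bytes: float, rtt_ms: float) -> float:
    """Upper bound on one flow's throughput for a given window and RTT."""
    return window_bytes * 8 / (rtt_ms / 1e3) / 1e6


if __name__ == "__main__":
    link_mbps = 60.0  # downstream capacity quoted in this thread
    rtt_ms = 600.0    # ASSUMPTION: typical geostationary RTT, not measured here
    print(f"BDP: {bdp_bytes(link_mbps, rtt_ms) / 1e6:.2f} MB in flight")
    print(f"64 KiB window cap: {max_rate_mbps(65536, rtt_ms):.2f} Mbit/s per flow")
```

Under these assumptions, the flows together need several megabytes in flight to fill the link, which is why anything that slows window growth (loss, delayed ACKs, a misbehaving accelerator) shows up as unused capacity.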

Another thing to note is that when you're running bidirectional traffic the bulk traffic will be competing with ACKs in each direction. In particular, if there's a lot of queueing delay in the upstream, that will delay the ACKs which can prevent TCP from ramping up properly on the downstream. What kind of queueing is on the bottleneck link, and are you seeing the latency increase?
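As a back-of-the-envelope illustration of the ACK competition, the sketch below estimates how much reverse-direction bandwidth the ACKs for a given download rate consume. Every default is an assumption rather than something measured in this thread: a 1448-byte segment payload, 66-byte ACK frames (Ethernet, IPv4 and TCP headers plus timestamps), and one ACK per two segments (delayed ACKs). Satellite framing overhead will differ.

```python
def ack_rate_mbps(data_mbps: float,
                  mss_bytes: int = 1448,       # ASSUMPTION: payload per full-size segment
                  ack_bytes: int = 66,         # ASSUMPTION: on-wire size of a pure ACK
                  segments_per_ack: int = 2):  # ASSUMPTION: delayed ACKs
    """Estimate reverse-path bandwidth (Mbit/s) used by ACKs for a bulk transfer."""
    segments_per_s = data_mbps * 1e6 / 8 / mss_bytes
    acks_per_s = segments_per_s / segments_per_ack
    return acks_per_s * ack_bytes * 8 / 1e6


if __name__ == "__main__":
    # 60 Mbit/s is the downstream capacity quoted in this thread.
    print(f"ACK load for 60 Mbit/s down: {ack_rate_mbps(60):.2f} Mbit/s up")
```

The absolute figure is small, but these ACKs sit in the same upstream queue as the bulk upload traffic, so any queueing delay there directly delays them.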

My guess is that the latter effect is the most likely cause, or maybe it's a combination of the two. You may be able to see something interesting if you capture the traffic and look at some tcptrace plots of the transfers.

The RRUL test is deliberately designed to stress connections to induce these sorts of weird failure cases, so I guess you could say it's not unexpected. But obviously it's not an ideal functioning of the link :)

flent-users commented 3 years ago

It is highly likely that ack-filtering (a cake feature) will help on the rrul test on a sat link.

Also, the rrul test does not measure the size of the ACK packets, especially not relative to access to the media.

It does not surprise me at all that your sat link is optimized for download traffic only.

olg33 commented 3 years ago

Thanks,

Yes indeed, we mainly use TCP spoofing in our modem setup to address the delay issue. However, we were getting even worse results. We turned it off temporarily and testing improved, but that left us exposed to the ACK issues you just mentioned. We'll look into tweaking the spoofing feature to see if we can get better results. Thanks again for your input.

olg33 commented 3 years ago

On a related note, it caught our attention that whenever we run an rrul or rrul_var test, upload traffic seems to be absent.

The examples below were run with two servers connected back-to-back, with no router in between, to rule out any blockage (they connect through an Ethernet switch with ports in the same VLAN).

$ flent rrul_var -H 172.16.1.40 --test-parameter bidir_streams=6 --test-parameter markings=11,13,15,17,19,21

Summary of rrul_var test run from 2020-12-17 21:34:16.260047

Ping (ms) avg          :     2.65      N/A ms       350
Ping (ms)::ICMP        :     6.62     8.37 ms       350
Ping (ms)::UDP 0 (11)  :     1.99     0.14 ms       350
Ping (ms)::UDP 1 (13)  :     1.98     0.14 ms       350
Ping (ms)::UDP 2 (15)  :     1.96     0.14 ms       350
Ping (ms)::UDP 3 (17)  :     2.00     0.14 ms       350
Ping (ms)::UDP 4 (19)  :     2.00     0.14 ms       350
Ping (ms)::UDP 5 (21)  :     1.99     0.14 ms       350
TCP download avg       :   156.53      N/A Mbits/s  350
TCP download sum       :   939.18      N/A Mbits/s  350
TCP download::0 (11)   :   190.43   171.36 Mbits/s  350  ---> OK
TCP download::1 (13)   :   166.60   170.35 Mbits/s  350
TCP download::2 (15)   :   143.99   156.95 Mbits/s  350
TCP download::3 (17)   :   171.11   170.65 Mbits/s  350
TCP download::4 (19)   :   110.92    94.05 Mbits/s  350
TCP download::5 (21)   :   156.13     0.01 Mbits/s  350
TCP totals             :   939.87      N/A Mbits/s  350
TCP upload avg         :     0.12      N/A Mbits/s  350
TCP upload sum         :     0.69      N/A Mbits/s  350
TCP upload::0 (11)     :     0.19     0.18 Mbits/s  350  ---> No data
TCP upload::1 (13)     :     0.10     0.17 Mbits/s  350
TCP upload::2 (15)     :     0.04     0.07 Mbits/s  350
TCP upload::3 (17)     :     0.14     0.13 Mbits/s  350
TCP upload::4 (19)     :     0.09     0.13 Mbits/s  350
TCP upload::5 (21)     :     0.13     0.14 Mbits/s  350

Now, if we run a tcp_12up test, we see upstream traffic:

Ping (ms) ICMP   :     2.22     2.55 ms       350
TCP upload avg   :    41.77      N/A Mbits/s  350
TCP upload sum   :   501.25      N/A Mbits/s  350
TCP upload::1    :    38.43    38.24 Mbits/s  350
TCP upload::10   :    40.92    39.39 Mbits/s  350
TCP upload::11   :    42.97    41.55 Mbits/s  350
TCP upload::12   :    46.13    39.80 Mbits/s  350
TCP upload::2    :    46.77    44.63 Mbits/s  350
TCP upload::3    :    40.24    39.21 Mbits/s  350
TCP upload::4    :    40.75    40.54 Mbits/s  350
TCP upload::5    :    40.41    38.85 Mbits/s  350
TCP upload::6    :    42.30    40.38 Mbits/s  350
TCP upload::7    :    41.68    40.38 Mbits/s  350
TCP upload::8    :    40.70    39.47 Mbits/s  350
TCP upload::9    :    39.95    40.17 Mbits/s  350

Apparently, upload traffic doesn't show up in bidirectional tests, but an upstream-only test works.

Traffic graphs on the port connecting the servers confirm that there is no upstream traffic during testing.

Upload traffic must be present; otherwise, how is the second server replying?
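For spotting this kind of imbalance across many runs, here is a minimal sketch that parses summary rows like the ones in this thread and compares the upload and download sums. It assumes one row per line, as flent prints them, and reads only the first numeric column; the regular expression and function names are illustrative and not part of flent.

```python
import re

# name : first-number second-number-or-N/A unit count [optional trailing note]
ROW = re.compile(
    r"^\s*(?P<name>.+?)\s*:\s+(?P<first>[\d.]+)\s+(?P<second>[\d.]+|N/A)"
    r"\s+(?P<unit>\S+)\s+(?P<count>\d+)"
)


def parse_summary(text: str) -> dict:
    """Map each row name to the value in its first numeric column."""
    rows = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            rows[match.group("name")] = float(match.group("first"))
    return rows


def direction_ratio(rows: dict) -> float:
    """Smaller of the two TCP sums divided by the larger (1.0 means symmetric)."""
    up, down = rows["TCP upload sum"], rows["TCP download sum"]
    larger = max(up, down)
    return min(up, down) / larger if larger else 0.0


SAMPLE = """\
TCP download sum : 939.18 N/A Mbits/s 350
TCP download::0 (11) : 190.43 171.36 Mbits/s 350 ---> OK
TCP upload sum : 0.69 N/A Mbits/s 350
"""

if __name__ == "__main__":
    print(f"up/down ratio: {direction_ratio(parse_summary(SAMPLE)):.4f}")
```

A ratio near zero on a link that should be symmetric, as in the run above, points at the path or the hosts rather than at the test itself.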

Is there any way to make upload traffic show up on rrul testing?

Thanks.

tohojo commented 3 years ago

Erm, that does sound odd! Only obvious thing I can immediately think of is maybe the network card is running in half-duplex mode? Did you check the output of ethtool for the interface you're using?
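If you'd rather check the duplex setting from a script than read the ethtool output by hand, the following is a minimal sketch that reads the values the Linux kernel exposes under /sys/class/net. The interface name eth0 is a placeholder, and the helper name is made up for illustration.

```python
from pathlib import Path


def link_settings(iface: str, sysfs_root: str = "/sys/class/net") -> dict:
    """Return the duplex and speed strings for an interface, or None if unreadable."""
    base = Path(sysfs_root) / iface
    settings = {}
    for attr in ("duplex", "speed"):
        try:
            settings[attr] = (base / attr).read_text().strip()
        except OSError:
            # Missing interface, or link down (the kernel refuses the read).
            settings[attr] = None
    return settings


if __name__ == "__main__":
    # ASSUMPTION: "eth0" is a placeholder; substitute the interface under test.
    print(link_settings("eth0"))
```

A duplex value of "half", or a speed below what both ends should negotiate, would be worth ruling out before digging into the test itself.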

Otherwise, to debug further, you can try running with -L and looking at the netperf output in the log file to see if there are any obvious pointers there...

olg33 commented 3 years ago

Hi tohojo,

Yes, I found the problem. The switch our servers are connected to also serves other machines, and one of them had a port configured by mistake in the same VLAN as our point-to-point connection. In principle that should not be a big deal, but once it was corrected I was able to push almost a Gig up/down:

TCP download avg       :   133.64      N/A Mbits/s  351
TCP download sum       :   935.50      N/A Mbits/s  351  ---------> OK
TCP download::0 (11)   :   123.63   123.19 Mbits/s  351
TCP download::1 (13)   :   121.54   121.51 Mbits/s  351
TCP download::2 (15)   :   121.08   122.05 Mbits/s  351
TCP download::3 (17)   :   121.59   121.58 Mbits/s  351
TCP download::4 (19)   :    77.91    78.98 Mbits/s  351
TCP download::5 (21)   :   244.72   244.05 Mbits/s  351
TCP download::6 (25)   :   125.03   124.48 Mbits/s  351
TCP totals             :  1871.57      N/A Mbits/s  351
TCP upload avg         :   133.72      N/A Mbits/s  351
TCP upload sum         :   936.07      N/A Mbits/s  351  ---------> OK
TCP upload::0 (11)     :   135.29   134.01 Mbits/s  351
TCP upload::1 (13)     :   134.59   134.75 Mbits/s  351
TCP upload::2 (15)     :   131.10   131.34 Mbits/s  351
TCP upload::3 (17)     :   131.24   131.24 Mbits/s  351
TCP upload::4 (19)     :   134.21   134.62 Mbits/s  351
TCP upload::5 (21)     :   134.76   134.69 Mbits/s  351
TCP upload::6 (25)     :   134.88   134.74 Mbits/s  351

Thanks!