perfsonar / project

The perfSONAR project's primary wiki and issue tracker.
Apache License 2.0

Throughput test Mismatch Discussion #698

Closed arlake228 closed 9 years ago

arlake228 commented 9 years ago

Original issue 699 created by arlake228 on 2013-02-27T19:41:55.000Z:

Hey All;

I have been letting this sit for a while, mainly because I don't have a good answer, but let's open up discussion. This was a problem when we had 1G to 100M, and will be an issue when we go 10G to 100G too; realistically, what can we do in this space? Some ideas that I had:

We also have the scripts that Dave L implemented that ATLAS has been using, and I know our NOC has done something with munging linux routing tables for hosts with both a 1G and 10G connection (to ensure in/out uses the same route). I think our approach now of 'yeah, it's a problem and good luck with that' isn't really cool, so we should try to do something.
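(For reference, the routing-table munging mentioned above is typically done with Linux policy routing. A rough sketch, with made-up addresses and table numbers, for a host with a 1G eth0 and a 10G eth1:)

# hypothetical addressing: eth0 = 10.0.1.50 (1G), eth1 = 10.0.2.50 (10G)
# give each source address its own routing table so replies leave via the
# interface the traffic arrived on
ip route add default via 10.0.1.1 dev eth0 table 101
ip rule add from 10.0.1.50/32 table 101
ip route add default via 10.0.2.1 dev eth1 table 102
ip rule add from 10.0.2.50/32 table 102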

Opinions? Fruit on the suggestions?

-jason

-=-=-=-=-=-=-=-=-=-=-=-

And it helps to fully flesh out the ideas a bit more. Yes, dmesg/ethtool give you 'some' info; here is the output from a Dell R610:

[zurawski@lhcmon ~]$ sudo ethtool eth0
Settings for eth0:
        Supported ports: [ FIBRE ]
        Supported link modes:
        Supports auto-negotiation: No
        Advertised link modes: Not reported
        Advertised auto-negotiation: No
        Speed: 10000Mb/s
        Duplex: Full
        Port: FIBRE
        PHYAD: 0
        Transceiver: external
        Auto-negotiation: off
        Current message level: 0x00000004 (4)
        Link detected: yes

and

[zurawski@lhcmon ~]$ dmesg | grep eth0
eth0: changing mtu from 9000 to 1500
myri10ge: eth0: link down
myri10ge: eth0: link up
myri10ge: eth0: link down
myri10ge: eth0: link up
eth0: changing mtu from 1500 to 9000
myri10ge: eth0: link down
myri10ge: eth0: link up
myri10ge: eth0: link down
myri10ge: eth0: link up
eth0: changing mtu from 9000 to 1500
myri10ge: eth0: link down
myri10ge: eth0: link up
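(If we wanted to pull the link speed programmatically rather than eyeballing the ethtool output, a minimal sketch, assuming a Linux host and the eth0 name used above:)

# negotiated link speed in Mb/s, straight from sysfs
cat /sys/class/net/eth0/speed
# or parsed out of the ethtool output
sudo ethtool eth0 | awk -F': ' '/Speed/ {print $2}'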

So this seems feasible, we could also just 'ask' for this as a part of the admin info page instead of trying to figure it out on our own (or do both for sanity checking reasons). Registering to LS is 'simple' as well. I have some canned TC scripts I used for DYNES recently (1Gbps speed in this case):

sudo /usr/sbin/tc qdisc del dev eth0 root
sudo /usr/sbin/tc qdisc add dev eth0 handle 1: root htb
sudo /usr/sbin/tc class add dev eth0 parent 1: classid 1:1 htb rate 128mbps
sudo /usr/sbin/tc filter add dev eth0 parent 1: protocol ip prio 16 u32 match ip src 10.10.200.20/32 flowid 1:1

Presumably BWCTL/pSB could insert these, or some other helper script? Not sure what would make sense from an architecture perspective
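(A very rough sketch of that helper-script idea, reusing the tc commands above; the device name, rate, and bwctl invocation are placeholders, not anything pSB does today:)

#!/bin/sh
# hypothetical wrapper: shape eth0 toward a slower remote, run the test, clean up
DEV=eth0
RATE=128mbps            # tc's 'mbps' means megabytes/sec, i.e. roughly 1 Gbit/s
REMOTE=$1

sudo /usr/sbin/tc qdisc add dev $DEV handle 1: root htb
sudo /usr/sbin/tc class add dev $DEV parent 1: classid 1:1 htb rate $RATE
sudo /usr/sbin/tc filter add dev $DEV parent 1: protocol ip prio 16 u32 match ip dst $REMOTE/32 flowid 1:1

bwctl -c $REMOTE

sudo /usr/sbin/tc qdisc del dev $DEV root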

In any event, it seems possible to mitigate this. I guess my question is: does this all sound reasonable, and would it be possible on a not-too-distant time frame (not 3.3, naturally)?

Thanks;

-jason

-=-=-=-=-=-=-=-=-=-=-=-

I'm not sure if we need to respond to the tweet, but I have a few random comments on this:

1) I've seen tests where sending from a 10G to a 1G host works fine (e.g. a consistent 900 Mbps). You just need all the devices in the path to have enough buffers, and to not be using a crappy NIC in the receive host. So this is not a universal issue.

2) I think we are registering the NIC speed in the new LS as part of 3.3. At least we discussed doing that. Andy?

3) Using tc is certainly a good way to fix this, but I think it might be a fair amount of work. I suggest adding this idea to the issue tracker.

-=-=-=-=-=-=-=-=-=-=-=-

arlake228 commented 9 years ago

Comment #1 originally posted by arlake228 on 2013-02-28T14:32:29.000Z:

We'd need to do the 'tc' stuff as part of the iperf call since, the way pSB works, bwctl does all the scheduling and just calls iperf periodically.

I wonder if we should just start relying on nuttcp or iperf3 or something where the application can pace the TCP stream.
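(For example, reusing the 10.10.200.20 address from the tc snippet above and assuming iperf3/nuttcp versions that support sender-side rate limiting:)

# iperf3: cap the sending rate to roughly a 1G far end
iperf3 -c 10.10.200.20 -b 900M -t 30
# nuttcp: same idea with its transmit rate limit
nuttcp -R900m -T30 10.10.200.20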

A side issue is that even if we do this, we can hit similar mismatches where the bottleneck link is not on the edges, but in the middle (e.g. a 10G connected host, but the upstream connection is only 1G). I'm not sure how often that occurs in the wild.

arlake228 commented 9 years ago

Comment #2 originally posted by arlake228 on 2013-03-04T18:27:21.000Z:

This once again comes down to: What are you trying to test?

Case 1: What kind of performance can users expect to get out-of-the-box on the existing networks?

Case 2: What kind of performance can users of highly tuned systems expect to get?

Ideally, the toolkit could be publishing both kinds of tests and using that data to encourage good best-practices. This would of course take publishing metadata about the tests to allow analysis to determine how 'tuned' a host is for a given test.
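(As a sketch of the kind of tuning metadata that could be captured alongside a test; the particular knobs below are only illustrative:)

# snapshot a few host tuning parameters at test time
uname -r
sysctl net.core.rmem_max net.core.wmem_max
sysctl net.ipv4.tcp_rmem net.ipv4.tcp_wmem
sysctl net.ipv4.tcp_congestion_control
cat /sys/class/net/eth0/mtu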

arlake228 commented 9 years ago

Comment #3 originally posted by arlake228 on 2013-03-18T15:01:28.000Z:

<empty>

arlake228 commented 9 years ago

Comment #4 originally posted by arlake228 on 2014-03-21T21:22:03.000Z:

the LS now includes the NIC speed, so that addresses part of this.

There is documentation on how to configure tc on fasterdata.

I think that will always need to be done by 'experts', and should not be configurable in the GUI, so closing this one.