osrf / srcsim

Space Robotics Challenge

traffic control parameters #146

Open osrf-migration opened 7 years ago

osrf-migration commented 7 years ago

Original report (archived issue) by dan (Bitbucket: dan77062).

The original report had attachments: shaping.txt


We have incomplete specifications for the bandwidth. We know the uplink and downlink ranges for each task, and that the limits will be implemented with tc. However, further details are required to design communication protocols. For example, do the ranges represent a floor and ceiling, with some shaping function between them? Without knowing the details of the specification, it is impossible to design good tools. This is blocking our development right now.

Can you provide the actual tc parameters that we will use?

osrf-migration commented 7 years ago

Original comment by Louise Poubel (Bitbucket: chapulina, GitHub: chapulina).


osrf-migration commented 7 years ago

Original comment by Louise Poubel (Bitbucket: chapulina, GitHub: chapulina).


Possible duplicate of #125

osrf-migration commented 7 years ago

Original comment by dan (Bitbucket: dan77062).


This issue is asking for specific data and, in that sense, is a sub-issue of #125. However, this issue is now blocking for us, as we are unable to further develop tools without knowing the details of the bandwidth. Since I did not create #125, I could not designate it as blocking.

osrf-migration commented 7 years ago

Original comment by Nate Koenig (Bitbucket: Nathan Koenig).


What additional information do you need that is not in the tc script? The script currently contains all the parameters.

osrf-migration commented 7 years ago

Original comment by dan (Bitbucket: dan77062).


Is there traffic shaping? How is latency applied, and can we get an example script so we can test with both latency and bandwidth restricted?

osrf-migration commented 7 years ago

Original comment by Nate Koenig (Bitbucket: Nathan Koenig).


The example tc script has been updated with latency.

osrf-migration commented 7 years ago

Original comment by Jeremy White (Bitbucket: knitfoo).


I used the example script in what seemed like the obvious way on a constellation in an attempt to experience the simulated latency and bandwidth restriction. I captured my terminal output on both sides. I was unable to experience any latency or degradation in bandwidth. If there is something obviously wrong with my command sequence, I'd appreciate a pointer.

osrf-migration commented 7 years ago

Original comment by Jeremy White (Bitbucket: knitfoo).


osrf-migration commented 7 years ago

Original comment by Jeremy White (Bitbucket: knitfoo).


Running the script against eth0 'worked', in that it shut off all traffic; not even a high-latency, low-bandwidth connection seemed possible.

Pings from my OCU to 192.168.2.150 remain low-latency, however, and I still seem to be able to connect to the field computer instance.

osrf-migration commented 7 years ago

Original comment by Nate Koenig (Bitbucket: Nathan Koenig).


Try changing the -f parameter to -f 192.168.2.150/26

osrf-migration commented 7 years ago

Original comment by Steven Gray (Bitbucket: stgray).


I had the same experience as @knitfoo, and I wasn't even on the constellation: I tried the script on my local network (with the IPs changed to match the competition machines) and didn't see any throttling. Is there something wrong with how I'm calling the script?

#!bash

$ sudo ./src_tc.rb -i eth0 -d 64bit -u 50kbit -f 192.168.2.10/26 -l 10s
RTNETLINK answers: File exists
$ ping 192.168.2.150
PING 192.168.2.150 (192.168.2.150) 56(84) bytes of data.
64 bytes from 192.168.2.150: icmp_seq=1 ttl=64 time=0.538 ms
64 bytes from 192.168.2.150: icmp_seq=2 ttl=64 time=0.461 ms
64 bytes from 192.168.2.150: icmp_seq=3 ttl=64 time=0.516 ms
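
The "RTNETLINK answers: File exists" line usually means tc tried to attach a qdisc where one is already installed (for example, adding a second root qdisc). If you hit it, clearing the existing qdiscs before re-running the script rules out leftovers; this is a general tc note, not a confirmed diagnosis of this particular failure:

#!bash

sudo tc qdisc del dev eth0 root
sudo tc qdisc del dev ifb0 root
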
osrf-migration commented 7 years ago

Original comment by Jeremy White (Bitbucket: knitfoo).


I made another pass at this today and did not succeed. I thought I'd share notes in the hopes it might help others. First, I was being naive about my initial test; I was testing traffic between the FC and the OCU. Instead, I should have been using the

#!bash

docker exec -it gazebo_run bash

command and testing from inside the docker image. Nicely, netcat is installed there, so I can do testing with ping and netcat.
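
For example, a rough netcat throughput check (a sketch; the listen flags vary by netcat variant, and <ocu-ip> is a placeholder for the receiving machine's address):

#!bash

# On the receiving side (e.g. the OCU), listen and discard:
nc -l -p 5000 > /dev/null     # OpenBSD netcat uses: nc -l 5000

# Inside the container, push 10 MB; dd reports the transfer rate on exit:
dd if=/dev/zero bs=1M count=10 | nc -q 1 <ocu-ip> 5000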

Oddly, today, once I invoked the src_tc.rb script, the vpn connection between the OCU and the docker image was broken, and nothing I tried would bring it back. (In my experience with tc, deleting the qdisc generally clears things, and I wasn't seeing that.) That is, I could netcat from inside the docker container to my OCU up until the point I first ran src_tc.rb, and then never again after that.

I did notice that the vpn conf file I downloaded actually has the OCU connecting to the sim, not the FC, which surprised me. I would have thought the vpn would go directly to the FC.

osrf-migration commented 7 years ago

Original comment by Steven Gray (Bitbucket: stgray).


@knitfoo Does the new script work for you on a local network? Whenever I try it, I only get the latency changes and no throttling. The older version worked to throttle, though it seems to use some sort of moving average in limiting the bandwidth -- so maybe 5-6 ping commands would get through immediately before ceasing when limited to 64 bps.

@nkoenig Is the new script what will be run during the competition?

osrf-migration commented 7 years ago

Original comment by Jeremy White (Bitbucket: knitfoo).


I haven't tried it locally. My (limited) experience with tc and traffic shaping leads me to feel that it's fiddly, and so having a local success won't prove anything. I crave being able to exactly replicate what we will experience during the competition.

I think I also haven't quite puzzled out what the network topology looks like; I'm used to a direct openvpn, and we seem to be going from OCU to sim and then on to the FC, and I haven't worked that out. That's all code for me saying that I'm not even sure throttling tap0 is the right approach :-/.

osrf-migration commented 7 years ago

Original comment by Nate Koenig (Bitbucket: Nathan Koenig).


@stgray, can you try running tc with -f 192.168.2.150/26, not -f 192.168.2.10/26?
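
The /26 mask matters here: a /26 covers a 64-address block, so 192.168.2.150/26 matches the 192.168.2.128-192.168.2.191 range, while 192.168.2.10/26 matches 192.168.2.0-192.168.2.63. The two filters therefore select entirely different hosts.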

This example script will be run during the competition.

@knitfoo, your connection is from the OCU to the field computer.

osrf-migration commented 7 years ago

Original comment by Steven Gray (Bitbucket: stgray).


@nkoenig I've tried the script again with the other IP as you suggested, and now it just renders the connection inoperable. I ran with generous bandwidth limits and almost no lag, using sudo ./src_tc.rb -i eth0 -d 50mbit -u 50mbit -f 192.168.2.150/26 -l 10ms. I ran the script on 192.168.2.10 and was unable to ping 192.168.2.150. I was also unable to receive UDP packets.

With the old script, I was able to clear the settings with:

#!bash

tc qdisc del dev eth0 root
tc qdisc del dev ifb0 root

but this no longer clears them.

osrf-migration commented 7 years ago

Original comment by Steven Gray (Bitbucket: stgray).


@iche033 Has the example comms reduction script been tested on the CloudSim setup? Could you please post instructions for using it (i.e., the right interface and IP to test with, and whether it should be run in the docker container)? Every time I've tried it, it prevents anything at all from coming through, regardless of the latency and bandwidth throttling settings. I also haven't had any luck running it just on the connection between machines on my local network, as in the post above. Thanks in advance.

osrf-migration commented 7 years ago

Original comment by Jeremy White (Bitbucket: knitfoo).


@nkoenig Copying the IP of the current running simulator gives 54.183.180.208. My currently running FC is 54.183.207.48.

And I've just downloaded openvpn config info and done:

#!shell

[~/xfer/cloudsim] grep remote openvpn.conf 
#   'remote'                #
remote 54.183.180.208 1196 # Server IP and port

As you can see, the remote given in the vpn corresponds to the sim computer, not the field computer. I appreciate that the final endpoint of the vpn is the FC, and it works, but I had not previously realized that the vpn went through the sim computer.

That is what I meant.

osrf-migration commented 7 years ago

Original comment by Jeremy White (Bitbucket: knitfoo).


Made another pass. Was unable to achieve anything. I tried running the script against the br0 and docker0 interfaces on the fc. I couldn't run the script inside the docker container because of kernel issues. I also tried running the script against the bridge interface on the sim.

This time, I never bricked anything, but I also never got a material change. I may have imposed bandwidth limitations; I was only probing latency at the time, though (latency is easy to check; bandwidth a bit trickier).

osrf-migration commented 7 years ago

Original comment by Steven Gray (Bitbucket: stgray).


@nkoenig I think @knitfoo and I have had a similar experience -- either the connection is not modified at all or it is completely blocked. Any suggestions would be greatly appreciated.

osrf-migration commented 7 years ago

Original comment by Nate Koenig (Bitbucket: Nathan Koenig).


Can you try things piecemeal, and with only your local computer first? For example, try to just set an outbound bandwidth limitation on your local machine, and test using something like this.

If you can successfully change settings on your local computer, then try between your local host and a local docker container. You can use iperf.

Once you have that confirmed, then move to cloudsim. You can ssh into the field computer and try running the tc command directly.
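
For instance, a minimal local sanity check (a sketch using a plain tbf qdisc, not the competition script; <server-ip> is a placeholder for whichever machine runs the iperf server):

#!bash

# Limit egress on eth0 to roughly 1 Mbit/s:
sudo tc qdisc add dev eth0 root tbf rate 1mbit burst 32kbit latency 400ms

# With "iperf -s" running on the other machine, this should now report ~1 Mbit/s:
iperf -c <server-ip> -t 10

# Remove the limit when done:
sudo tc qdisc del dev eth0 root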

osrf-migration commented 7 years ago

Original comment by Steven Gray (Bitbucket: stgray).


I've got two computers on the same local network. For fun, I gave them the competition IPs, 192.168.2.10 and 192.168.2.150. I restarted the one ending in 2.10 to clear any settings left over from messing with tc earlier. I ran the script and saw:

#!bash

sudo ./src_tc.rb -i eth0 -d 50mbit -u 50mbit -f 192.168.2.150/26 -l 10ms
[sudo] password for steve: 
RTNETLINK answers: No such file or directory
Cannot find device "ifb0"

On 2.10, I also ran iperf -s. On 2.150, I ran iperf -c 192.168.2.10 and got connect failed: Connection timed out. I also can't ping between machines anymore.

The craziest thing is that before I rebooted 2.10, I had tried the script and tested with iperf -- and it worked. I had been messing with the older version of the script before that. I'll see if I can reproduce.

osrf-migration commented 7 years ago

Original comment by Steven Gray (Bitbucket: stgray).


I followed up by running a variant of the old script, which got things flowing again. I ran sudo ./start_tc.rb -i eth0 -d 50kbit -u 50kbit -l 20ms using the script below. Afterwards, I ran the new example script again and, lo and behold, it worked. I ran sudo ./src_tc.rb -i eth0 -d 50mbit -u 50mbit -f 192.168.2.150/26 -l 15ms and saw pings of 30ms between machines, and iperf showed a 50mbit connection. There must still be a critical setting in the old script (or in the variant I made).

The script is:

#!/usr/bin/env ruby

require 'optparse'

# Default options (normally overridden on the command line)
options = {:iface => "eth0", :downlink => "100mb", :uplink => "10mb", :latency => "100ms"}

OptionParser.new do |opts|
  opts.banner = "Usage: sudo ./src_tc.rb [options]\n\n" +
    "A bandwidth value must have one of the following suffixes:\n" +
    "  b\t\tBytes\n" +
    "  kbit\t\tKilobits\n"+
    "  k, or kb\tKilobytes\n"+
    "  mbit\t\tMegabits\n" +
    "  m, or mb\tMegabytes\n" +
    "  gbit\t\tGigabits\n" +
    "  g, or gb\tGigabytes\n\n" +
    "Example:\n  sudo ./src_tc.rb -i eth0 -d 100mbit -u 10kb -l 100ms \n\n" +
    "Options:"

  opts.on('-i', '--iface INTERFACE', "Ethernet interface name. Default=#{options[:iface]}") { |v|
    options[:iface] = v
  }

  opts.on('-d', '--downlink BANDWIDTH', "Downlink bandwidth. Default=#{options[:downlink]}") { |v|
    options[:downlink] = v
  }

  opts.on('-u', '--uplink BANDWIDTH', "Uplink bandwidth. Default=#{options[:uplink]}") { |v|
    options[:uplink] = v
  }
  opts.on('-l', '--latency DELAY', "Round trip latency with units, e.g. 100ms. Default=#{options[:latency]}") { |v|
    options[:latency] = v
  }

end.parse!

# Clear any existing tc configuration. These may print harmless
# "RTNETLINK answers: No such file or directory" errors when nothing is
# attached yet.
`tc qdisc del dev #{options[:iface]} root`
`tc qdisc del dev ifb0 root`

# Insert the ifb module so that we can redirect incoming (ingress) traffic
# to a virtual interface. This will allow us to apply a bandwidth limit to
# incoming traffic. Bandwidth limits can only be applied to send queues.
# This is why we must redirect incoming traffic to a virtual interface, and
# then limit the virtual interface's outbound queue.
`modprobe ifb numifbs=1`

# Bring up the virtual interface (created by the modprobe above)
`ip link set dev ifb0 up`

# Redirect ingress traffic from the physical interface to the virtual
# interface.
`tc qdisc add dev #{options[:iface]} handle ffff: ingress`
`tc filter add dev #{options[:iface]} parent ffff: protocol ip u32 match u32 0 0 action mirred egress redirect dev ifb0`

# Apply egress (uplink) rules for the physical interface
`tc qdisc add dev #{options[:iface]} root handle 1: htb default 10`
`tc class add dev #{options[:iface]} parent 1: classid 1:1 htb rate #{options[:uplink]}`
`tc class add dev #{options[:iface]} parent 1:1 classid 1:10 htb rate #{options[:uplink]}`

# Apply latency with netem as a leaf qdisc under the htb class. Adding netem
# as a second root qdisc here would fail with "RTNETLINK answers: File exists"
# because htb already owns the root.
`tc qdisc add dev #{options[:iface]} parent 1:10 handle 10: netem delay #{options[:latency]}`

# Apply ingress (downlink) rules for the physical interface
# via egress rules for the virtual interface.
`tc qdisc add dev ifb0 root handle 1: htb default 10`
`tc class add dev ifb0 parent 1: classid 1:1 htb rate #{options[:downlink]}`
`tc class add dev ifb0 parent 1:1 classid 1:10 htb rate #{options[:downlink]}`
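
After the script runs, the applied configuration can be inspected with standard tc commands (a quick check, assuming the default eth0 interface):

#!bash

tc qdisc show dev eth0
tc class show dev eth0
tc qdisc show dev ifb0
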
osrf-migration commented 7 years ago

Original comment by Ian Chen (Bitbucket: Ian Chen, GitHub: iche033).


Just catching up on the discussion: here's the pull request containing the tc script that'll be run on the FC. Note that there have been some updates to the script today, so try testing this version.

I just tested it on the cloudsim FC computer with the same command you guys are already using:

sudo ./src_tc.rb -i tap0 -u 2mbit -d 380kbit -f 192.168.2.150/26 -l 250ms

The 250ms latency is applied to inbound and outbound traffic, resulting in a ping time of 500ms. If you set a large value (like 10 or 20 seconds), you'll get a connection timeout with iperf, and ping will drop packets for the amount of time you set, but it should work after the initial delay period.
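
A quick way to confirm this behavior (assuming the command above is running on the FC and you ping the FC container from the OCU side): with -l 250ms applied in both directions, round-trip times should sit near 500 ms plus the unshaped baseline:

#!bash

ping -c 5 192.168.2.10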

The uplink here is the outbound traffic (FC -> OCU), and the downlink is inbound (OCU -> FC).

Note that sometimes, after I've been experimenting with various tc rules, some are left behind and are not cleaned up by just running tc qdisc del dev tap0 root.
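
When that happens, it can help to inspect what is attached and clear each attachment point explicitly (general tc commands, not taken from the official script):

#!bash

# See what is actually attached:
tc qdisc show dev tap0
tc filter show dev tap0 parent ffff:

# Clear the root qdisc, the ingress qdisc, and the ifb device:
sudo tc qdisc del dev tap0 root
sudo tc qdisc del dev tap0 ingress
sudo tc qdisc del dev ifb0 root

# Optionally unload the ifb module as well:
sudo modprobe -r ifb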

osrf-migration commented 7 years ago

Original comment by Ian Chen (Bitbucket: Ian Chen, GitHub: iche033).


Also note that the FC host IP address has recently changed to 192.168.2.8, but the FC docker container IP address remains 192.168.2.10. Your code will be run inside the container at 192.168.2.10; if you're doing an iperf test from the FC host machine, use the 192.168.2.8 address.

osrf-migration commented 7 years ago

Original comment by Steven Gray (Bitbucket: stgray).


Thanks @iche033! That new script worked great for me. Looks like all my local traffic is being adjusted correctly. I'll try it on CloudSim tomorrow.

osrf-migration commented 7 years ago

Original comment by Ian Chen (Bitbucket: Ian Chen, GitHub: iche033).


Actually, keep monitoring the pull request for the latest version of the script. It's actively being tested and reviewed, so there may be minor tweaks to it (hopefully no more).

osrf-migration commented 7 years ago

Original comment by Víctor López (Bitbucket: Victor Lopez).


I am trying to execute the latest version of the script from the merge request on the field computer, but it fails. Here's the log:

#!bash

 sudo ./src_tc.rb -i tap0 -l 5000ms
RTNETLINK answers: No such file or directory
Cannot find device "ifb0"
Illegal "rate"
Usage: ... qdisc add ... htb [default N] [r2q N]
                      [direct_qlen P]
 default  minor id of class to which unclassified packets are sent {0}
 r2q      DRR quantums are computed as rate in Bps/r2q {10}
 debug    string of 16 numbers each 0-3 {0}

 direct_qlen  Limit of the direct queue {in packets}
... class add ... htb rate R1 [burst B1] [mpu B] [overhead O]
                      [prio P] [slot S] [pslot PS]
                      [ceil R2] [cburst B2] [mtu MTU] [quantum Q]
 rate     rate allocated to this class (class can still borrow)
 burst    max bytes burst which can be accumulated during idle period {computed}
 mpu      minimum packet size used in rate computations
 overhead per-packet size overhead used in rate computations
 linklay  adapting to a linklayer e.g. atm
 ceil     definite upper class rate (no borrows) {rate}
 cburst   burst but for ceil {computed}
 mtu      max packet size we create rate map for {1600}
 prio     priority of leaf; lower are served first {0}
 quantum  how much bytes to serve from leaf at once {use r2q}

TC HTB version 3.3
RTNETLINK answers: No such file or directory
Illegal "rate"
Usage: ... qdisc add ... htb [default N] [r2q N]
                      [direct_qlen P]
 default  minor id of class to which unclassified packets are sent {0}
 r2q      DRR quantums are computed as rate in Bps/r2q {10}
 debug    string of 16 numbers each 0-3 {0}

 direct_qlen  Limit of the direct queue {in packets}
... class add ... htb rate R1 [burst B1] [mpu B] [overhead O]
                      [prio P] [slot S] [pslot PS]
                      [ceil R2] [cburst B2] [mtu MTU] [quantum Q]
 rate     rate allocated to this class (class can still borrow)
 burst    max bytes burst which can be accumulated during idle period {computed}
 mpu      minimum packet size used in rate computations
 overhead per-packet size overhead used in rate computations
 linklay  adapting to a linklayer e.g. atm
 ceil     definite upper class rate (no borrows) {rate}
 cburst   burst but for ceil {computed}
 mtu      max packet size we create rate map for {1600}
 prio     priority of leaf; lower are served first {0}
 quantum  how much bytes to serve from leaf at once {use r2q}

TC HTB version 3.3
RTNETLINK answers: No such file or directory

I also tried to download it and execute it inside the running team container:

#!bash

ubuntu@ip-172-31-13-121:~$ docker exec team_container bash -c "wget https://bitbucket.org/osrf/cloudsim-sim/raw/d98623fb3495bb2201bb67e1bcb5e31855a07512/src_commands/src_tc.rb && chmod +x src_tc.rb && ./src_tc.rb  -i eth0 -l 5000ms"
--2017-06-07 09:25:55--  https://bitbucket.org/osrf/cloudsim-sim/raw/d98623fb3495bb2201bb67e1bcb5e31855a07512/src_commands/src_tc.rb
Resolving bitbucket.org (bitbucket.org)... 104.192.143.1, 104.192.143.2, 104.192.143.3, ...
Connecting to bitbucket.org (bitbucket.org)|104.192.143.1|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3212 (3.1K) [text/plain]
Saving to: ‘src_tc.rb.1’

     0K ...                                                   100%  454M=0s

2017-06-07 09:25:55 (454 MB/s) - ‘src_tc.rb.1’ saved [3212/3212]

RTNETLINK answers: Operation not permitted
Cannot find device "ifb0"
modprobe: ERROR: ../libkmod/libkmod.c:556 kmod_search_moddep() could not open moddep file '/lib/modules/4.4.0-78-generic/modules.dep.bin'
Cannot find device "ifb0"
RTNETLINK answers: Operation not permitted
Cannot find device "ifb0"
bad action parsing
parse_action: bad value (5:mirred)!
Illegal "action"
RTNETLINK answers: Operation not permitted
Illegal "rate"
Usage: ... qdisc add ... htb [default N] [r2q N]
                      [direct_qlen P]
 default  minor id of class to which unclassified packets are sent {0}
 r2q      DRR quantums are computed as rate in Bps/r2q {10}
 debug    string of 16 numbers each 0-3 {0}

 direct_qlen  Limit of the direct queue {in packets}
... class add ... htb rate R1 [burst B1] [mpu B] [overhead O]
                      [prio P] [slot S] [pslot PS]
                      [ceil R2] [cburst B2] [mtu MTU] [quantum Q]
 rate     rate allocated to this class (class can still borrow)
 burst    max bytes burst which can be accumulated during idle period {computed}
 mpu      minimum packet size used in rate computations
 overhead per-packet size overhead used in rate computations
 linklay  adapting to a linklayer e.g. atm
 ceil     definite upper class rate (no borrows) {rate}
 cburst   burst but for ceil {computed}
 mtu      max packet size we create rate map for {1600}
 prio     priority of leaf; lower are served first {0}
 quantum  how much bytes to serve from leaf at once {use r2q}

TC HTB version 3.3
RTNETLINK answers: Operation not permitted
We have an error talking to the kernel
RTNETLINK answers: Operation not permitted
Cannot find device "ifb0"
Illegal "rate"
Usage: ... qdisc add ... htb [default N] [r2q N]
                      [direct_qlen P]
 default  minor id of class to which unclassified packets are sent {0}
 r2q      DRR quantums are computed as rate in Bps/r2q {10}
 debug    string of 16 numbers each 0-3 {0}

 direct_qlen  Limit of the direct queue {in packets}
... class add ... htb rate R1 [burst B1] [mpu B] [overhead O]
                      [prio P] [slot S] [pslot PS]
                      [ceil R2] [cburst B2] [mtu MTU] [quantum Q]
 rate     rate allocated to this class (class can still borrow)
 burst    max bytes burst which can be accumulated during idle period {computed}
 mpu      minimum packet size used in rate computations
 overhead per-packet size overhead used in rate computations
 linklay  adapting to a linklayer e.g. atm
 ceil     definite upper class rate (no borrows) {rate}
 cburst   burst but for ceil {computed}
 mtu      max packet size we create rate map for {1600}
 prio     priority of leaf; lower are served first {0}
 quantum  how much bytes to serve from leaf at once {use r2q}

TC HTB version 3.3
Cannot find device "ifb0"
Cannot find device "ifb0"
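
The "Operation not permitted" and modprobe errors above are what you'd expect when the container lacks the NET_ADMIN capability and can't load kernel modules. One possible workaround (an assumption about the setup, not an official instruction) is to load ifb on the host and grant the capability when starting the container:

#!bash

# On the host, load ifb so the container doesn't need modprobe:
sudo modprobe ifb numifbs=1

# Hypothetical invocation; <team_image> stands in for the real image name:
docker run -it --cap-add=NET_ADMIN <team_image> bash
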
osrf-migration commented 7 years ago

Original comment by Nate Koenig (Bitbucket: Nathan Koenig).


Here is a tutorial that describes how to use tc and test your settings.

osrf-migration commented 7 years ago

Original comment by Víctor López (Bitbucket: Victor Lopez).


Thanks @nkoenig, much appreciated.

Do you also have something on reducing the upload/download speed to emulate the finals settings?

osrf-migration commented 7 years ago

Original comment by Ian Chen (Bitbucket: Ian Chen, GitHub: iche033).


@vlopez87, it looks like tc doesn't like the default uplink and downlink values in our script. You can try specifying uplink/downlink values explicitly, like -u 50mbit -d 50mbit.

osrf-migration commented 7 years ago

Original comment by Víctor López (Bitbucket: Victor Lopez).


@iche033 I will try tomorrow, but I've had the same issue in the past on my computer, and I believe it was due to a missing kernel module needed to create the kind of interface that separates upload and download. I never managed to solve it on my computer.

osrf-migration commented 7 years ago

Original comment by Jeremy White (Bitbucket: knitfoo).


Great, thank you. I am now able to experiment, and I feel that I am simulating potential competition conditions.

My first observations (via Wireshark): there is a lot of ARP chatter on the link. I've also got an SSSD process chattering; I'm going to want to be very careful to quiet anything similar to that. (That's more a note for myself.)

I've also noticed that if I change the shaping conditions, it breaks the connection to ROS.

osrf-migration commented 7 years ago

Original comment by Steven Gray (Bitbucket: stgray).


@iche033 I copied the newest script (pull request d98623f) over to the FC and ran:

#!bash

sudo ./src_tc.rb -i tap0 -u 50kbit -d 2kbit -f 192.168.2.150/26 -l 10s

I noticed I had to run it 3x before the settings seemed to take effect (i.e., I ran ping 192.168.2.10 from my OCU machine after each attempt; after the 3rd iteration I saw 20 s latency). Aside from that, it seems to work great.

osrf-migration commented 7 years ago

Original comment by Víctor López (Bitbucket: Victor Lopez).


Never mind my comment that was here; my container was probably hogging all the bandwidth.

osrf-migration commented 7 years ago

Original comment by Ian Chen (Bitbucket: Ian Chen, GitHub: iche033).


@stgray Interesting, so the first 2 times did not work, or had a different effect? We'll keep testing the script.

osrf-migration commented 7 years ago

Original comment by Steven Gray (Bitbucket: stgray).


The first two times seemed to have no effect; the third time, applying it worked. (I only looked at the ping from 192.168.2.10 to 192.168.2.150.)

osrf-migration commented 7 years ago

Original comment by Víctor López (Bitbucket: Victor Lopez).


I tried this on my computer, trying to throttle the docker0 interface (or the randomly named interface that is created per container), but after a few seconds of working, the connection went completely dead and I couldn't ping either way. It resumed only after I called the stop src script.

So I'm only able to test it on the cloud, which is complicated because for every change I make to the FC software I need to rebuild the image.

osrf-migration commented 7 years ago

Original comment by Ian Chen (Bitbucket: Ian Chen, GitHub: iche033).


I haven't tried this, but when you do docker run to start your container, try passing in --net=host and applying the tc script to eth0 to see if that works.
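
A sketch of that suggestion (assuming a hypothetical image name <team_image>):

#!bash

# Share the host's network namespace so eth0 is visible inside the container:
docker run -it --net=host <team_image> bash

# Then apply the script to eth0, from the host or inside the container:
sudo ./src_tc.rb -i eth0 -u 2mbit -d 380kbit -f 192.168.2.150/26 -l 250ms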