Open osrf-migration opened 7 years ago
Original comment by Steve Peters (Bitbucket: Steven Peters, GitHub: scpeters).
Thanks for your question. In our previous competition experience, we have excluded headers from bandwidth measurements. We are discussing internally and will confirm whether we will continue with this practice.
Original comment by dan (Bitbucket: dan77062).
As sringer99 points out, 64 bits per second is so low that we cannot get the headers across in a reasonable time. Even if that is the rate for data and not total, how are we going to set up a connection to test that? All of the filters that I know look at total rate and are not able to determine how much is data vs. overhead.
Also, I think there had to be a typo in the document and it should be bytes/sec rather than bits/sec. Nobody has done a spec in bits for a long, long time.
Original comment by sringer99 (Bitbucket: sringer99).
In the old days, a typical byte transfer had overhead as well: 1 start bit, 8 data bits, 1 checksum bit, and sometimes a stop bit. So it can take 10 to 11 bits to get a byte through. Not sure if that applies today, but we would need to find out if we are going to properly manage the pipe between the OCU and Field Computer.
Original comment by Nate Koenig (Bitbucket: Nathan Koenig).
For reference, the virtual robotics challenge (VRC) had the same lower bound. Teams in that competition were able to solve similar tasks given these constraints.
I suggest you get creative. Hopefully, you come up with solutions that outperform those from the VRC, which was held four years ago.
Original comment by dan (Bitbucket: dan77062).
Those teams were fully funded university and industrial laboratories. This is not that. Furthermore, those teams had a very long time to work with those constraints. I point out that we only learned of this latest one about 6 weeks before the competition finals, and the competition code and environment are still not working well enough for us to test. It's hard to "get creative" when even the definition of the constraint has not been properly specified. Is it bits or bytes? Is it headers or just data?
Original comment by Nate Koenig (Bitbucket: Nathan Koenig).
Thanks for the comments; however, there are a few inaccuracies. Only seven teams in the VRC were funded. As an example, the second place team consisted of unfunded undergrads. Teams also had less time, and the simulation software was less complete.
It is bits, as stated in the rules. Headers are not counted, and we will update the rules.
Original comment by dan (Bitbucket: dan77062).
We are required to use ROS messages; however, that bandwidth limit means that ROS messages are not a practical way to send commands. Consider a single ROS Pose. That is three float64 values for position and four float64 values for orientation, or 7 × 64 = 448 bits. At 64 bits per second, it would take 7 seconds just to send the coordinates part of a single pose!
The only hope is to write interpreter code to run on the cloud computer that takes in a few bits and creates corresponding ROS messages. Is that the intent? If so, it is a huge investment of time to write that code and will require extensive testing to the exclusion of time spent getting the more useful and interesting parts of the sim working.
With regard to the VRC, of course you are correct that the WPI team was not funded (well done WPI!) but MIT, JPL, Lockheed, etc. were very well funded and large teams. I recently talked to some of the MIT team members and they said that they spent a huge amount of effort dealing with the bandwidth limits. They were not limited to the IHMC controller, nor to just ROS messages, as we are.
I'm not looking for a quarrel here. You get to make the rules and if that's what they are, then we have to deal with them. I am simply trying to point out that this particular rule will divert a lot of time and resources from other important parts of the task and we have very little time left before the finals.
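To put numbers on the pose example and on the "few bits in, ROS message out" idea, here is a minimal Python sketch. It is not code from any team or from the competition. The 16-bit fixed-point encoding, the ±32 m position range, and all function names are assumptions invented for illustration; only the 64 bits/second figure and the seven-float64 pose layout come from this thread.

```python
import struct

UPLINK_BPS = 64  # lower bound of the task 1 uplink quoted in this thread


def pack_pose_float64(position, orientation):
    """Pack 3 position values + 4 quaternion values as float64 (56 bytes)."""
    return struct.pack("<7d", *position, *orientation)


def pack_pose_int16(position, orientation, pos_range_m=32.0):
    """Hypothetical compact form: every value quantized to a signed 16-bit int.

    Assumes positions lie within +/- pos_range_m metres; quaternion
    components already lie within [-1, 1].
    """
    def quantize(value, full_scale):
        clipped = max(-full_scale, min(full_scale, value))
        return int(round(clipped / full_scale * 32767))

    ints = [quantize(p, pos_range_m) for p in position]
    ints += [quantize(q, 1.0) for q in orientation]
    return struct.pack("<7h", *ints)


def seconds_to_send(payload, bits_per_second=UPLINK_BPS):
    """Payload-only transmit time; packet headers are ignored here."""
    return len(payload) * 8 / bits_per_second


if __name__ == "__main__":
    pos, quat = (1.5, -2.25, 0.0), (0.0, 0.0, 0.0, 1.0)
    for label, payload in (("float64", pack_pose_float64(pos, quat)),
                           ("int16", pack_pose_int16(pos, quat))):
        print(label, len(payload) * 8, "bits,", seconds_to_send(payload), "s")
```

A receiver on the field computer would still have to unpack these integers and rebuild the ROS message, which is the interpreter work described above; the sketch only shows how the payload size, and therefore the transmit time, scales with the encoding.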
Original comment by Nate Koenig (Bitbucket: Nathan Koenig).
There is nothing in the rules that says you have to use ROS messages between the field computer and OCU.
Original comment by dan (Bitbucket: dan77062).
uh, right, did you see the point about the time and resources needed to write an interpreter for the cloud computer?
Original comment by Nate Koenig (Bitbucket: Nathan Koenig).
This competition is supposed to be a challenge, but not an impossible challenge. So far I have not heard a compelling argument that would classify the bandwidth limitations as being impossible.
Original comment by Rud Merriam (Bitbucket: rmerriam).
Clarification here, the task 1 bandwidth is "Bandwidth limitation: 64–4k bits/second uplink, 50k-380k bits/second downlink". Uplink is from the control computer to the robot. Downlink is from the robot to control computer. A robot pose is coming through the downlink bandwidth. Commands to the robot go through the uplink.
I need clarification since I don't know if a subscriber to robot pose is also sending a response from the message. My novice understanding is it does not. Obviously it is sending a subscription request but that can be done before leaving the start box.
I'm sure the team is working to get the competition cloud organized but it would be helpful if even a simple version could be made available for quick and dirty testing.
Original comment by Jedediyah Williams (Bitbucket: Jedediyah).
This is discouraging. I have to agree with @dan77062, and say that this is an unwelcome surprise change to the rules (decreasing the communications by a factor of more than 15000?) given that we are so close to the finals and we still haven't been told exactly how the finals are going to run. I can appreciate that this is a competition and a challenge, but it seems we are trying to develop software for a fast moving target here. I look forward to seeing what other teams come up with, and I am certain there will be some successes, but at this point I just don't even know what to work on. Some of the work I've already done has been undone by recent changes. It is clear that you are all working hard and designing a great competition, it's just that the finals date is extremely close for some of these changes and still unanswered questions.
Original comment by Steven Gray (Bitbucket: stgray).
As @rmerriam mentioned, it'd be nice to have a version of the comms restrictions we can use to test soon. If the AWS instances aren't ready, is there a good Linux tool we can use to simulate the restrictions, both the limited bandwidth and the increased latency?
Original comment by Nate Koenig (Bitbucket: Nathan Koenig).
You can use the 'tc' command line tool. This is the same tool that will be used in the finals.
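As a starting point for local testing, the sketch below builds `tc` command lines from Python without running them. It is an assumption-laden example, not the script used in the finals: the interface name, the `burst` and `limit` values, and the helper name `build_tc_commands` are placeholders, and the actual finals parameters were not published in this thread. It uses the `tbf` qdisc for the rate limit and, optionally, a `netem` parent qdisc for a fixed delay. Note that `tc` shapes outgoing traffic on the given interface.

```python
def build_tc_commands(iface, rate_bits, delay_ms=None,
                      burst_bytes=1600, limit_bytes=3000):
    """Return tc invocations (argument lists) that shape egress on iface.

    burst_bytes and limit_bytes are placeholder values; at very low rates
    the queueing behaviour depends strongly on them, so tune before use.
    """
    tbf = ["tbf", "rate", f"{rate_bits}bit",
           "burst", str(burst_bytes), "limit", str(limit_bytes)]
    if delay_ms is None:
        # Rate limit only: token bucket filter as the root qdisc.
        return [["tc", "qdisc", "add", "dev", iface, "root", "handle", "1:"] + tbf]
    # Fixed delay via netem at the root, with the rate limiter attached below it.
    return [
        ["tc", "qdisc", "add", "dev", iface, "root", "handle", "1:",
         "netem", "delay", f"{delay_ms}ms"],
        ["tc", "qdisc", "add", "dev", iface, "parent", "1:1", "handle", "10:"] + tbf,
    ]


if __name__ == "__main__":
    # Print the commands for review; running them requires root privileges.
    for cmd in build_tc_commands("eth0", 64, delay_ms=500):
        print(" ".join(cmd))
```

Printing rather than executing keeps the sketch safe to run anywhere; the commands can then be reviewed and run by hand with root privileges, and an existing root qdisc would have to be removed first.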
Original comment by Steven Gray (Bitbucket: stgray).
Thanks @nkoenig! Will the tc parameters used for the finals be shared as well?
On a related note, will the latency be constant for a run of a given task, or will it vary on-the-fly?
Original comment by dan (Bitbucket: dan77062).
Yes, please give details as soon as possible. Also, see issue #134 as I think that may be a big problem associated with low bandwidth. The code will have to be carefully scrubbed for connection leaks.
Original comment by Nate Koenig (Bitbucket: Nathan Koenig).
In order to reduce confusion, only communication between the OCU and field computer is limited. This means teams are in complete control over what data is transmitted. Please open an issue if you experience unexpected transmission of data.
Original comment by Steve Peters (Bitbucket: Steven Peters, GitHub: scpeters).
To see the bandwidth limits from the Virtual Robotics Challenge, see pages 8-9 in the following pdf:
Original comment by Steven Gray (Bitbucket: stgray).
The VRC maintained 500ms latency the entire time. For the SRC, can we assume the latency will remain constant for an entire task, but will change between tasks?
Original comment by Jedediyah Williams (Bitbucket: Jedediyah).
Are the rules from DARPA's VRC what we should be referencing for NASA's SRC? It seems like there are assumptions about similarities between the two competitions that haven't been made explicit to competitors of the SRC.
Original comment by dan (Bitbucket: dan77062).
My concern about the new bandwidth limitations is that they effectively prevent us from using ROS on the client side. This means that everyone participating is going to write their own, proprietary message protocol to work within the bandwidth limitations. Any new UIs or tiered autonomy work will use these message protocols that work only for a single team with a single robot. For a challenge that could have developed new, useful tools in ROS, this is a disappointing limitation.
Original comment by dan (Bitbucket: dan77062).
We are now focused on developing the tools needed to communicate over low bandwidth connections. Can you provide the details of the traffic control that will be used? Is it going to use the numbers in the rules as a ceiling and floor, with some distribution across those values? Or is it going to be more like the VRC, with a total number of bits for the run? Also, is the bit rate based on sim time or wall-clock time? Can you provide the actual tc parameters? It is hard to design our communication tools without knowing exactly what we need to design to.
Original comment by Rud Merriam (Bitbucket: rmerriam).
Dan, I'm trying to understand part of this issue. The download speed IMO is sufficient for ROS messages from the cloud. The upload is narrow, but we can't control Val from our consoles. At most I see a "proceed" or "do something different" command being sent. Part of this may be that I don't understand the dynamics of ROS messages.
My assumption is that once subscribed there is no uplink traffic. The published message is simply sent. Managing the downlink is then a matter of throttling messages.
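One way to picture throttling against a bit budget is a token bucket, sketched below in plain Python. This is a generic illustration and not the ROS throttling tooling or anything from the competition code; the class name, the burst size, and the injected clock are assumptions made for the example. Only the 50k bits/second figure comes from the task 1 limits quoted in this thread.

```python
import time


class BitBudgetThrottle:
    """Token bucket measured in bits; allow() decides if a message may go now."""

    def __init__(self, bits_per_second, burst_bits, clock=time.monotonic):
        self.rate = float(bits_per_second)
        self.capacity = float(burst_bits)
        self.tokens = float(burst_bits)   # start with a full bucket
        self.clock = clock
        self.last = clock()

    def allow(self, message_bytes):
        """Return True and spend tokens if the message fits the current budget."""
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the bucket capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        cost = message_bytes * 8
        if cost <= self.tokens:
            self.tokens -= cost
            return True
        return False   # caller drops or defers the message


if __name__ == "__main__":
    throttle = BitBudgetThrottle(bits_per_second=50_000, burst_bits=50_000)
    print(throttle.allow(5000), throttle.allow(5000))
```

Messages that do not fit are simply refused here; whether to drop them or queue them for later is a design choice left to the caller.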
Original comment by dan (Bitbucket: dan77062).
Rud, the main issue I have is that we have worked hard to design our tools to work within 1-2 Mbps of bandwidth. Cutting that so far down makes the tools not work. And we still don't have details on the actual tc parameters.
The secondary issue is that the bandwidth is too low for using ROS messaging, specifically rviz, across the link. We could have done so much with that interface! Granted, the competition theme is a robot on Mars, so maybe that is just my not understanding the goals, but this approach does limit the wider applicability of the methods developed.
One of the wonderful things about ROS, and one reason we use it in our commercial robots, is that I thought I would never have to write another socket server/client again, but that is what I am doing now, with the hope of resurrecting our toolkit. You are right, that if you do not subscribe to a topic, then it takes up no bandwidth over the link. And also, the throttling tools are very good. But we were planning to do more with ROS on the client side and over the link.
Could you try something for me? With an uplink limited by tc to 2kbit per second and the downlink limited to 215kbit per second (those are the halfway points of the limits for task 1), most of the time a minimal node with a single subscriber, running on the client machine, fails to find the master. Can you see if that is true for your system too? Maybe I'm just doing something wrong.
Original comment by Nate Koenig (Bitbucket: Nathan Koenig).
Here is an example script (5c105c5623e52c9aca7b15a33219bbb973556942) to control bandwidth.
Original comment by dan (Bitbucket: dan77062).
ah, I was wondering how to limit the input. Using a virtual interface and limiting its output is a clever solution.
This is helpful, but we still don't know details of how the actual bandwidth limits will be set. The example sets specific numbers as limits for uplink and downlink, but our spec has a range of possible numbers. Do we get a random value for that range? Do we get a time-varying value with some probability distribution? How can we test without knowing these details?
Original comment by Nate Koenig (Bitbucket: Nathan Koenig).
If it were me, I would test by assuming the worst-case scenario. Given the rules, the worst-case scenario is a constant 64 bits/second uplink, and a constant 50k bits/second downlink.
Original comment by dan (Bitbucket: dan77062).
By "constant" you mean that there is no traffic shaping? How is the latency applied? Can we get an example script that includes latency?
Original comment by Steven Gray (Bitbucket: stgray).
@nkoenig On 4/14 in this thread, you said "Headers are not counted, and we will update the rules."
I'm looking at the example script and can't tell if it is including the headers toward the throttled amount or not. Is it?
Original comment by Steven Gray (Bitbucket: stgray).
Just wanted to bump this up -- will headers be counted or not? Thanks!
Original comment by Nate Koenig (Bitbucket: Nathan Koenig).
We will add 256 bits per second to the bandwidth limit. That should take care of headers.
Original comment by Steven Gray (Bitbucket: stgray).
Awesome, thank you! That simplifies a lot of what I've been trying...
Original comment by dan (Bitbucket: dan77062).
no, that's not fair. You said headers would not be counted, so TCP and UDP would be basically the same, but if headers are included in the bandwidth limit, TCP and UDP are very different. Please stop changing the rules. We desperately need a stable code base.
Original comment by Nate Koenig (Bitbucket: Nathan Koenig).
Thank you Dan. Please refrain from yelling.
Can you also revise your sentence "Counting headers..." so that it is understandable?
Original comment by Nate Koenig (Bitbucket: Nathan Koenig).
Here is a tutorial that describes how to use tc and test your settings.
Original comment by Jeremy White (Bitbucket: knitfoo).
A TCP header is more than 256 bits; it is on the order of 528 bits.
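The gap between the two transports can be made concrete with the header sizes quoted in this thread. The sketch below is a back-of-the-envelope calculation, not a statement of how the finals will account for traffic: it assumes the 256 bits/second allowance is simply added to a 64 bits/second payload limit and that every header bit counts against the total. The function names are made up for the example, and TCP acknowledgements and handshakes are ignored.

```python
HEADER_ALLOWANCE_BPS = 256   # allowance announced in this thread
PAYLOAD_BPS = 64             # lower bound of the task 1 uplink

UDP_HEADER_BITS = 192 + 64   # IP + UDP header sizes as quoted in the original report
TCP_HEADER_BITS = 528        # approximate figure quoted in this thread


def packets_per_second_covered(header_bits, allowance_bps=HEADER_ALLOWANCE_BPS):
    """How many packets per second the header allowance alone pays for."""
    return allowance_bps / header_bits


def payload_bits_left(packets_per_second, header_bits,
                      payload_bps=PAYLOAD_BPS, allowance_bps=HEADER_ALLOWANCE_BPS):
    """Bits per second left for payload once headers are charged (may be negative)."""
    return payload_bps + allowance_bps - packets_per_second * header_bits


if __name__ == "__main__":
    for name, bits in (("UDP", UDP_HEADER_BITS), ("TCP", TCP_HEADER_BITS)):
        print(name,
              "packets/s covered:", round(packets_per_second_covered(bits), 3),
              "| payload bits/s left at 1 packet/s:", payload_bits_left(1, bits))
```

Under these assumptions, one UDP packet per second uses up the allowance exactly, while one TCP packet per second exceeds the combined budget.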
Original report (archived issue) by sringer99 (Bitbucket: sringer99).
I have just read the new rules, and they show that the uplink bandwidth for task 1 can be as low as 64 bits per second. This is a very low number: the IP header is 192 bits and the UDP header is 64 bits, so at this rate it will take 4 seconds just to transfer the headers. And this does not take into account the headers at the network/interface layer.
So here is my question. Is the uplink bandwidth limit for data only or does it also include all the headers?
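The 4-second figure can be checked with a few lines of Python. The IP and UDP sizes are the ones given in the report; the 112-bit Ethernet II header (14 bytes, without the frame check sequence) is an added assumption to illustrate the link-layer remark, and the function name is hypothetical.

```python
UPLINK_BPS = 64            # lowest task 1 uplink rate from the rules

IP_HEADER_BITS = 192       # as stated in the report
UDP_HEADER_BITS = 64       # as stated in the report
ETHERNET_HEADER_BITS = 112 # assumption: 14-byte Ethernet II header, no FCS


def header_transfer_seconds(*header_bits, bits_per_second=UPLINK_BPS):
    """Time to push the given headers through the link, with no payload."""
    return sum(header_bits) / bits_per_second


if __name__ == "__main__":
    print(header_transfer_seconds(IP_HEADER_BITS, UDP_HEADER_BITS))
    print(header_transfer_seconds(IP_HEADER_BITS, UDP_HEADER_BITS,
                                  ETHERNET_HEADER_BITS))
```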