noxrepo / pox

The POX network software platform
https://noxrepo.github.io/pox-doc/html/
Apache License 2.0

Pox error and packet loss when running openflow discovery and spanning tree on large fat tree topologies #139

Closed gmoleski closed 6 years ago

gmoleski commented 9 years ago

Hi, whenever I run POX with the command "pox.py forwarding.l2_multi openflow.discovery openflow.spanning_tree --no-flood --hold-down" on a fat tree topology with 50-80 hosts, I can't get them to communicate. Some hosts can communicate, but most cannot. I think this is mostly due to the link timeout, but even when I set a larger link timeout the problem isn't entirely solved. Is there a way to get around this?

Some of the errors I also get are:

ERROR:openflow.of_01:[00-00-00-00-00-20 457] OpenFlow Error:
[00-00-00-00-00-20 457] Error: header: 
[00-00-00-00-00-20 457] Error:   version: 1
[00-00-00-00-00-20 457] Error:   type:    1 (OFPT_ERROR)
[00-00-00-00-00-20 457] Error:   length:  36
[00-00-00-00-00-20 457] Error:   xid:     112392
[00-00-00-00-00-20 457] Error: type: OFPET_BAD_REQUEST (1)
[00-00-00-00-00-20 457] Error: code: OFPBRC_BUFFER_UNKNOWN (8)
[00-00-00-00-00-20 457] Error: datalen: 24
[00-00-00-00-00-20 457] Error: 0000: 01 0d 00 18 00 01 b7 08 00 00 03 4f 0000 08 |...........O....|
[00-00-00-00-00-20 457] Error: 0010: 00 00 00 08 ff fb 00 00 |........ |

And also l2_multi throws the error "packet arrived without flow".

Thanks.

MurphyMc commented 9 years ago

What version of POX?

gmoleski commented 9 years ago

I believe I am using version 2.0 of branch carp

MurphyMc commented 9 years ago

A couple quick things to try:

  1. Upgrade to POX dart (a more recent version is always the first thing I try)
  2. Pass --eat-early-packets to discovery

gmoleski commented 9 years ago

Thanks for the information, but after running it with the latest version of dart I noticed that it is worse than carp. In dart, after some time (I imagine the link timeout), the nodes fail completely and errors are continuously thrown in the POX console. In carp I get some packet loss at the start, and then when I re-run my traffic generator the packet loss decreases, but only if I get the right link timeout parameter for a given topology. I will try to play more with the parameters and let you know. Thanks.

mvanotti commented 8 years ago

Hi! I'm having this same problem! Is there any way to fix it?

MurphyMc commented 8 years ago

Post the POX log and a description of what you're doing.

mvanotti commented 8 years ago

Hi! I'm running Mininet with a Fat Tree Topology (TreeTopo, depth=3, fanout=4). Once I start the experiment, I try to create a TCP connection between every pair of nodes and exchange one small message (< 100 bytes) every second, for thirty seconds.

The problem I am having is that only some of the hosts can connect, and in the POX log I'm getting:

ERROR:openflow.of_01:[00-00-00-00-00-01 6] OpenFlow Error:
[00-00-00-00-00-01 6] Error: header: 
[00-00-00-00-00-01 6] Error:   version: 1
[00-00-00-00-00-01 6] Error:   type:    1 (OFPT_ERROR)
[00-00-00-00-00-01 6] Error:   length:  36
[00-00-00-00-00-01 6] Error:   xid:     60475
[00-00-00-00-00-01 6] Error: type: OFPET_BAD_REQUEST (1)
[00-00-00-00-00-01 6] Error: code: OFPBRC_BUFFER_UNKNOWN (8)
[00-00-00-00-00-01 6] Error: datalen: 24
[00-00-00-00-00-01 6] Error: 0000: 01 0d 00 18 00 00 ec 3b  00 00 05 11 00 05 00 08   |.......;........|
[00-00-00-00-00-01 6] Error: 0010: 00 00 00 08 ff fb 00 00                            |........        |

and

INFO:packet:(ipv6) warning IP packet data incomplete (114 of 149)
INFO:packet:(dns) parsing questions: next_question: truncated

My end goal is to run an internet-like topology using Mininet, with at least 100 nodes.

MurphyMc commented 8 years ago

Post the whole log and the commandline you're using.

mvanotti commented 8 years ago

Sorry about that! I uploaded everything to a gist here. In the gist you will find:

I've run Pox from the carp branch with the following parameters: ./pox.py forwarding.l2_learning

Please let me know if there's anything else I can do.

MurphyMc commented 8 years ago

That commandline definitely won't work. See the following POX FAQ entry, for example: https://openflow.stanford.edu/display/ONL/POX+Wiki#POXWiki-DoesPOXsupporttopologieswithloops%3F

I'd suggest upgrading to POX eel, and then starting with: pox.py forwarding.l2_learning openflow.discovery --eat-early-packets openflow.spanning_tree --no-flood --hold-down

or

pox.py forwarding.l2_multi openflow.discovery --eat-early-packets openflow.spanning_tree --no-flood --hold-down

You may have some luck tweaking the discovery timeouts too.

mvanotti commented 8 years ago

Hi! Thanks for your answer.

I've re-run the test with the eel code and the first command you sent. It still doesn't work :(. Here's the full log.

When you say the discovery timeouts, do you mean my own TCP connection timeouts or something related to POX?

Thanks!

MurphyMc commented 8 years ago

A first note is that I think your life will improve a bit if you disable IPv6.

You might have better luck with the second command. l2_learning with a spanning tree is a pretty silly way to run a network with a whole lot of loops (like a fat tree) anyway.

I believe what's going on here is that packets are getting caught in a loop which is causing discovery to fail. I don't immediately know why this is happening, quite possibly something to do with the event handler priorities... the --eat-early-packets option is meant to prevent this, but it apparently isn't. Try adding something like the following to the very start of the PacketIn handler in the forwarding component you're using (e.g., l2_learning): if (time.time() - event.connection.connect_time) <= 1.5 * core.openflow_discovery.send_cycle_time: return
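For concreteness, that same guard could be pulled out into a small self-contained helper (just a sketch; too_early is an illustrative name, and it assumes openflow.discovery is loaded so that core.openflow_discovery exists):

import time
from pox.core import core

def too_early (event, factor=1.5):
  # True if this switch connected so recently that discovery probably
  # hasn't completed a full LLDP cycle yet; the caller should just return.
  elapsed = time.time() - event.connection.connect_time
  return elapsed <= factor * core.openflow_discovery.send_cycle_time

The very first line of the PacketIn handler then becomes: if too_early(event): return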

See if that helps. The idea is to make sure it doesn't do any forwarding until discovery has had a chance to discover. (Again, this is what --eat-early-packets is meant to make happen, but this is a more direct way of doing it.)

By discovery timeouts, I meant the --link-timeout parameter to openflow.discovery. Increasing it may help things. It's currently coupled to how fast discovery cycles (sends packets out all the ports), which isn't fundamental... you could make a somewhat more "forgiving" version by keeping the cycle time shorter but making the timeouts longer.
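For example (the value here is just an assumption to show where the option goes on the command line):

./pox.py forwarding.l2_multi openflow.discovery --eat-early-packets --link-timeout=30 openflow.spanning_tree --no-flood --hold-down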

mvanotti commented 8 years ago

Hi! Thanks for your answer!

I've tried the other method and it fails too. Here is the complete log. I ran this command:

./pox.py forwarding.l2_multi openflow.discovery --eat-early-packets openflow.spanning_tree --no-flood --hold-down

I'm not sure I follow. How does the TreeTopo contain any loops? It is just a tree, with no loops. Why would it have any problem? (I opted for this topology because I wanted to rule out other possible problems related to loops in the topology.)

Maybe I'm doing something wrong. I want to simulate a big network (>100 nodes) using Mininet and POX. Maybe running everything in a big tree topo is not the correct way.

MurphyMc commented 8 years ago

Ah. You'd said above that it was a fat tree, which has lots of loops. If it's just a plain tree, then that's not the problem. Maybe the problem is just too much going on at once.

It looked like maybe you're running a ping test between everything before starting your test. Is that right? What are the results from that? Are the pings working?

Since there are no loops, there's no need for discovery/spanning_tree. Try:

./pox.py forwarding.l2_pairs

mvanotti commented 8 years ago

Hi!

I too think the problem is "too much going on at once", but I don't know if that's solvable. I always create 64 nodes, but I randomly choose 20, 32, or 64 of them. For 20, if I run a ping between the 20 nodes, I don't usually get errors. For 32, if I run a ping between the 32 nodes, I sometimes get errors. For 64, running a ping between the 64 nodes takes ~40 minutes, and the few times I tried it, I got errors.

In all the experiments I sleep for 10 seconds before starting the ping, and then for 10 seconds again after the ping.

Running ./pox.py forwarding.l2_pairs also got me errors.

PS: I was under the impression that Fat Tree didn't have loops either, at least not the Fat Tree used in Maxinet. But the end goal is to test with a topology with lots of loops, so...

mvanotti commented 8 years ago

I've just tried to run this example from Mininet, specifying the POX controller, and I got packet loss during the pings both with ./pox.py forwarding.l2_pairs and with ./pox.py forwarding.l2_multi openflow.discovery --eat-early-packets openflow.spanning_tree --no-flood --hold-down

MurphyMc commented 8 years ago

When people talk about fat trees (at least in my experience in networking), they're almost always referring to the topology discussed by Al-Fares et al., which are also (probably more properly) called "folded Clos networks". Lots of loops.

The fact that you eventually want to run on a topology with lots of loops will hopefully be fine considering the problem you're seeing apparently doesn't have to do with loops. It's definitely easier to use a no-loop network to start off.

Try modifying the example to run the ping tests more than once with a few seconds between. Does it eventually get to 100%?

mvanotti commented 8 years ago

Sorry if I was not clear enough. When I ran the test with 64 nodes, none of the packets got lost, but I still got connection errors in my tests.

In the example above, I just got some errors the first time I ran it. Each full ping test takes ~40 minutes, with my computer using 100% CPU.

MurphyMc commented 8 years ago

Okay, let's get on the same page.

First, pull the very latest version of POX eel. I just pushed a fix to a dumb bug.

Modify the mininet treeping64 example. You may have already done some of this.

  1. Comment out the UserSwitch line (so that it only runs using the OVS kernel switch).
  2. Adjust it so that it runs a POX controller (probably just using RemoteController).
  3. Replace the result = network.run( network.pingAll ) line so that we can insert a pause. Should be something like...
network.start()
time.sleep(20) # Give discovery time to cycle
result = network.pingAll()
network.stop()

Then try the modified example with "forwarding.l2_pairs" and "openflow.discovery forwarding.l2_multi". Both of these work fine here. Takes a couple minutes to complete in either case.
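For reference, a rough sketch of what the modified script might boil down to, assuming POX is already running on the default 127.0.0.1:6633 (the function and variable names here are illustrative, not the actual treeping64 code):

import time
from mininet.net import Mininet
from mininet.node import OVSKernelSwitch, RemoteController
from mininet.topolib import TreeTopo

def tree_ping_test (depth=3, fanout=4, settle=20):
  net = Mininet(topo=TreeTopo(depth=depth, fanout=fanout),
                switch=OVSKernelSwitch,
                controller=lambda name: RemoteController(name, ip='127.0.0.1', port=6633))
  net.start()
  time.sleep(settle)    # give discovery time to cycle before any traffic
  loss = net.pingAll()  # returns the percentage of lost pings
  net.stop()
  return loss

if __name__ == '__main__':
  print('packet loss: %.1f%%' % tree_ping_test())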

mvanotti commented 8 years ago

Hi! Thank you very much for taking the time to test this out.

I can't see your push on github.

Ugh, I can't reproduce it now. But I swear that it wasn't working yesterday.

The other test I have been doing still fails however :(. (It's more complex than a simple ping, it requires establishing TCP Connections between nodes).

mvanotti commented 8 years ago

I was able to run 64 nodes on my computer! For this test I did a full ping between all pairs of nodes and waited 20 seconds before and after the ping.

It seems that the problem is related to timing, then. Do you know what could be causing this? Is it possible to know for sure when I can start sending packets?

MurphyMc commented 8 years ago

Sorry, I pushed it to my fork, not upstream. I've pushed it upstream now too. I think you should probably upgrade to it. The issue fixed here could definitely mess up your experiments in a somewhat-hard-to-reproduce-reliably way (e.g., depending on when Mininet and POX start up relative to each other).

And right; I understood that your real test was more complex than a simple ping. I wanted to start with a simpler test and work our way up.

Yes, I also believe it has to do with timing, but I don't know exactly what aspect yet. There are at least two possibilities.

  1. If using discovery/spanning_tree, some of the options disable flooding and attempt to "eat" (discard) packets before discovery has had a chance to discover all links. These approaches are for safety -- they drop packets rather than flood them before the topology is known (or before the spanning tree is set up), since before that... flooding them may result in packets caught in loops. So your experiment will fail until discovery's "cycle time" has elapsed. As mentioned above, this is coupled to discovery's link-timeout parameter (it needn't be so strongly coupled as it is, though). IIRC, the cycle time is about half of the timeout time, so if the timeout time is 10 seconds (the default, I think), then discovery won't have finished until some time after 5 seconds... before that, packet forwarding will likely not work.
  2. Reactive control bottleneck. If you are using an L2 "learning switch" type controller like l2_pairs on, say, 1000 nodes in a tree, then the first time a host tries to open a TCP connection to another host X hops away, it will cause 1000 packet-in messages (for the ARP), and then X more packet-ins for the ARP reply. If many hosts are trying to start communicating at the exact same time, it may be considerably more than this. And each of these needs to get sent to the controller, the controller needs to process it, and the controller will probably send back a flow-mod, which then needs to be sent to and processed by the switch. This may bottleneck at several places. The controller's CPU performance may be one such place. There are things one can do to improve the situation in POX (e.g., use the --unthreaded-sh option, and use PyPy instead of stock Python), but that just changes where things break, not whether they break. Exact-match controllers like l2_learning will do even worse because the installed table entries won't be reused across connections. There are a number of things one can do to help resolve this, which generally involve not doing things reactively, or at least not doing them as reactively, or at least not doing them reactively at the controller. POX has a few examples which change things up in this regard (though the assumption is generally that people will write controllers using POX to fit their exact needs, not that POX ships with best-for-everybody turnkey solutions). l2_nx_self_learning, for example, does learning within the switch using Nicira/OVS extensions. topo_proactive uses topologically meaningful addresses to avoid reactive learning altogether (e.g., much more like traditional routing).

If the problem is (1), it's just a matter of waiting a bit to start your test.

If the problem is (2), you could do a number of things. For example, try to improve the performance (e.g., using --unthreaded-sh), or use a different forwarding/routing approach (several possibilities included with POX, or write your own... there are many "better" ways to do it than POX includes, particularly if you're willing to use stuff beyond plain OpenFlow 1.0), or you could "cheat" a bit. This last one can be pretty effective. The idea is to use learning, but to "prime" learning before actually using it. e.g., use l2_pairs, and before you start your real test, you have every host send a single broadcast ping packet or something (perhaps with a small delay between each one). This will give each switch a chance to learn each host so that when the real experiment begins, all the tables are ready. What solution is best depends on what you're trying to do.
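To make the "priming" idea concrete, a minimal sketch (assuming net is a started Mininet network with a learning controller attached; the exact ping command and delay are illustrative assumptions):

import time

def prime_learning (net, delay=0.2):
  for host in net.hosts:
    # One broadcast ping per host: the broadcast frame reaches every switch,
    # so each switch/controller table learns where this host lives.
    # -b allows pinging the broadcast address, -c 1 sends a single probe.
    host.cmd('ping -b -c 1 -W 1 255.255.255.255')
    time.sleep(delay)  # small gap so the controller isn't hit all at once

Run this once after the network is up (and discovery has cycled, if you're using it), then start the real experiment.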

ppershing commented 8 years ago

I too noticed that POX has some troubles with fattree topos because of loops/discovery timeouts. We worked around this in our experiments by using a custom spanning tree "discovery" which knew exactly the topology from the start and pre-configured the spanning tree accordingly.

MurphyMc commented 8 years ago

Yeah. I actually think the "automatic" discovery isn't the right solution under many circumstances and that giving the controller the topology ahead of time is the right approach. (I believe this for operational networks as well.) For that matter, a spanning tree isn't really the right solution for networks with path diversity. It's all just the "easy" way in some sense.

That said, there have been a number of bugs/problems with POX discovery which have gotten fixed over time (including the push I just made!). I've also made a number of modifications which have never gotten pushed which can be improvements in some scenarios. For example, if you're doing experiments that don't involve failures, disabling link timeouts can be useful: discovery is used to find the topology, but then you don't need to worry about links timing out due to overload. Similarly, you can set the link timeout to be much longer (while keeping the LLDP cycle time the same), and use port events for removing links (with the assumption that link failures are detectable this way; valid in some but not all cases).

(I rarely use any of this stuff anymore, but when I do, it's almost always a combination of loading the topology from a file and then using port events and slow LLDP liveness probes via a modified discovery. The Mininet-like-thing I use can actually read the same topology files I use in the controller, which makes this pretty convenient.)

lordlabakdas commented 7 years ago

Hello, I get a similar error on a freshly pulled eel, with a topology of 75 switches, 2 hosts, and quite a few loops. I used the command ./pox.py forwarding.l2_learning openflow.discovery --eat-early-packets openflow.spanning_tree --no-flood --hold-down. The gist of the log can be found here. As a pre-step to my experiments, I am trying to do a net.pingAll, which would eventually ping between the 2 hosts involved. The ping does not go through, and I get the same error as reported earlier here. The treeping64.py mininet example pings work without issues.

MurphyMc commented 7 years ago

One thing the POX log doesn't show by default which would be helpful is the timestamp.

It looks like discovery is timing out links that it shouldn't, but it's hard to say when this happens and that doesn't help trying to understand why.

Some thoughts:

lordlabakdas commented 7 years ago

My observations:

With the above changes, I still get the same behavior as I used to.

MurphyMc commented 7 years ago

Thanks for following up. Yes, your connection up handler looks fine (though note that an easier way would be not mentioning the match at all -- it defaults to all wildcards).
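For reference, a handler in that style might look like the following sketch (the priority and timeout values are assumptions for illustration, not taken from your script):

from pox.core import core
import pox.openflow.libopenflow_01 as of

def _handle_ConnectionUp (event):
  msg = of.ofp_flow_mod()
  # No msg.match assignment -- the default match is all wildcards.
  msg.priority = 1        # low priority so flows learned later take precedence
  msg.hard_timeout = 60   # the "quiet period" entry removes itself
  # No actions appended, so matching packets are dropped instead of
  # being sent to the controller.
  event.connection.send(msg)

def launch ():
  core.openflow.addListenerByName("ConnectionUp", _handle_ConnectionUp)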

time.sleep(20) is not going to be sufficient given the entry you installed in your connection up handler. As a conservative test, try like... time.sleep(70).

And with the new changes, can you try capturing the log again? In particular, we are interested to see if and when discovery is still timing out links. It would be helpful to include timestamps. Here's a starting place for a log format based on an example in the POX manual:

log --format="[%(asctime)s %(name)s %(levelname)s] %(message)s" --datefmt="%H:%M:%S"

Also, is your topology easy to scale? If so, is there some size at which it works fine and some size at which it starts to fail? And can you describe your topology (so that one might attempt to recreate your failing experiment)?

lordlabakdas commented 7 years ago

The updated log can be found here. The command used was:

./pox.py forwarding.l2_learning openflow.discovery --eat-early-packets openflow.spanning_tree --no-flood --hold-down ext.coronet log --format="[%(asctime)s] %(message)s" --datefmt="%H:%M:%S"

My topology represents that of a US ISP called Coronet and looks like the diagram below:

[topology diagram]

The JSON file that contains the adjacency matrix and the Mininet script that I use can be found here. I will try to get results for scaling the number of nodes and get back to you.

lordlabakdas commented 7 years ago

Some results regarding scaling of nodes here:

10 nodes - 0 pings
20 nodes - pings happened
25 nodes - 0 pings
30 nodes - 0 pings

MurphyMc commented 7 years ago

My understanding is that you aren't intentionally generating a bunch of DNS queries or anything at this point, but looking at the log, there are quite a few. It quickly balloons from 25 per second right when the "quiet" table entry expires to over 4,000 eight seconds later:

25 [14:12:21]
386 [14:12:22]
480 [14:12:23]
749 [14:12:24]
667 [14:12:25]
1045 [14:12:26]
1888 [14:12:27]
3430 [14:12:28]
4311 [14:12:29]

It seems very likely that these are replicated duplicates. The most logical explanation is that a proper spanning tree is not being established. The likely reasons would seem to be 1) failure to discover the topology properly, 2) an algorithmic problem computing the spanning tree, or 3) a problem "executing" the tree (that is, something like a problem disabling the appropriate ports).

I'm guessing it's not 1, but this should be easily verifiable by looking at the link detect messages in the beginning of the log and comparing against the adjacency matrix.

2 would be easier to evaluate if we had more info. Line 108 in spanning_tree.py has an "if" statement that always evaluates to False. Can you flip it to True and set the POX log level to DEBUG and rerun the experiment?
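If it helps, the DEBUG level can be set straight from the command line with the stock log.level component; for example, prepend it to the command line you've been using:

./pox.py log.level --DEBUG forwarding.l2_learning openflow.discovery --eat-early-packets openflow.spanning_tree --no-flood --hold-down ext.coronet log --format="[%(asctime)s %(name)s %(levelname)s] %(message)s" --datefmt="%H:%M:%S"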

lordlabakdas commented 7 years ago

Reran the case after flipping the if statement to True and the log can be found here

lordlabakdas commented 7 years ago

Also, the link detect messages seem to match the number of switches, so it might not be 1)

MurphyMc commented 7 years ago

A couple comments...

The most recent log file is missing its beginning, which isn't ideal, since I don't think all the link discoveries are in it. Do you have the beginning?

This is relatively minor, but the log component invocation I gave includes more data than the one in the POX manual: log --format="[%(asctime)s %(name)s %(levelname)s] %(message)s" --datefmt="%H:%M:%S"

I haven't tested it, but I think that works. Not a huge deal, but slightly more convenient for searching.

lordlabakdas commented 7 years ago

Yes, I have the beginning and have updated the gist with the log. Working on getting the log with the new format.

MurphyMc commented 7 years ago

Are you sure the topology is correct? There seem to be many more links than the figure you posted above would indicate.

lordlabakdas commented 7 years ago

The log with the new format using command ./pox.py forwarding.l2_learning openflow.discovery --eat-early-packets openflow.spanning_tree --no-flood --hold-down ext.coronet log --format="[%(asctime)s %(name)s %(levelname)s] %(message)s" --datefmt="%H:%M:%S" can be found here

lordlabakdas commented 7 years ago

I think the topology is correct, a total of 75 switches (and 2 hosts) and 198 links.

MurphyMc commented 7 years ago

I think there are actually twice that many (unidirectional) links. I think maybe you're wiring them up double. For example:

00-00-00-00-00-71.1 00-00-00-00-00-22.6
00-00-00-00-00-71.3 00-00-00-00-00-22.8

And:

00-00-00-00-00-71.2 00-00-00-00-00-48.4
00-00-00-00-00-71.4 00-00-00-00-00-48.6

I actually have no idea if the existing spanning_tree component deals with this case, but it wouldn't surprise me if it didn't.

(Wiring them up double like that is easy to do if you're iterating the neighbors on each node. An easy way to fix it is often to skip the link if node2 < node1.)
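A minimal sketch of that fix, assuming the links are built from a symmetric adjacency matrix adj and a list of already-added switches (the names topo, switches, and adj are illustrative, not from the actual Mininet script):

def add_links (topo, switches, adj):
  n = len(adj)
  for i in range(n):
    for j in range(n):
      # Only take the upper triangle, so each physical link is added once;
      # the j <= i entries are the mirror copies that caused the doubling.
      if adj[i][j] and j > i:
        topo.addLink(switches[i], switches[j])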

lordlabakdas commented 7 years ago

Sorry, my mistake; there should only be 198 unidirectional links. I had considered the whole adjacency matrix when adding links. I made the necessary change, only considering the upper triangular part of the adjacency matrix, and the new log is provided here.

MurphyMc commented 7 years ago

It looks like we lost the DEBUG level in there, so we don't get to see the tree...

lordlabakdas commented 7 years ago

Oops, sorry again. Included the DEBUG log.level and the updated log is here

MurphyMc commented 7 years ago

So the topology that's discovered is isomorphic with the original coronet graph. I think possibility 1 is ruled out.

The tree it comes up with does seem to be a tree on the graph... but it's incomplete. Offhand, this seems like it should cause fewer loops, not more, and you seem to be having the problem of more; but either way, this looks like it may be a bug.

Can you try commenting out lines 159-164 in spanning_tree? You want it to force a tree update on every link event. The discovery rate is currently set low enough that this won't be a problem.

MurphyMc commented 7 years ago

Incidentally, in the grand tradition of such things, I just tried this and it seemed to work just fine. Whether that's because I'm using a slightly newer POX (now pushed), because I'm not using Mininet, or just because that is always the way of such things, I couldn't possibly guess. :)

lordlabakdas commented 7 years ago

I tried commenting out lines 159-164 of spanning_tree.py and I still don't see pings going through. The log is provided here

ppershing commented 7 years ago

FYI, we always had problems with discovery on more than, let's say, 10 switches. Somehow something always times out, or a packet is lost, or POX is just not fast enough to handle it. Our usual solution is to include our own discovery component which knows the exact topology (it reads it from the same file as Mininet). Peter


lordlabakdas commented 7 years ago

@ppershing Would you be willing to provide an example of your custom discovery script for me to look at?

MurphyMc commented 7 years ago

@ppershing: I think that's the right approach anyway when possible.

Though the log from @lordlabakdas looked like maybe there's a legit bug. Unfortunately, recreating it locally wasn't as easy as recreating the topology... 75 switches is no problem here.

lordlabakdas commented 7 years ago

Some good news :) I am not sure what changed between the log I posted yesterday and now (other than a reboot), but I am able to ping successfully between the two hosts.

MurphyMc commented 7 years ago

Well that's a start, at least!

Is your CPU usage super high (indicating a loop)?