noxrepo / pox

The POX network software platform
https://noxrepo.github.io/pox-doc/html/
Apache License 2.0

Reduce TTL with l2_pairs.py #265

Closed mattall closed 2 years ago

mattall commented 3 years ago

Hello, I'm using POX in Mininet and I want to let hosts in my network discover the hops that they are taking across the switches in the network.

This might be an odd request, but I was wondering if there is a way I can program the POX controller to instruct the switches to decrement the TTL of packets by modifying the l2_pairs.py program? I have played with the code and done some inspection, and I see that l2_pairs.py only handles Ethernet packets, so the TTL is not available here. If I parse it here, then it looks like I'm messing with the packet's state (e.g., self.parsed) and I'm afraid this might have unintended consequences. Perhaps I should be looking somewhere else in the code to add this change to my application.

I also thought about giving the switches different subnet addresses and using l3_learning to let them link up but I haven't been able to yet. Since I at least have connectivity via l2_pairs I was wondering if I could bake in the l3 behavior of decrementing TTL to l2_pairs.

Thanks for any pointers!

MurphyMc commented 3 years ago

I think you need more than decrementing the TTL, right? You also need the ICMP message when it expires, right?

Decrementing the TTL could be done relatively easily by modifying l2_pairs, I think. l2_pairs installs rules that just match the Ethernet header. You'll also want to install similar rules that match the Ethertype field to ensure that it's IP. On those rules, also add the TTL decrement action, which is a Nicira extension (but l2_pairs already uses Nicira extensions, so they must be working). You'll need the plain Ethernet rules too to handle ARP and such; you'll want to set the ones which also match the Ethertype as higher priority (or just install ARP and IP rules and forget about all the other Ethertypes?).

The trouble is that doing this won't automatically result in ICMP messages if I remember correctly. For that, you'll need to involve the controller. My memory has faded, but this probably means a third set of rules which also match TTL=0/1 and send those to the controller. The controller will then create an ICMP response and send it back to the switch with a packet-out action.
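
To make the priority ordering concrete, here is a tiny pure-Python model of the three rule sets described above. It does not use the POX or OpenFlow APIs at all; the Rule and Packet classes, the priority numbers, and the action strings are all made up for illustration. It only shows which rule should win for which packet.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

IP_ETHERTYPE = 0x0800
ARP_ETHERTYPE = 0x0806


@dataclass
class Packet:
    dl_type: int                 # Ethertype
    ttl: Optional[int] = None    # only meaningful for IP packets


@dataclass
class Rule:
    name: str
    priority: int                # hypothetical values; only the ordering matters
    matches: Callable[[Packet], bool]
    action: str


def build_table() -> List[Rule]:
    return [
        # Highest priority: IP packets with TTL 0 or 1 go to the controller,
        # which can then synthesize the ICMP message.
        Rule("ttl-expiring", 300,
             lambda p: p.dl_type == IP_ETHERTYPE and p.ttl is not None and p.ttl <= 1,
             "send_to_controller"),
        # Next: all other IP packets get their TTL decremented and are forwarded.
        Rule("ip", 200, lambda p: p.dl_type == IP_ETHERTYPE, "dec_ttl+forward"),
        # Lowest: plain Ethernet rules still handle ARP and everything else.
        Rule("ethernet", 100, lambda p: True, "forward"),
    ]


def lookup(table: List[Rule], packet: Packet) -> str:
    # The highest-priority matching rule wins, as in a flow table.
    for rule in sorted(table, key=lambda r: r.priority, reverse=True):
        if rule.matches(packet):
            return rule.action
    return "drop"
```

In a real switch each of these would be a separate flow-mod with its own match and actions; the point here is only that the TTL rule has to sit above the general IP rule, which in turn sits above the Ethernet-only rule.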

I think the POX ovs_rip component demonstrates decrementing the TTL and synthesizing ICMP responses... but does the latter for ping, not for TTL expiration, so it'll take a bit of adapting (this really should be a feature of ovs_rip, I think I just didn't bother).

An alternate approach would be to just provide an API/protocol by which hosts could query the controller instead of supporting traceroute-like discovery.

And I guess a meta-question is whether you want/need to be using OpenFlow/OVS at all. If you just want normal IP forwarding, you might just want to use a software IP router. Then traceroute would just work.

mattall commented 3 years ago

Thank you for this detailed feedback!

Your meta question is on point. My alternative is to install Quagga and configure it for a bunch of mininet hosts, but I think that to configure it abstractly for every possible topology might be more work than hacking something with POX. I could be wrong!

Thanks for bringing up ovs_rip. It looks like fm.actions.append(ovs.nx_action_dec_ttl()) gets called once within _init_ingress_table. I also see where it is constructing that ICMP response, if icmpp.type == pkt.ICMP.TYPE_ECHO_REQUEST:.

I suppose what I might do is to add a TTL_COOKIE and another case in def RIP_PACKET_COOKIE (self, event): that calls another method, e.g., def _do_ttl_dec_or_send_time_expired. On the other hand, maybe I can simply change _do_ping to check the TTL before sending the ECHO_REPLY, and construct the TIME_EXCEEDED packet instead.

One thing that's not clear to me is _init_ingress_table. It seems like this is handling a lot of table logic and getting refreshed by a few other methods. Should I leave this alone, and allow it to continue to decrement ttl as it does, or should I create a new set of instructions in here too?

Thanks again!! Matt

MurphyMc commented 3 years ago

You may be right that it'd be a pain to do in Mininet. I have a tool (Garnet) that I've never gotten around to open-sourcing that is probably quite a bit easier. One of these days...

My intention in mentioning ovs_rip was that you could steal some of the code in it to extend l2_pairs. But it sounds like maybe you're considering just switching to using it instead, which actually seems like a fine idea. You just need to do two things. The first is that you need to extend it to handle TTL expiration (which is potentially a mergeable improvement, and which I'll come back to in a moment). The other is that you need to assign prefixes to all the links. This is actually the only "hard" thing about doing it in Garnet, though there is an associated toolset that works with Garnet that would make it a bit easier.

The basic approach I have used for this sort of thing is something like: take some starting IP (e.g., 10.0.0.0), then take the topology, and for every router-router link, just generate the next /30 and use it for that link (each of the routers gets one IP on this /30). The routes don't necessarily aggregate in any useful way, but I've never cared. Then I just tack on my "real" prefixes at the edges (e.g., with ovs_rip's --static option).

This is something you could script if you have your topology in an easy-to-consume form. Indeed, the "associated toolset" I mentioned above is essentially a simple but flexible graph file format, a little framework for writing tools that work with that file format and NetworkX, and a bunch of little tools written with it, all for facilitating exactly this sort of thing and then loading the results into simulators, Garnet, etc.
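
The per-link prefix assignment can be scripted with nothing but Python's standard ipaddress module. This is a rough sketch, not part of POX or Garnet: the function name, the 10.0.0.0/8 base block, and the /30 point-to-point prefix length are assumptions made for illustration (the prefix length is a parameter), and the topology is assumed to be a plain list of (router, router) pairs.

```python
import ipaddress
from typing import Dict, List, Tuple

Link = Tuple[str, str]


def assign_link_prefixes(links: List[Link],
                         base: str = "10.0.0.0/8",
                         new_prefix: int = 30) -> Dict[Link, tuple]:
    """Give every router-router link the next free subnet from `base`.

    Returns {(a, b): (network, ip_for_a, ip_for_b)}.
    Links beyond the number of available subnets are silently left out.
    """
    subnets = ipaddress.ip_network(base).subnets(new_prefix=new_prefix)
    plan = {}
    for (a, b), net in zip(links, subnets):
        # One usable address for each end of the link.
        ip_a, ip_b = list(net.hosts())[:2]
        plan[(a, b)] = (net, ip_a, ip_b)
    return plan


if __name__ == "__main__":
    for link, (net, ip_a, ip_b) in assign_link_prefixes(
            [("r1", "r2"), ("r2", "r3")]).items():
        print(link, net, ip_a, ip_b)
```

If the topology lives in NetworkX, the list of links could presumably come straight from the graph's edge list, with the edge prefixes added on top as described above.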

Since I wasn't initially suggesting you use ovs_rip directly, I didn't mention this, but yes, I think you've hit on a crucial point. Pings already go through the controller, so if your tracerouting stuff just used pings, you can just modify the existing ping handling in _do_ping. If you want TTL expiration to actually work right in general, you need to add a new case (your TTL_COOKIE case, which I agree seems like the right way to try to do it).

The former (_do_ping modification) is almost certainly easier (and might be necessary in any case?). _do_ping has got that initial check of a bunch of preconditions. The last one checks to see if the packet is destined to one of the router's IPs. Before this check, you'd probably want to do the TTL check and send back an error if it fails. In case it's helpful, here's some code to generate a time exceeded message. This may exist elsewhere in POX, though this particular code is stolen from a component which isn't in the repo (yet?). I guess to send it as a packet-out, you'll also want to wrap it in an Ethernet frame (pkt.ethernet) and a packet-out, but the existing ping code should do that.

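  import struct
  import pox.lib.packet as pkt

  def make_time_exceeded(oip, router_ip):
    # oip: a pkt.ipv4 object with the original IP packet
    # router_ip: one of the router IPs
    icmp = pkt.icmp()
    icmp.type = pkt.ICMP.TYPE_TIME_EXCEED
    icmp.code = 0  # TTL exceeded in transit

    ipp = pkt.ipv4()
    ipp.protocol = ipp.ICMP_PROTOCOL
    ipp.srcip = router_ip
    ipp.dstip = oip.srcip

    # Quote the original IP header plus the first 8 bytes of its payload
    d = oip.pack()
    d = d[:oip.hl * 4 + 8]
    d = struct.pack("!HH", 0,0) + d # 4 unused bytes, then the quote. FIXME: MTU
    icmp.payload = d
    ipp.payload = icmp
    return ipp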
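
For checking the byte layout without a running POX, here is a stdlib-only sketch of the same ICMP time exceeded message. It is an illustration rather than POX code: the function names are made up, it works on raw bytes instead of pkt.ipv4 objects, and it computes the ICMP checksum by hand (in the POX version, the packet library is expected to do that when the packet is packed).

```python
import struct

ICMP_TYPE_TIME_EXCEEDED = 11  # RFC 792
ICMP_CODE_TTL_IN_TRANSIT = 0


def inet_checksum(data: bytes) -> int:
    """Standard 16-bit one's-complement Internet checksum."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def time_exceeded_message(original_ip_packet: bytes) -> bytes:
    """Build the ICMP part of a time exceeded reply (no outer IP header)."""
    # Header length is the low nibble of the first byte, in 32-bit words.
    ihl = (original_ip_packet[0] & 0x0F) * 4
    # Quote the original IP header plus the first 8 bytes of its payload.
    quoted = original_ip_packet[:ihl + 8]
    # 4 unused bytes follow the type/code/checksum fields.
    body = struct.pack("!HH", 0, 0) + quoted
    header = struct.pack("!BBH", ICMP_TYPE_TIME_EXCEEDED,
                         ICMP_CODE_TTL_IN_TRANSIT, 0)
    csum = inet_checksum(header + body)
    return struct.pack("!BBH", ICMP_TYPE_TIME_EXCEEDED,
                       ICMP_CODE_TTL_IN_TRANSIT, csum) + body
```

Running inet_checksum over the finished message should give 0, which is a quick sanity check that the checksum field was filled in correctly.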

I think you can leave _init_ingress_table alone entirely if you just focus on making time exceeded messages work for pings. If you want them to work in general, you probably need to tweak this. Offhand, I think this probably just needs to add a new TTL_COOKIE rule that matches TTL=1/0, and has higher priority than the current last rule that it installs for all IP packets. And I think this just needs to send the packet to the controller, like the PING_COOKIE rule does. I think if you make this rule higher priority than the PING_COOKIE rule too, you may be able to skip modifying _do_ping and let the new TTL_COOKIE-handling code handle all of it.