noxrepo / pox

The POX network software platform
https://noxrepo.github.io/pox-doc/html/
Apache License 2.0

Pingall and Ping command failure when running ip_loadbalancer component even after upgrading to latest branch (in my case halosaur) #297

Open paulpal opened 3 months ago

paulpal commented 3 months ago

```
mininet@mininet-vm:~$ cd pox
mininet@mininet-vm:~/pox$ git checkout gar
error: pathspec 'gar' did not match any file(s) known to git
mininet@mininet-vm:~/pox$ cd mininet@mininet-vm:~/pox$ git checkout gar
-bash: cd: too many arguments
mininet@mininet-vm:~/pox$ error: pathspec 'gar' did not match any file(s) known to git
-bash: syntax error near unexpected token `('
mininet@mininet-vm:~/pox$ git fetch
remote: Enumerating objects: 283, done.
remote: Counting objects: 100% (283/283), done.
remote: Compressing objects: 100% (134/134), done.
remote: Total 283 (delta 174), reused 242 (delta 144), pack-reused 0
Receiving objects: 100% (283/283), 184.14 KiB | 1.11 MiB/s, done.
Resolving deltas: 100% (174/174), done.
From https://github.com/noxrepo/pox
```

```
mininet@mininet-vm:~$ sudo mn --topo single,6 --controller=remote,port=6633
*** Creating network
*** Adding controller
*** Adding hosts:
h1 h2 h3 h4 h5 h6
*** Adding switches:
s1
*** Adding links:
(h1, s1) (h2, s1) (h3, s1) (h4, s1) (h5, s1) (h6, s1)
*** Configuring hosts
h1 h2 h3 h4 h5 h6
*** Starting controller
c0
*** Starting 1 switches
s1 ...
*** Starting CLI:
mininet> pingall
*** Ping: testing ping reachability
h1 -> X X X X X
h2 -> X X X X X
h3 -> X X X X X
h4 -> X X X X X
h5 -> X X X X X
h6 -> X X X X X
*** Results: 100% dropped (0/30 received)
mininet>
```

I have tried everything I can think of to run Mininet with different topologies using the remote POX controller as a load balancer, without much success. I even upgraded to the latest POX branch (halosaur in my case) following a recommendation in a similar thread on here, but still nothing seems to work. Other POX components such as the learning switch, hub, etc. work just fine with any topology I create. Likewise, running Mininet with the default controller (i.e., sudo mn or sudo mn --topo single,4) yields successful pings and pingalls. The only problem arises when I run the POX load balancer component, or any modified Python load balancer modules I derive from it. I would really appreciate any help resolving this issue, since the POX controller running in the second terminal seems to abort as soon as I run the pingall test from the Mininet terminal. Thanks in advance.

paulpal commented 3 months ago

Hello again everybody. I've gone ahead and used the extended load balancer from this contributor (https://github.com/VamsikrishnaNallabothu/My_POX_SDN_Work/blob/master/ip_loadbalancer.py), which is based on Murphy's original POX version, and ran it the way he recommended, but I still got the same connection-aborted error.

```
mininet@mininet-vm:~$ sudo ~/pox/pox.py log.level --DEBUG misc.ip_loadbalancer2 --ip=10.0.1.1 --servers=10.0.0.1,10.0.0.2,10.0.0.3
POX 0.8.0 (halosaur) / Copyright 2011-2022 James McCauley, et al.
DEBUG:core:POX 0.8.0 (halosaur) going up...
DEBUG:core:Running on CPython (3.8.5/Jul 28 2020 12:59:40)
DEBUG:core:Platform is Linux-5.4.0-42-generic-x86_64-with-glibc2.29
DEBUG:openflow.of_01:Listening on 0.0.0.0:6633
INFO:core:POX 0.8.0 (halosaur) is up.
INFO:openflow.of_01:[00-00-00-00-00-01 2] connected
INFO:iplb:IP Load Balancer Ready.
INFO:iplb:Load Balancing on [00-00-00-00-00-01 2]
INFO:iplb.00-00-00-00-00-01:Server 10.0.0.1 up
INFO:iplb.00-00-00-00-00-01:Server 10.0.0.2 up
INFO:iplb.00-00-00-00-00-01:Server 10.0.0.3 up
DEBUG:openflow.of_01:1 connection aborted
```

```
mininet@mininet-vm:~$ sudo mn --topo single,6 --controller=remote,port=6633
*** Creating network
*** Adding controller
*** Adding hosts:
h1 h2 h3 h4 h5 h6
*** Adding switches:
s1
*** Adding links:
(h1, s1) (h2, s1) (h3, s1) (h4, s1) (h5, s1) (h6, s1)
*** Configuring hosts
h1 h2 h3 h4 h5 h6
*** Starting controller
c0
*** Starting 1 switches
s1 ...
*** Starting CLI:
mininet> pingall
*** Ping: testing ping reachability
h1 -> X X X X X
h2 -> X X X X X
h3 -> X X X X X
h4 -> X X X X X
h5 -> X X X X X
h6 -> X X X X X
*** Results: 100% dropped (0/30 received)
mininet>
```

Kindly, I really need your input on how to resolve this bug: if I can't ping the hosts without the controller-switch connection aborting, I can't move ahead with the rest of my project, and it's rather urgent by now. Specifically, can someone tell me why the switch disconnects from the remote controller when pingall or individual ping commands are issued for my topologies? Also, how do I specify the [--dpid=] argument for my Mininet topology, as the contributor indicated in the code?

paulpal commented 3 months ago

Is there anything else I'm supposed to do to get the hosts to ping each other when running the remote controller as a load balancer in Mininet? Which other POX components, if any, go hand in hand with the load balancer component to achieve this? Has anyone faced a similar challenge in the past and resolved it? Thanks in advance.

MurphyMc commented 3 months ago

According to the manual, ip_loadbalancer is a TCP load balancer. I don't think it does anything with pings (since they're not TCP). It's not clear to me what it should do with them or why, generally speaking. A load balancer is usually used to provide a "virtual" server host which sort of combines the actual server hosts. Why would you ping the actual servers? In any case, that sounds more like normal forwarding than load balancing, so it doesn't seem like the kind of thing ip_loadbalancer -- specifically a demonstration of how one could implement some load balancing functionality -- should do. It seems like something someone would add to support their specific use case, whatever that may be. Perhaps just by cleverly combining ip_loadbalancer and one of the various learning switch components, for example (see the sketch below).
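
For illustration, launching multiple components just means listing them all on the POX command line. This is only a hedged sketch of what such a combination might look like; as noted above, ip_loadbalancer claims the whole switch, so the two components would likely need actual code changes to share packet handling cleanly:

```
./pox.py log.level --DEBUG misc.ip_loadbalancer --ip=10.0.1.1 --servers=10.0.0.1,10.0.0.2,10.0.0.3 forwarding.l2_learning
```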

You say, "if I can't ping the hosts resulting in the controller-switch connection aborting". Are you sure that not being able to ping the hosts results in the controller-switch connection aborting, and a loss of communication between the controller and switch? So if you don't try pinging, you don't get that error message? And if you run a different component, like a learning switch component, you don't get those connection-aborted messages? I'm sort of skeptical about all this, for a number of reasons, but perhaps chief among them is that I believe the connection-aborted message is emitted when a TCP connection is established to the controller but no OpenFlow handshake is completed. But we know OpenFlow handshaking has completed with a switch because we see it in the log, e.g., as "INFO:openflow.of_01:[00-00-00-00-00-01 2] connected". That message shows a DPID (00-00-00-00-00-01) for the switch, and POX only learns a switch's DPID during the handshake. So that connection was not aborted. I suspect the connection-aborted message is not related to your problem.
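
To illustrate the distinction, here is a minimal sketch of a POX component (hypothetical, not part of POX) that logs only handshake-completed connections; ConnectionUp fires exactly at the point where POX has learned the switch's DPID:

```python
# handshake_logger.py -- hypothetical helper component (a sketch, not part
# of POX).  ConnectionUp is only raised after the OpenFlow handshake has
# completed and the switch's DPID is known; connections that show up as
# "connection aborted" never get this far.
from pox.core import core
from pox.lib.util import dpid_to_str

log = core.getLogger()

def launch ():
  def _handle_ConnectionUp (event):
    log.info("Handshake completed with switch %s", dpid_to_str(event.dpid))
  core.openflow.addListenerByName("ConnectionUp", _handle_ConnectionUp)
```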

As to how to specify the DPID to the load balancer via the --dpid parameter... it appears that Mininet still sets the DPIDs of the switches such that s1 is DPID 00-00-00-00-00-01. I think you can probably do --dpid=00-00-00-00-00-01 on ip_loadbalancer's command line. You can find more about how DPIDs are specified in POX in the manual. It looks like your test topology only has one switch, so it may not actually be important to set it, though -- as the manual says, it uses the first switch to connect by default.
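
A hedged example of what that command line might look like (following the suggestion above; whether the shipped ip_loadbalancer accepts --dpid is an assumption here):

```
./pox.py misc.ip_loadbalancer --ip=10.0.1.1 --servers=10.0.0.1,10.0.0.2,10.0.0.3 --dpid=00-00-00-00-00-01
```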

Good luck.

paulpal commented 3 months ago

Hi Murphy,

Thanks so much for your feedback. I have posted this POX ip_loadbalancer issue with different outcome scenarios on this repo, as you have probably noticed by now. You are right: the connection-aborted debug message on the controller appears a few seconds after the switch connects, regardless of whether ping/pingall or any other commands are issued on the Mininet CLI, so it is evidently a separate problem from the ping failure I initially thought it was. You'll see this in the other threads I posted here afterwards (I know I shouldn't have done that, but I posted in one thread that was already closed, and started a new one when I realized I had a different bug scenario, based on different single-switch topologies, than I initially presented here). The aborting happens with both POX gar and halosaur. Also, before I switched to the load balancer component, I did not see any connection-aborted messages when I first ran POX with the other components such as the learning switch or flooding hub. So I wonder why the connection abortion happens with the load balancer, and whether there is a workaround, especially given how serious it is for my scenario, elaborated below.

I came across one Chinese/English webpage that recommended setting the OpenFlow protocol version to 1.0, since that scenario had been using the later version 1.3, which POX's default OpenFlow 1.0 wasn't able to parse. In my case, however, I used the normal version 1.0 and it still aborted anyway. This is serious and bugging me a lot, since my load balancer scenario revolves around the switch maintaining a persistent connection with the hosts (clients and servers) via the controller, for performance testing with different test algorithms (which I have prepared), including the default one in ip_loadbalancer, during the session runtime. How am I supposed to carry out this load-balancing performance testing if the POX controller running the ip_loadbalancer component (whether with my own modified versions of the algorithm or with the default random algorithm) keeps aborting after a momentary connection?
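
For reference, this is how one can pin the switches to OpenFlow 1.0 (the version POX's openflow.of_01 speaks) in Mininet; the exact flag syntax is my assumption based on standard Mininet/Open vSwitch usage:

```
sudo mn --topo single,6 --controller=remote,port=6633 --switch ovsk,protocols=OpenFlow10
# or, on an already-running topology:
sudo ovs-vsctl set bridge s1 protocols=OpenFlow10
```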

I saw that another user had a somewhat similar problem in this repo, where you pointed them to an ancient bug that was causing the abortion, though even then you were not sure whether it still lingered or whether it was due to another cause. As far as the TCP load balancer is concerned, I also recently came across your thread (pox-dev) from 2014, where another user faced a similar challenge with pings, and you correctly pointed out that because it is a TCP load balancer by default, it does not allow ICMP pings; that could be implemented easily enough, though you advised against it versus using the load balancer as it ships. Is there a way around this today? How would you go about making the load balancer pingable with ICMP traffic, rather than just the TCP/UDP traffic it ships with? Also, if it still only handles TCP, why is it called ip_loadbalancer instead of tcp_loadbalancer?

Best regards, Paulpal

MurphyMc commented 3 months ago

What makes you think the connection aborted messages reflect an actual problem?

As for why it's called ip_loadbalancer instead of tcp_loadbalancer... perhaps it's a bad name. Feel free to open an issue specifically about a name change, and it'll get considered. I imagine the thought was that it does the load balancing across several different real destination IP addresses. Perhaps the fact that it only load balances TCP traffic didn't seem that noteworthy, since many load balancers only handle TCP traffic, and putting it in the documentation seemed clear enough.

As for how I'd go about making the load balancer pingable... I don't know, because I don't know the purpose in doing so. One could make ICMP work based purely on the IP address easily enough, but this doesn't seem very useful. One could make it so that the controller itself responded to pings sent to the virtual IP of the load balancer, but this also doesn't seem very useful. I don't know what is useful here, so I don't know what should be done.
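
If someone did want the second option, it would look roughly like the "pong" example from the POX manual, adapted to answer only for the load balancer's virtual IP. A sketch under those assumptions (the SERVICE_IP constant and the hook into ip_loadbalancer's packet handling are mine, not part of the shipped component):

```python
# Sketch: have the controller answer ICMP echo requests addressed to the
# load balancer's virtual IP.  Modeled on the "pong" example in the POX
# manual; wiring this into ip_loadbalancer's PacketIn handler is omitted.
import pox.lib.packet as pkt
import pox.openflow.libopenflow_01 as of
from pox.lib.addresses import IPAddr

SERVICE_IP = IPAddr("10.0.1.1")  # assumed VIP, matching the logs above

def maybe_reply_to_ping (event):
  packet = event.parsed
  ipp = packet.find('ipv4')
  icmpp = packet.find('icmp')
  if ipp is None or icmpp is None: return False
  if ipp.dstip != SERVICE_IP: return False
  if icmpp.type != pkt.TYPE_ECHO_REQUEST: return False

  # Build the reply inside-out: ICMP, then IPv4, then Ethernet.
  icmp_reply = pkt.icmp()
  icmp_reply.type = pkt.TYPE_ECHO_REPLY
  icmp_reply.payload = icmpp.payload
  ip_reply = pkt.ipv4(protocol=pkt.ipv4.ICMP_PROTOCOL,
                      srcip=ipp.dstip, dstip=ipp.srcip)
  ip_reply.payload = icmp_reply
  eth_reply = pkt.ethernet(type=pkt.ethernet.IP_TYPE,
                           src=packet.dst, dst=packet.src)
  eth_reply.payload = ip_reply

  # Send the reply back out the port the request arrived on.
  msg = of.ofp_packet_out()
  msg.actions.append(of.ofp_action_output(port=of.OFPP_IN_PORT))
  msg.data = eth_reply.pack()
  msg.in_port = event.port
  event.connection.send(msg)
  return True
```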

paulpal commented 3 months ago

Well, I won't know for sure whether it reflects an actual problem until I run xterms and Wireshark traffic captures of the clients pushing requests to the servers through the switch-controller load balancer path, and see how the servers respond. I just don't know how that would be possible if the connection between the switch and the controller were actually aborted, though. But from your standpoint, it could be unrelated to the switch, since the connection was established at least once beforehand. I hope that's the case, because it would mean I can carry on with my experiments. But if the controller is disconnected from the switch when the connection-aborted alert appears, it would mean a breakup of links in the topology, technically rendering my load-balancing experiment impossible. I will try to follow your lead from the 2014 pox-dev page to the other user who had a similar case and see whether the experiment works in my case; if not, I can tell whether or not the connection aborting presents a real problem, and I will follow up with feedback here.
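
For reference, the sort of capture I have in mind, assuming Mininet's default interface naming:

```
mininet> xterm h1
# then, inside h1's xterm:
tcpdump -i h1-eth0 -w /tmp/h1.pcap
```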

As for the ip vs. tcp load balancer naming confusion, it would be great to consider a name change to avoid confusing POX/Mininet beginners looking to use it as-is or to model their own load balancers on it. So yes, I will open an issue to that effect. Thanks for the suggestion.

As for the last idea of tweaking the load balancer to work with ICMP, I would have to read up some more on the POX wiki, as well as on ICMP/TCP/ARP networking, before I could understand how each piece fits neatly into the networking jigsaw puzzle. For now, I'll just use the load balancer as it is and see if it works; if not, I will return to ask you for an alternative to POX for my particular load-balancing scenario, between Ryu and ODL (OpenDaylight). I'm trying to test the performance of different load-balancing algorithms for different SDN topologies that implement centralized and/or decentralized controllers in a network using Mininet. For this, I'll need to implement topologies ranging from a single switch and single controller to multiple switches and multiple controllers.