kytos-ng / of_core

Kytos Main OpenFlow Network Application (NApp)
MIT License
0 stars 5 forks source link

Possible race condition on PortStatus handler #118

Closed italovalcy closed 1 year ago

italovalcy commented 1 year ago

Hi,

There is a possible race condition on the PortStatus handler, which leads to an interface/link becoming DOWN forever after a number of flaps on some interfaces. Basically, if the controller receives multiple PortStatus report a port is UP and DOWN in a very short period of time (ex: more the 6 events within a second), it might lead the controller to not correctly set the interface status.

How to reproduce: using the Mininet topology for AmLight (https://github.com/kytos-ng/kytos-end-to-end-tests/blob/master/tests/helpers.py#LL14C2-L14C2), try to force multiple events of port UP and down within a second:

ip link set down Ampath1-eth1; ip link set up Ampath1-eth1;ip link set down Ampath1-eth1; ip link set up Ampath1-eth1

In a normal situation, the links should go down and UP:

kytos $> 2023-05-12 20:05:26,688 - INFO [kytos.napps.kytos/of_core] (MainThread) PortStatus modified interface 00:00:00:00:00:00:00:12:1 state 0
2023-05-12 20:05:26,693 - INFO [kytos.napps.kytos/of_core] (MainThread) PortStatus modified interface 00:00:00:00:00:00:00:12:1 state OFPPS_LINK_DOWN
2023-05-12 20:05:26,718 - INFO [kytos.napps.kytos/of_core] (MainThread) PortStatus modified interface 00:00:00:00:00:00:00:11:1 state 0
2023-05-12 20:05:26,729 - INFO [kytos.napps.kytos/of_core] (MainThread) PortStatus modified interface 00:00:00:00:00:00:00:11:1 state OFPPS_LINK_DOWN
2023-05-12 20:05:26,827 - INFO [kytos.napps.kytos/mef_eline] (thread_pool_app_20) Event handle_link_down Link(Interface('Ampath1-eth1', 1, Switch('00:00:00:00:00:00:00:11')), Interface('Ampath2-eth1', 1, Switch('00:00:00:00:00:00:00:12')))
2023-05-12 20:05:26,858 - INFO [kytos.napps.kytos/of_core] (MainThread) PortStatus modified interface 00:00:00:00:00:00:00:12:1 state OFPPS_LIVE
2023-05-12 20:05:26,865 - INFO [kytos.napps.kytos/of_core] (MainThread) PortStatus modified interface 00:00:00:00:00:00:00:12:1 state 0
2023-05-12 20:05:26,869 - INFO [kytos.napps.kytos/of_core] (MainThread) PortStatus modified interface 00:00:00:00:00:00:00:12:1 state OFPPS_LINK_DOWN
2023-05-12 20:05:26,876 - INFO [kytos.napps.kytos/of_core] (MainThread) PortStatus modified interface 00:00:00:00:00:00:00:11:1 state OFPPS_LIVE
2023-05-12 20:05:26,880 - INFO [kytos.napps.kytos/of_core] (MainThread) PortStatus modified interface 00:00:00:00:00:00:00:11:1 state 0
2023-05-12 20:05:26,884 - INFO [kytos.napps.kytos/of_core] (MainThread) PortStatus modified interface 00:00:00:00:00:00:00:11:1 state OFPPS_LINK_DOWN
2023-05-12 20:05:26,955 - INFO [kytos.napps.kytos/of_core] (MainThread) PortStatus modified interface 00:00:00:00:00:00:00:12:1 state OFPPS_LIVE
2023-05-12 20:05:26,989 - INFO [kytos.napps.kytos/of_core] (MainThread) PortStatus modified interface 00:00:00:00:00:00:00:11:1 state OFPPS_LIVE
2023-05-12 20:05:37,158 - INFO [kytos.napps.kytos/mef_eline] (thread_pool_app_18) Event handle_link_up Link(Interface('Ampath1-eth1', 1, Switch('00:00:00:00:00:00:00:11')), Interface('Ampath2-eth1', 1, Switch('00:00:00:00:00:00:00:12')))

However, if you keep executing the command above (ip link set...), you may eventually end up in a situation where the interface is never activated again and the link_up event is never triggered.

kytos $> 2023-05-12 20:08:41,182 - INFO [kytos.napps.kytos/of_core] (MainThread) PortStatus modified interface 00:00:00:00:00:00:00:12:1 state 0
2023-05-12 20:08:41,199 - INFO [kytos.napps.kytos/of_core] (MainThread) PortStatus modified interface 00:00:00:00:00:00:00:12:1 state OFPPS_LINK_DOWN
2023-05-12 20:08:41,276 - INFO [kytos.napps.kytos/of_core] (MainThread) PortStatus modified interface 00:00:00:00:00:00:00:11:1 state 0
2023-05-12 20:08:41,312 - INFO [kytos.napps.kytos/of_core] (MainThread) PortStatus modified interface 00:00:00:00:00:00:00:11:1 state OFPPS_LINK_DOWN
2023-05-12 20:08:41,327 - INFO [kytos.napps.kytos/of_core] (MainThread) PortStatus modified interface 00:00:00:00:00:00:00:11:1 state OFPPS_LIVE
2023-05-12 20:08:41,357 - INFO [kytos.napps.kytos/of_core] (MainThread) PortStatus modified interface 00:00:00:00:00:00:00:12:1 state OFPPS_LIVE
2023-05-12 20:08:41,366 - INFO [kytos.napps.kytos/mef_eline] (thread_pool_app_16) Event handle_link_down Link(Interface('Ampath1-eth1', 1, Switch('00:00:00:00:00:00:00:11')), Interface('Ampath2-eth1', 1, Switch('00:00:00:00:00:00:00:12')))
2023-05-12 20:08:41,399 - INFO [kytos.napps.kytos/of_core] (MainThread) PortStatus modified interface 00:00:00:00:00:00:00:12:1 state 0
2023-05-12 20:08:41,419 - INFO [kytos.napps.kytos/of_core] (MainThread) PortStatus modified interface 00:00:00:00:00:00:00:12:1 state OFPPS_LINK_DOWN
2023-05-12 20:08:41,453 - INFO [kytos.napps.kytos/of_core] (MainThread) PortStatus modified interface 00:00:00:00:00:00:00:12:1 state OFPPS_LIVE
2023-05-12 20:08:41,491 - INFO [kytos.napps.kytos/of_core] (MainThread) PortStatus modified interface 00:00:00:00:00:00:00:11:1 state 0
2023-05-12 20:08:41,529 - INFO [kytos.napps.kytos/of_core] (MainThread) PortStatus modified interface 00:00:00:00:00:00:00:11:1 state OFPPS_LINK_DOWN
2023-05-12 20:08:41,537 - INFO [kytos.napps.kytos/of_core] (MainThread) PortStatus modified interface 00:00:00:00:00:00:00:11:1 state OFPPS_LIVE
2023-05-12 20:08:41,583 - INFO [kytos.napps.kytos/mef_eline] (thread_pool_app_0) Event handle_link_down Link(Interface('Ampath1-eth1', 1, Switch('00:00:00:00:00:00:00:11')), Interface('Ampath2-eth1', 1, Switch('00:00:00:00:00:00:00:12')))
2023-05-12 20:08:41,590 - INFO [kytos.napps.kytos/mef_eline] (thread_pool_app_41) Event handle_link_down Link(Interface('Ampath1-eth1', 1, Switch('00:00:00:00:00:00:00:11')), Interface('Ampath2-eth1', 1, Switch('00:00:00:00:00:00:00:12')))
2023-05-12 20:08:41,913 - INFO [kytos.napps.kytos/mef_eline] (thread_pool_app_15) Event handle_link_down Link(Interface('Ampath1-eth1', 1, Switch('00:00:00:00:00:00:00:11')), Interface('Ampath2-eth1', 1, Switch('00:00:00:00:00:00:00:12')))
italovalcy commented 1 year ago

The following patch seems to improve a bit the situation:

root@1534638afadd:/src/kytos-of-core# git diff main.py
diff --git a/main.py b/main.py
index b8b8dff..07c2e13 100644
--- a/main.py
+++ b/main.py
@@ -2,6 +2,7 @@

 import asyncio
 import time
+import threading
 from collections import defaultdict

 from napps.kytos.of_core import settings
@@ -45,6 +46,7 @@ class Main(KytosNApp):
         self.of_core_version_utils = {0x04: of_core_v0x04_utils}
         self.execute_as_loop(settings.STATS_INTERVAL)
         self._connection_lock = defaultdict(asyncio.Lock)
+        self._port_status_lock = defaultdict(threading.Lock)

         # Per switch delay to request flow/port stats, to avoid all request
         # being sent together and increase the overhead on the controller
@@ -604,55 +606,58 @@ class Main(KytosNApp):
         port_no = port.port_no.value
         event_name = 'kytos/of_core.switch.interface.'

-        if reason == 'OFPPR_ADD':
-            status = 'created'
-            interface = Interface(name=port.name.value,
-                                  address=port.hw_addr.value,
-                                  port_number=port_no,
-                                  switch=source.switch,
-                                  state=port.state.value,
-                                  features=port.curr)
-            source.switch.update_interface(interface)
-            try_to_activate_interface(interface, port)
-
-        elif reason == 'OFPPR_MODIFY':
-            status = 'modified'
-            interface = source.switch.get_interface_by_port_no(port_no)
-            current_status = None
-            if interface:
-                current_status = interface.state
-                interface.state = port.state.value
-                interface.name = port.name.value
-                interface.address = port.hw_addr.value
-                interface.features = port.curr
-            else:
+        intfid = f"{source.switch.id}:{port_no}"
+
+        with self._port_status_lock[intfid]:
+            if reason == 'OFPPR_ADD':
+                status = 'created'
                 interface = Interface(name=port.name.value,
                                       address=port.hw_addr.value,
                                       port_number=port_no,
                                       switch=source.switch,
                                       state=port.state.value,
                                       features=port.curr)
-            source.switch.update_interface(interface)
-            try_to_activate_interface(interface, port)
-            self._send_specific_port_mod(port, interface, current_status)
-
-        elif reason == 'OFPPR_DELETE':
-            status = 'deleted'
-            interface = source.switch.get_interface_by_port_no(port_no)
-            interface.deactivate()
-
-        event_name += status
-        content = {'interface': interface}
-
-        event = KytosEvent(name=event_name, content=content)
-        self.controller.buffers.app.put(event)
+                source.switch.update_interface(interface)
+                try_to_activate_interface(interface, port)
+
+            elif reason == 'OFPPR_MODIFY':
+                status = 'modified'
+                interface = source.switch.get_interface_by_port_no(port_no)
+                current_status = None
+                if interface:
+                    current_status = interface.state
+                    interface.state = port.state.value
+                    interface.name = port.name.value
+                    interface.address = port.hw_addr.value
+                    interface.features = port.curr
+                else:
+                    interface = Interface(name=port.name.value,
+                                          address=port.hw_addr.value,
+                                          port_number=port_no,
+                                          switch=source.switch,
+                                          state=port.state.value,
+                                          features=port.curr)
+                source.switch.update_interface(interface)
+                try_to_activate_interface(interface, port)
+                self._send_specific_port_mod(port, interface, current_status)
+
+            elif reason == 'OFPPR_DELETE':
+                status = 'deleted'
+                interface = source.switch.get_interface_by_port_no(port_no)
+                interface.deactivate()
+
+            event_name += status
+            content = {'interface': interface}
+
+            event = KytosEvent(name=event_name, content=content)
+            self.controller.buffers.app.put(event)

-        # pylint: disable=protected-access
-        state_desc = {v: k for k, v in PortState._enum.items()}
-        # pylint: enable=protected-access
-        state = state_desc.get(port.state.value, port.state.value)
-        msg = 'PortStatus %s interface %s:%s state %s'
-        log.info(msg, status, source.switch.id, port_no, state)
+            # pylint: disable=protected-access
+            state_desc = {v: k for k, v in PortState._enum.items()}
+            # pylint: enable=protected-access
+            state = state_desc.get(port.state.value, port.state.value)
+            msg = 'PortStatus %s interface %s:%s state %s'
+            log.info(msg, status, source.switch.id, port_no, state)

 def _get_version_from_bitmask(message_versions):

However, we still have some corner cases where the state is inconsistent on Kytos (no link_up event and the stored status is different from the switch status):