kytos-ng / of_core

Kytos Main OpenFlow Network Application (NApp)
MIT License

Kytos becomes unresponsive depending on the DPID of a switch with links #102

Closed italovalcy closed 1 year ago

italovalcy commented 1 year ago

Hi,

I'm using Kytos through the amlight/kytos:latest docker image, and when a switch with a certain DPID is used, Kytos becomes unresponsive: the switches start reconnecting and all components are affected.

In the container, the following NApps are enabled:

root@29f740abc748:/# kytos napps list
Status |           NApp ID            |                                                                Description
=======+==============================+===========================================================================================================================================
 [ie]  | amlight/coloring:2022.3.0    | NApp to color a network topology
 [ie]  | amlight/flow_stats:2022.3.0  | Store flow information and show statistics about them.
 [ie]  | amlight/sdntrace:2022.3.0    | An OpenFlow Data Path Tracing
 [ie]  | amlight/sdntrace_cp:2022.3.0 | Run tracepaths on OpenFlow in the Control Plane
 [ie]  | kytos/flow_manager:2022.3.0  | Manage switches' flows through a REST API.
 [ie]  | kytos/maintenance:2022.3.0   | This NApp creates maintenance windows, allowing the maintenance of network devices (switch, link, and interface) without receiving alerts.
 [ie]  | kytos/mef_eline:2022.3.0     | NApp to provision circuits from user request
 [ie]  | kytos/of_core:2022.3.0       | OpenFlow Core of Kytos Controller, responsible for main OpenFlow operations.
 [ie]  | kytos/of_lldp:2022.3.0       | Discover network-to-network interfaces (NNIs) using the LLDP protocol.
 [ie]  | kytos/pathfinder:2022.3.0    | Keeps track of topology changes, and calculates the best path between two points.
 [ie]  | kytos/topology:2022.3.0      | Manage the network topology.

Status: (i)nstalled, (e)nabled

How to reproduce:

  1. Using the amlight/kytos docker image, create a topology with the following python code:
    
# cat mytopo.py
#!/usr/bin/python
"""This example creates a multi-controller network from semi-scratch by using
the net.add*() API and manually starting the switches and controllers."""

import os

from mininet.clean import Cleanup
from mininet.net import Mininet
from mininet.node import RemoteController
from mininet.cli import CLI
from mininet.log import setLogLevel

ip = os.environ.get('CONTROLLER_IP', '127.0.0.1')
port = int(os.environ.get('CONTROLLER_PORT', '6653'))


def single_domain():
    """Create AmLight network for tests"""
    net = Mininet(topo=None, build=False)

    # Add switches
    ampath1 = net.addSwitch('s1', listenPort=6601, dpid="0024389406000000")
    sol2 = net.addSwitch('s3', listenPort=6603, dpid="cc4e244b11000000")

    # Add links
    net.addLink(ampath1, sol2, port1=53, port2=52)  # 2/5 to 2/4

    # Add hosts
    h1 = net.addHost('h1', mac='00:00:00:00:00:01')
    h3 = net.addHost('h3', mac='00:00:00:00:00:03')

    # Add links to switches
    net.addLink(h1, ampath1, port1=1, port2=1)
    net.addLink(h3, sol2, port1=1, port2=1)

    net.addController('ctrl', controller=RemoteController, ip=ip, port=port)

    net.build()
    net.start()
    CLI(net)
    net.stop()


if __name__ == '__main__':
    setLogLevel('info')  # for CLI output
    Cleanup.cleanup()
    single_domain()

2. Start kytos and Mininet:

tmux new-sess -d -s mn python3 /mytopo.py
tmux new-sess -d -s k1 "kytosd -f"

3. Observe the system logs (/var/log/syslog) and when you see two switches with HANDSHAKE COMPLETED, enable them:

for sw in $(curl -s http://127.0.0.1:8181/api/kytos/topology/v3/switches | jq -r '.switches[].id'); do
  curl -H 'Content-type: application/json' -X POST http://127.0.0.1:8181/api/kytos/topology/v3/switches/$sw/enable
  curl -H 'Content-type: application/json' -X POST http://127.0.0.1:8181/api/kytos/topology/v3/interfaces/switch/$sw/enable
done
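The same enable calls can also be sketched in Python with the stdlib only. The endpoint paths are taken from the curl commands above; the helper names (enable_endpoints, enable_switch) are hypothetical, not part of any Kytos client:

```python
from urllib.request import Request, urlopen

BASE = "http://127.0.0.1:8181/api/kytos/topology/v3"

def enable_endpoints(dpid):
    """Build the two POST URLs that enable a switch and all of its interfaces."""
    return [
        f"{BASE}/switches/{dpid}/enable",
        f"{BASE}/interfaces/switch/{dpid}/enable",
    ]

def enable_switch(dpid):
    """POST to both enable endpoints for one switch (requires a running kytosd)."""
    for url in enable_endpoints(dpid):
        req = Request(url, method="POST",
                      headers={"Content-type": "application/json"})
        with urlopen(req) as resp:
            print(url, "->", resp.status)

# Inspect the URLs without hitting the API:
print(enable_endpoints("00:24:38:94:06:00:00:00")[0])
```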


Expected behavior: Kytos should enable the switches and discover the links normally.

Actual behavior: Kytos becomes unresponsive and the switches start reconnecting. Here are the observed logs:

Feb 16 20:19:11 29f740abc748 kytos.core.atcp_server:INFO atcp_server:131: New connection from 127.0.0.1:46474
Feb 16 20:19:11 29f740abc748 kytos.core.atcp_server:INFO atcp_server:131: New connection from 127.0.0.1:46476
Feb 16 20:19:11 29f740abc748 kytos.napps.kytos/of_core:INFO main:140: Connection ('127.0.0.1', 46474), Switch 00:24:38:94:06:00:00:00: OPENFLOW HANDSHAKE COMPLETE
Feb 16 20:19:11 29f740abc748 kytos.napps.kytos/of_core:INFO main:140: Connection ('127.0.0.1', 46476), Switch cc:4e:24:4b:11:00:00:00: OPENFLOW HANDSHAKE COMPLETE
Feb 16 20:19:27 29f740abc748 kytos.napps.kytos/flow_manager:INFO main:600: Send FlowMod from request dpid: 00:24:38:94:06:00:00:00, command: add, force: False, flows_dict: {'flows': [{'priority': 1000, 'table_id': 0, 'cookie': 12321910788892655616, 'match': {'dl_type': 35020, 'dl_vlan': 3799}, 'actions': [{'action_type': 'output', 'port': 4294967293}]}]}
Feb 16 20:19:27 29f740abc748 kytos.napps.kytos/flow_manager:INFO main:600: Send FlowMod from request dpid: cc:4e:24:4b:11:00:00:00, command: add, force: False, flows_dict: {'flows': [{'priority': 1000, 'table_id': 0, 'cookie': 12321888485312036864, 'match': {'dl_type': 35020, 'dl_vlan': 3799}, 'actions': [{'action_type': 'output', 'port': 4294967293}]}]}
Feb 16 20:19:27 29f740abc748 kytos.napps.kytos/flow_manager:INFO main:600: Send FlowMod from request dpid: 00:24:38:94:06:00:00:00, command: add, force: True, flows_dict: {'flows': [{'table_id': 0, 'match': {'dl_src': '24:4b:11:ee:ee:ee'}, 'priority': 50000, 'actions': [{'action_type': 'output', 'port': 4294967293}], 'cookie': 12404101482092167168}], 'force': True}
Feb 16 20:19:27 29f740abc748 kytos.napps.kytos/flow_manager:INFO main:600: Send FlowMod from request dpid: cc:4e:24:4b:11:00:00:00, command: add, force: True, flows_dict: {'flows': [{'table_id': 0, 'match': {'dl_src': '38:94:06:ee:ee:ee'}, 'priority': 50000, 'actions': [{'action_type': 'output', 'port': 4294967293}], 'cookie': 27115650311270694912}], 'force': True}

Feb 16 20:19:37 29f740abc748 ovs-vswitchd: ovs|00958|rconn|ERR|s3<->tcp:127.0.0.1:6653: no response to inactivity probe after 5 seconds, disconnecting
Feb 16 20:19:37 29f740abc748 kytos.core.atcp_server:INFO atcp_server:182: Connection lost with client 127.0.0.1:46476. Reason: Request closed by client
Feb 16 20:19:37 29f740abc748 ovs-vswitchd: ovs|00959|rconn|ERR|s1<->tcp:127.0.0.1:6653: no response to inactivity probe after 5 seconds, disconnecting
Feb 16 20:19:37 29f740abc748 kytos.core.atcp_server:INFO atcp_server:182: Connection lost with client 127.0.0.1:46474. Reason: Request closed by client
Feb 16 20:19:38 29f740abc748 kytos.core.atcp_server:INFO atcp_server:131: New connection from 127.0.0.1:46498
Feb 16 20:19:38 29f740abc748 kytos.core.atcp_server:INFO atcp_server:131: New connection from 127.0.0.1:46500
Feb 16 20:19:40 29f740abc748 kytos.core.atcp_server:INFO atcp_server:182: Connection lost with client 127.0.0.1:46498. Reason: Request closed by client
Feb 16 20:19:40 29f740abc748 kytos.core.atcp_server:INFO atcp_server:182: Connection lost with client 127.0.0.1:46500. Reason: Request closed by client
Feb 16 20:19:42 29f740abc748 kytos.core.atcp_server:INFO atcp_server:131: New connection from 127.0.0.1:46502
Feb 16 20:19:42 29f740abc748 kytos.core.atcp_server:INFO atcp_server:131: New connection from 127.0.0.1:46504
Feb 16 20:19:44 29f740abc748 kytos.core.atcp_server:INFO atcp_server:182: Connection lost with client 127.0.0.1:46502. Reason: Request closed by client
Feb 16 20:19:44 29f740abc748 kytos.core.atcp_server:INFO atcp_server:182: Connection lost with client 127.0.0.1:46504. Reason: Request closed by client

viniarck commented 1 year ago

Thanks for catching and reporting this, @italovalcy.

I'll start to analyze this issue and fix it.

viniarck commented 1 year ago

It turns out that coloring could overflow the FlowMod's cookie value. This resulted in a silent unhandled exception in the msg_out_event_handler coroutine, which is why msg_out became unresponsive for that dpid's connection. Although this is just a validation issue, the core queue coroutine handlers are critical and must not crash in such cases.

2023-02-17 09:52:42,976 - INFO [kytos.napps.kytos/flow_manager] [main.py:601:_send_flow_mods_from_request] (Thread-96) Send FlowMod from request dpid: cc:4e:24:4b:11:00:00:00, command: add, force: True, flows_dict: {'flows': [{'table_id': 0, 'match': {'dl_src': '38:94:06:ee:ee:ee'}, 'priority': 50000, 'actions': [{'action_type': 'output', 'port': 4294967293}], 'cookie': 27115650311270694912}], 'force': True}

2023-02-17 09:52:43,000 - ERROR [kytos.core.controller] [controller.py:629:msg_out_event_handler] (MainThread) FlowMod.cookie - Expected UBInt64, found value "27115650311270694912" of type int

kytos $> 27115650311270694912 > 2 ** 64                                                                                                                                                  
Out[1]: True

kytos $> 27115650311270694912 > 2 ** 65                                                                                                                                                  
Out[2]: False

I'll send PRs to coloring, kytos, and also flow_manager to add a minimal FlowMod pack validation until we have better validation in the next version, 2023.1.
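A minimal pack-time validation can be sketched with the stdlib (the helper name validate_cookie is hypothetical and the actual PRs may differ). An OpenFlow cookie is an unsigned 64-bit field, so any value outside [0, 2**64 - 1] must be rejected before serialization:

```python
import struct

UBINT64_MAX = 2**64 - 1  # the FlowMod cookie is an unsigned 64-bit field

def validate_cookie(cookie):
    """Hypothetical guard: pack the cookie as UBInt64 or fail loudly."""
    if not 0 <= cookie <= UBINT64_MAX:
        raise ValueError(f"cookie {cookie} does not fit in UBInt64")
    return struct.pack("!Q", cookie)

# The overflowing cookie from the logs above exceeds 2**64 - 1:
bad = 27115650311270694912
assert bad > UBINT64_MAX

# A cookie inside the range packs to exactly 8 bytes:
assert len(validate_cookie(12404101482092167168)) == 8
```

Raising such an error at request time lets flow_manager reject the bad flow with a clear message instead of crashing the controller's msg_out coroutine later.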

viniarck commented 1 year ago

All related PRs have landed, I'll go ahead and close this issue.