kytos-ng / of_core

Kytos Main OpenFlow Network Application (NApp)
MIT License
0 stars 5 forks source link

Overlapping stats request when switch reconnects #123

Closed italovalcy closed 11 months ago

italovalcy commented 11 months ago

Hi,

When a switch disconnects in the middle of a FlowStats request, we will see Overalapping stats request messages being logged for a few cycles before it discards the old request and send a new one.

This can lead to a delayed Flow Consistency routine, as well as Kytos Stats delayed reports, among others. Another impact we are actually seeing is the End to End tests eventually failing in the following test case:

rerun: 0
tests/test_e2e_60_of_multi_table.py::TestE2EOfMultiTable::test_010_delete_miss_flow: 2023-09-18,07:08:29.768108 - 2023-09-18,07:08:54.694563
self = <tests.test_e2e_60_of_multi_table.TestE2EOfMultiTable object at 0x7f2759f5ad30>

   def test_010_delete_miss_flow(self):
       """Delete a miss flow so is recreated"""
       pipeline = {
           "multi_table": [
               {
                   "table_id": 0,
                   "description": "First table for miss flow entry",
                   "table_miss_flow": {
                       "priority": 0,
                       "instructions": [{
                           "instruction_type": "goto_table",
                           "table_id": 1
                       }],
                       "match": {"dl_vlan": 101}
                   },
               },
               {
                   "table_id": 1,
                   "description": "Second table for coloring",
                   "napps_table_groups": {
                       "coloring": ["base"],
                       "of_lldp": ["base"],
                       "mef_eline": ["evpl", "epl"],
                   },
               },
           ]
       }

       # Add pipeline
       api_url = f"{KYTOS_API}{OF_MULTI_TABLE_API}"
       response = requests.post(api_url, json=pipeline)
       data = response.json()
       assert response.status_code == 201, response.text
       assert 'id' in data

       # Enabled pipeline
       api_url = f"{KYTOS_API}{OF_MULTI_TABLE_API}/{data['id']}/enable"
       response = requests.post(api_url)
       assert response.status_code == 200, response.text
       time.sleep(10)

       s1 = self.net.net.get('s1')
       flows_s1 = s1.dpctl('dump-flows')
       assert 'actions=resubmit(,1)' in flows_s1

       # Delete all flows from switch 1
       s1.dpctl('del-flows')
       self.net.reconnect_switches()

       time.sleep(10)

       flows_s1 = s1.dpctl('dump-flows')
     assert 'actions=resubmit(,1)' in flows_s1
E       AssertionError: assert 'actions=resubmit(,1)' in ' cookie=0xab00000000000001, duration=13.738s, table=1, n_packets=0, n_bytes=0, priority=50000,dl_vlan=3799,dl_type=0x88cc actions=CONTROLLER:65535\r\n'

tests/test_e2e_60_of_multi_table.py:277: AssertionError

Looking into the Kytos logs we can see why this is happening:

Sep 18 07:08:40 runner-rmtk68xb-project-107-concurrent-0 2023-09-18 07:08:40,093 (MainThread) kytos.core.atcp_server:INFO atcp_server:182:  Connection lost with client 127.0.0.1:51402. Reason: Request closed by client
Sep 18 07:08:40 runner-rmtk68xb-project-107-concurrent-0 2023-09-18 07:08:40,151 (MainThread) kytos.core.atcp_server:INFO atcp_server:182:  Connection lost with client 127.0.0.1:51406. Reason: Request closed by client
Sep 18 07:08:40 runner-rmtk68xb-project-107-concurrent-0 2023-09-18 07:08:40,203 (MainThread) kytos.core.atcp_server:INFO atcp_server:182:  Connection lost with client 127.0.0.1:51404. Reason: Request closed by client
Sep 18 07:08:40 runner-rmtk68xb-project-107-concurrent-0 2023-09-18 07:08:40,253 (MainThread) kytos.core.atcp_server:INFO atcp_server:131:  New connection from 127.0.0.1:51436
Sep 18 07:08:40 runner-rmtk68xb-project-107-concurrent-0 2023-09-18 07:08:40,318 (MainThread) kytos.core.atcp_server:INFO atcp_server:131:  New connection from 127.0.0.1:51438
Sep 18 07:08:40 runner-rmtk68xb-project-107-concurrent-0 2023-09-18 07:08:40,366 (thread_pool_sb_0) kytos.napps.kytos/of_core:INFO main:158:  Connection ('127.0.0.1', 51436), Switch 00:00:00:00:00:00:00:01: OPENFLOW HANDSHAKE COMPLETE
Sep 18 07:08:40 runner-rmtk68xb-project-107-concurrent-0 2023-09-18 07:08:40,537 (thread_pool_sb_4) kytos.napps.kytos/of_core:INFO main:158:  Connection ('127.0.0.1', 51438), Switch 00:00:00:00:00:00:00:02: OPENFLOW HANDSHAKE COMPLETE
Sep 18 07:08:40 runner-rmtk68xb-project-107-concurrent-0 2023-09-18 07:08:40,649 (MainThread) kytos.core.atcp_server:INFO atcp_server:131:  New connection from 127.0.0.1:51440
Sep 18 07:08:40 runner-rmtk68xb-project-107-concurrent-0 2023-09-18 07:08:40,773 (thread_pool_sb_2) kytos.napps.kytos/of_core:INFO main:82:  Overlapping stats request: switch 00:00:00:00:00:00:00:01 flows_xid 3246955587 ports_xid 2515420549
Sep 18 07:08:40 runner-rmtk68xb-project-107-concurrent-0 2023-09-18 07:08:40,869 (AnyIO worker thread) kytos.napps.kytos/flow_manager:INFO utils:196:  Send FlowMod from request dpid: 00:00:00:00:00:00:00:01, command: add, force: False,  flows[0, 1]: [{'actions': [{'action_type': 'output', 'port': 4294967293}], 'table_id': 1, 'owner': 'of_lldp', 'table_group': 'base', 'priority': 50000, 'cookie': 12321848580485677057, 'match': {'dl_type': 35020, 'dl_vlan': 3799}}]
Sep 18 07:08:40 runner-rmtk68xb-project-107-concurrent-0 2023-09-18 07:08:40,931 (MainThread) uvicorn.access:INFO httptools_impl:506:  127.0.0.1:46862 - "POST /api/kytos/flow_manager/v2/flows/00%3A00%3A00%3A00%3A00%3A00%3A00%3A01 HTTP/1.1" 202
Sep 18 07:08:40 runner-rmtk68xb-project-107-concurrent-0 2023-09-18 07:08:40,922 (AnyIO worker thread) kytos.napps.kytos/flow_manager:INFO utils:196:  Send FlowMod from request dpid: 00:00:00:00:00:00:00:02, command: add, force: False,  flows[0, 1]: [{'actions': [{'action_type': 'output', 'port': 4294967293}], 'table_id': 1, 'owner': 'of_lldp', 'table_group': 'base', 'priority': 50000, 'cookie': 12321848580485677058, 'match': {'dl_type': 35020, 'dl_vlan': 3799}}]
Sep 18 07:08:40 runner-rmtk68xb-project-107-concurrent-0 2023-09-18 07:08:40,981 (MainThread) uvicorn.access:INFO httptools_impl:506:  127.0.0.1:46864 - "POST /api/kytos/flow_manager/v2/flows/00%3A00%3A00%3A00%3A00%3A00%3A00%3A02 HTTP/1.1" 202
Sep 18 07:08:41 runner-rmtk68xb-project-107-concurrent-0 2023-09-18 07:08:41,049 (thread_pool_sb_2) kytos.napps.kytos/of_core:INFO main:158:  Connection ('127.0.0.1', 51440), Switch 00:00:00:00:00:00:00:03: OPENFLOW HANDSHAKE COMPLETE
Sep 18 07:08:41 runner-rmtk68xb-project-107-concurrent-0 2023-09-18 07:08:41,177 (AnyIO worker thread) kytos.napps.kytos/flow_manager:INFO utils:196:  Send FlowMod from request dpid: 00:00:00:00:00:00:00:03, command: add, force: False,  flows[0, 1]: [{'actions': [{'action_type': 'output', 'port': 4294967293}], 'table_id': 1, 'owner': 'of_lldp', 'table_group': 'base', 'priority': 50000, 'cookie': 12321848580485677059, 'match': {'dl_type': 35020, 'dl_vlan': 3799}}]
Sep 18 07:08:41 runner-rmtk68xb-project-107-concurrent-0 2023-09-18 07:08:41,195 (MainThread) uvicorn.access:INFO httptools_impl:506:  127.0.0.1:46866 - "POST /api/kytos/flow_manager/v2/flows/00%3A00%3A00%3A00%3A00%3A00%3A00%3A03 HTTP/1.1" 202
Sep 18 07:08:41 runner-rmtk68xb-project-107-concurrent-0 2023-09-18 07:08:41,794 (thread_pool_sb_6) kytos.napps.kytos/of_core:INFO main:82:  Overlapping stats request: switch 00:00:00:00:00:00:00:03 flows_xid 2871009987 ports_xid 18022052
Sep 18 07:08:47 runner-rmtk68xb-project-107-concurrent-0 2023-09-18 07:08:47,294 (Thread-95) kytos.napps.kytos/of_core:INFO main:82:  Overlapping stats request: switch 00:00:00:00:00:00:00:01 flows_xid 3246955587 ports_xid 2515420549
Sep 18 07:08:47 runner-rmtk68xb-project-107-concurrent-0 2023-09-18 07:08:47,651 (Thread-96) kytos.napps.kytos/of_core:INFO main:82:  Overlapping stats request: switch 00:00:00:00:00:00:00:03 flows_xid 2871009987 ports_xid 18022052
Sep 18 07:08:54 runner-rmtk68xb-project-107-concurrent-0 2023-09-18 07:08:54,314 (Thread-98) kytos.napps.kytos/of_core:INFO main:82:  Overlapping stats request: switch 00:00:00:00:00:00:00:01 flows_xid 3246955587 ports_xid 2515420549
Sep 18 07:08:54 runner-rmtk68xb-project-107-concurrent-0 2023-09-18 07:08:54,671 (Thread-99) kytos.napps.kytos/of_core:INFO main:82:  Overlapping stats request: switch 00:00:00:00:00:00:00:03 flows_xid 2871009987 ports_xid 18022052
Sep 18 07:08:54 runner-rmtk68xb-project-107-concurrent-0 2023-09-18 07:08:54,859 (MainThread) uvicorn.error:INFO on:67:  Waiting for application shutdown.
Sep 18 07:08:54 runner-rmtk68xb-project-107-concurrent-0 2023-09-18 07:08:54,860 (MainThread) uvicorn.error:INFO on:78:  Application shutdown complete.

In summary: after the switch disconnects (while a flow stats requests was pending from reply), the of_core keeps waiting for the reply of the old request, until a point where it eventually clear the existing request and send a new one. Switches seems to discard flow stats requests that were sent before the switch reconnects to controller, so I believe it is safe to clear flow stats request after a switch reconnects.

Ccing @viniarck to also hear his opinion on that.

viniarck commented 11 months ago

@italovalcy, thanks for reporting this and your analysis.

We have a clean up when a KytosEvent ".*.connection.lost" is received, and we're safely keeping a reference to the switch. However, what probably happened was the task was scheduled in the event loop but hasn't been executed yet, and then the new handshake happened and ended up in this case. Yes, indeed, it'd be safer to actually do a clean up right before setting the connection as established, up to that point the switch isn't "is_connected()" yet (and handshake event hasn't been published yet), so it should be safe before any flow stats request is sent.

I'll push a fix for this soon. Thanks.