kytos-ng / pathfinder

Kytos main path finder Network Application (NApp)
https://kytos-ng.github.io/api/pathfinder.html
MIT License
0 stars 7 forks source link

[Fix] Subscribed to kytos/topology.current to try to reconciliate the topology #32

Closed viniarck closed 1 year ago

viniarck commented 1 year ago

Fixes #33

This is for helping out to fix this current issue that was caught on e2e tests, it'll provide a way to try to reconciliate the topology when updating a link metadata, although this doesn't fix the root cause it'll provide a recovery reconciliation mechanism:

https://github.com/amlight/kytos-end-to-end-tests/issues/179 https://github.com/kytos-ng/topology/pull/112 (it also depends on this topology PR)

Changelog

Local tests

I've simulated a KeyError analogous to the one that happened on the e2e tests, when it happens it reconciliated the topology in a few milliseconds assuming not much messaging overload going on:

2022-11-02 18:15:08,708 - INFO [werkzeug] [_internal.py:225:_log] (Thread-109) 127.0.0.1 - - [02/Nov/2022 18:15:08] "POST /api/kytos/topology/v3/links/4d42dc0852278accac7d9df15418f6d921
db160b13d674029a87cef1b5f67f30/metadata HTTP/1.1" 201 -
2022-11-02 18:15:08,709 - WARNING [kytos.napps.kytos/pathfinder] [main.py:255:update_links_metadata_changed] (thread_pool_app_6) Unexpected KeyError 'what' on event kytos/topology.links
.metadata.added. pathfinder will reconciliate the topology
2022-11-02 18:15:08,713 - INFO [kytos.napps.kytos/pathfinder] [main.py:243:update_topology] (thread_pool_app_2) Topology graph updated with switches: ['00:00:00:00:00:00:00:01', '00:00:
00:00:00:00:00:02', '00:00:00:00:00:00:00:03'], links: ['4d42dc0852278accac7d9df15418f6d921db160b13d674029a87cef1b5f67f30', '78282c4d5b579265f04ebadc4405ca1b49628eb1d684bb45e5d0607fa8b7
13d0', 'c8b55359990f89a5849813dc348d30e9e1f991bad1dcb7f82112bd35429d9b07'].

I'll keep also running e2e more executions of this test case test_001_create_delete_create_with_constraints, let's see if would pass 200+ iterations, if it does then it would be beneficial to ship this.

viniarck commented 1 year ago

Using the same (slowed) example from issue #33 it no longer crashes and quickly recovers:

'apply_actions', 'actions': [{'port': 4294967293, 'action_type': 'output'}]}]}]}
2022-11-03 17:24:15,242 - INFO [werkzeug] [_internal.py:225:_log] (Thread-97) 127.0.0.1 - - [03/Nov/2022 17:24:15] "POST /api/kytos/topology/v3/links/4d42dc0852278accac7d9df15418f6d921d
b160b13d674029a87cef1b5f67f30/metadata HTTP/1.1" 201 -
2022-11-03 17:24:15,243 - WARNING [kytos.napps.kytos/pathfinder] [main.py:254:update_links_metadata_changed] (thread_pool_app_3) Unexpected KeyError '00:00:00:00:00:00:00:03:2' on event
 kytos/topology.links.metadata.added. pathfinder will reconciliate the topology
2022-11-03 17:24:15,246 - INFO [kytos.napps.kytos/pathfinder] [main.py:243:update_topology] (thread_pool_app_0) Topology graph updated with switches: ['00:00:00:00:00:00:00:01', '00:00:
00:00:00:00:00:03', '00:00:00:00:00:00:00:02'], links: ['c8b55359990f89a5849813dc348d30e9e1f991bad1dcb7f82112bd35429d9b07', '4d42dc0852278accac7d9df15418f6d921db160b13d674029a87cef1b5f6
7f30', '78282c4d5b579265f04ebadc4405ca1b49628eb1d684bb45e5d0607fa8b713d0'].
kytos $>

Also, 100 iterations of the test case test_001_create_delete_create_with_constraints have passed https://github.com/amlight/kytos-end-to-end-tests/issues/179, and you can see that the reconciliation did its job in two iterations, in the test itself I think we can increase the sleep a little bit, good thing the test caught this issue, in production it shouldn't happen much since it's a very specific combination (new links being created and metadata almost immediately concurrently being added)

+ for i in {1..100}
+ echo 'iter 100'
iter 100
+ python3 -m pytest tests/ --timeout=300 -k test_001_create_delete_create_with_constraints
============================= test session starts ==============================
platform linux -- Python 3.9.2, pytest-6.2.5, py-1.11.0, pluggy-1.0.0
rootdir: /tests
plugins: timeout-2.0.2
timeout: 300.0s
timeout method: signal
timeout func_only: False
collected 208 items / 207 deselected / 1 selected

tests/test_e2e_15_mef_eline.py .                                         [100%]

-- Docs: https://docs.pytest.org/en/stable/warnings.html
------------------------------- start/stop times -------------------------------
========== 1 passed, 207 deselected, 35 warnings in 79.13s (0:01:19) ===========
+ tail -f

root@6e4aca64a884:/# grep -i "pathfinder" /var/log/messages | grep -i WARN
Nov  3 12:38:42 6e4aca64a884 kytos.napps.kytos/pathfinder:WARNING main:254:  Unexpected KeyError '00:00:00:00:00:00:00:02:2' on event kytos/topology.links.metadata.added. pathfinder wi
ll reconciliate the topology
Nov  3 13:36:45 6e4aca64a884 kytos.napps.kytos/pathfinder:WARNING main:254:  Unexpected KeyError '00:00:00:00:00:00:00:02:2' on event kytos/topology.links.metadata.added. pathfinder wi
ll reconciliate the topology
viniarck commented 1 year ago

I'll go ahead and prioritize merging this since we have e2e tests sometimes failing with "No available path was found."