Closed viniarck closed 1 year ago
Using the same (slowed) example from issue #33 it no longer crashes and quickly recovers:
'apply_actions', 'actions': [{'port': 4294967293, 'action_type': 'output'}]}]}]}
2022-11-03 17:24:15,242 - INFO [werkzeug] [_internal.py:225:_log] (Thread-97) 127.0.0.1 - - [03/Nov/2022 17:24:15] "POST /api/kytos/topology/v3/links/4d42dc0852278accac7d9df15418f6d921d
b160b13d674029a87cef1b5f67f30/metadata HTTP/1.1" 201 -
2022-11-03 17:24:15,243 - WARNING [kytos.napps.kytos/pathfinder] [main.py:254:update_links_metadata_changed] (thread_pool_app_3) Unexpected KeyError '00:00:00:00:00:00:00:03:2' on event
kytos/topology.links.metadata.added. pathfinder will reconciliate the topology
2022-11-03 17:24:15,246 - INFO [kytos.napps.kytos/pathfinder] [main.py:243:update_topology] (thread_pool_app_0) Topology graph updated with switches: ['00:00:00:00:00:00:00:01', '00:00:
00:00:00:00:00:03', '00:00:00:00:00:00:00:02'], links: ['c8b55359990f89a5849813dc348d30e9e1f991bad1dcb7f82112bd35429d9b07', '4d42dc0852278accac7d9df15418f6d921db160b13d674029a87cef1b5f6
7f30', '78282c4d5b579265f04ebadc4405ca1b49628eb1d684bb45e5d0607fa8b713d0'].
kytos $>
Also, 100 iterations of the test case test_001_create_delete_create_with_constraints
have passed https://github.com/amlight/kytos-end-to-end-tests/issues/179, and you can see that the reconciliation did its job in two iterations, in the test itself I think we can increase the sleep a little bit, good thing the test caught this issue, in production it shouldn't happen much since it's a very specific combination (new links being created and metadata almost immediately concurrently being added)
+ for i in {1..100}
+ echo 'iter 100'
iter 100
+ python3 -m pytest tests/ --timeout=300 -k test_001_create_delete_create_with_constraints
============================= test session starts ==============================
platform linux -- Python 3.9.2, pytest-6.2.5, py-1.11.0, pluggy-1.0.0
rootdir: /tests
plugins: timeout-2.0.2
timeout: 300.0s
timeout method: signal
timeout func_only: False
collected 208 items / 207 deselected / 1 selected
tests/test_e2e_15_mef_eline.py . [100%]
-- Docs: https://docs.pytest.org/en/stable/warnings.html
------------------------------- start/stop times -------------------------------
========== 1 passed, 207 deselected, 35 warnings in 79.13s (0:01:19) ===========
+ tail -f
root@6e4aca64a884:/# grep -i "pathfinder" /var/log/messages | grep -i WARN
Nov 3 12:38:42 6e4aca64a884 kytos.napps.kytos/pathfinder:WARNING main:254: Unexpected KeyError '00:00:00:00:00:00:00:02:2' on event kytos/topology.links.metadata.added. pathfinder wi
ll reconciliate the topology
Nov 3 13:36:45 6e4aca64a884 kytos.napps.kytos/pathfinder:WARNING main:254: Unexpected KeyError '00:00:00:00:00:00:00:02:2' on event kytos/topology.links.metadata.added. pathfinder wi
ll reconciliate the topology
I'll go ahead and prioritize merging this since we have e2e tests sometimes failing with "No available path was found."
Fixes #33
This is for helping out to fix this current issue that was caught on e2e tests, it'll provide a way to try to reconciliate the topology when updating a link metadata, although this doesn't fix the root cause it'll provide a recovery reconciliation mechanism:
https://github.com/amlight/kytos-end-to-end-tests/issues/179 https://github.com/kytos-ng/topology/pull/112 (it also depends on this
topology
PR)Changelog
kytos/topology.current
on_links_metadata_changed
to also try to reconciliate the topologyLocal tests
I've simulated a
KeyError
analogous to the one that happened on the e2e tests, when it happens it reconciliated the topology in a few milliseconds assuming not much messaging overload going on:I'll keep also running e2e more executions of this test case
test_001_create_delete_create_with_constraints
, let's see if would pass 200+ iterations, if it does then it would be beneficial to ship this.