kytos-ng / telemetry_int

Kytos Telemetry Napp
MIT License

feat: subscribe and handle `kytos/mef_eline.(failover_link_down|failover_old_path|failover_deployed)` #104

Closed viniarck closed 3 months ago

viniarck commented 3 months ago

Closes #90 Closes #33 Closes #38 Closes #105

Functionality-wise it's been implemented. I'll keep it in draft while I finish the unit tests, and I also need to re-run the stress test when the INT lab is available again this week.

Summary

See updated changelog file
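
For orientation, the three event names in the PR title can be read as one regular expression. The sketch below only illustrates which names that pattern covers; `failover_kind` is a hypothetical helper, not code from this NApp, and it assumes event names are compared as full-string regexes.

```python
import re
from typing import Optional

# Pattern taken from the PR title; the dot is escaped so it matches literally.
FAILOVER_EVENTS = re.compile(
    r"kytos/mef_eline\.(failover_link_down|failover_old_path|failover_deployed)"
)


def failover_kind(event_name: str) -> Optional[str]:
    """Return the failover suffix of an event name, or None if unrelated.

    Hypothetical helper for illustration only.
    """
    match = FAILOVER_EVENTS.fullmatch(event_name)
    return match.group(1) if match else None


for name in (
    "kytos/mef_eline.failover_link_down",
    "kytos/mef_eline.failover_old_path",
    "kytos/mef_eline.failover_deployed",
    "kytos/mef_eline.deployed",
):
    print(name, "->", failover_kind(name))
```

Only the first three names match; any other `kytos/mef_eline.*` event falls through as `None`.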

Local Tests

[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-60.00  sec  66.0 GBytes  9.44 Gbits/sec  1085             sender
[  4]   0.00-60.00  sec  65.9 GBytes  9.44 Gbits/sec                  receiver
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-60.00  sec  66.0 GBytes  9.45 Gbits/sec  2914             sender
[  4]   0.00-60.00  sec  66.0 GBytes  9.44 Gbits/sec                  receiver

Prior metrics that I shared on Slack

Cool to see telemetry_int in action handling ingress fast failover_path convergence on the INT lab for the first time. I've explored two cases with `iperf3 -c 10.22.22.3 -i 1 -t 20 -b 10G`:

With one EVC:

Handling ingress failover_path convergence adds up to roughly 25 ms of latency. That covers sending the events internally and pushing the extra INT flows, which go out on the same socket (asyncio TCP transport) as the concurrent, lower-priority mef_eline flows.

With 101 INT EVCs:

Bottom line so far: the extra INT flows didn't add much latency to the total convergence. From the data plane traffic point of view it's, on average, roughly on par with mef_eline, and the switch has been processing all the flows relatively quickly too.

Data plane traffic hiccup during the failover for both mef_eline and telemetry_int:

[  4]   4.00-5.00   sec  1.11 GBytes  9.56 Gbits/sec    0   2.82 MBytes       
[  4]   5.00-6.00   sec   411 MBytes  3.45 Gbits/sec  2044   1.41 MBytes       
[  4]   6.00-7.00   sec  1.11 GBytes  9.56 Gbits/sec    0   1.41 MBytes       

[  4]   5.00-6.00   sec  1.11 GBytes  9.56 Gbits/sec    0   2.51 MBytes       
[  4]   6.00-7.00   sec  1.01 GBytes  8.71 Gbits/sec    0   2.51 MBytes       
[  4]   7.00-8.00   sec   509 MBytes  4.27 Gbits/sec  1816   1.26 MBytes       
[  4]   8.00-9.00   sec  1.11 GBytes  9.56 Gbits/sec    0   1.26 MBytes       
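
To pick the hiccup out of longer runs without eyeballing it, the iperf3 interval lines above can be filtered by bandwidth. This is a minimal sketch, not part of the PR: `find_dips` and the 9.0 Gbits/sec threshold are assumptions chosen for this 10G test, and the regex only handles lines reported in Gbits/sec.

```python
import re

# Matches an iperf3 per-interval sender line such as:
# [  4]   7.00-8.00   sec   509 MBytes  4.27 Gbits/sec  1816   1.26 MBytes
INTERVAL = re.compile(
    r"\[\s*\d+\]\s+(\d+\.\d+)-(\d+\.\d+)\s+sec\s+"
    r"[\d.]+\s+\w+\s+([\d.]+)\s+Gbits/sec\s+(\d+)"
)


def find_dips(lines, threshold_gbps=9.0):
    """Return (start, end, gbps, retransmits) for intervals below the threshold.

    The threshold is an assumed cut-off, not a value from the PR.
    """
    dips = []
    for line in lines:
        m = INTERVAL.search(line)
        if m is None:
            continue  # skip headers and lines in other units
        start, end = float(m.group(1)), float(m.group(2))
        gbps, retr = float(m.group(3)), int(m.group(4))
        if gbps < threshold_gbps:
            dips.append((start, end, gbps, retr))
    return dips


# Second capture from the output above.
SAMPLE = """\
[  4]   5.00-6.00   sec  1.11 GBytes  9.56 Gbits/sec    0   2.51 MBytes
[  4]   6.00-7.00   sec  1.01 GBytes  8.71 Gbits/sec    0   2.51 MBytes
[  4]   7.00-8.00   sec   509 MBytes  4.27 Gbits/sec  1816   1.26 MBytes
[  4]   8.00-9.00   sec  1.11 GBytes  9.56 Gbits/sec    0   1.26 MBytes
""".splitlines()

for dip in find_dips(SAMPLE):
    print(dip)
# (6.0, 7.0, 8.71, 0)
# (7.0, 8.0, 4.27, 1816)
```

On this capture the dip spans two reporting intervals, with all retransmissions landing in the second one.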

Tox is passing locally but failing on Scrutinizer CI (I believe it's a temporary upstream issue, let's see):

---------- coverage: platform linux, python 3.11.9-final-0 -----------
Name                                        Stmts   Miss  Cover
---------------------------------------------------------------
__init__.py                                     0      0   100%
exceptions.py                                  31      2    94%
kytos_api_helper.py                            76     15    80%
main.py                                       272     88    68%
managers/__init__.py                            0      0   100%
managers/flow_builder.py                      159      2    99%
managers/int.py                               353     61    83%
proxy_port.py                                  24      3    88%
settings.py                                    12      0   100%
tests/conftest.py                              18      0   100%
tests/unit/test_flow_builder_failover.py      152      0   100%
tests/unit/test_flow_builder_inter_evc.py      60      0   100%
tests/unit/test_flow_builder_intra_evc.py     152      0   100%
tests/unit/test_int_manager.py                366      0   100%
tests/unit/test_kytos_api_helper.py            63      0   100%
tests/unit/test_main.py                       253      0   100%
tests/unit/test_utils.py                       79      0   100%
utils.py                                       67      0   100%
---------------------------------------------------------------
TOTAL                                        2137    171    92%

============================================================================ 84 passed, 114 warnings in 7.34s ============================================================================
lint: recreate env because env type changed from {'name': 'coverage', 'type': 'VirtualEnvRunner'} to {'name': 'lint', 'type': 'VirtualEnvRunner'}
lint: remove tox env folder /home/viniarck/repos/telemetry_int/.tox/py311
coverage: OK ✔ in 46.29 seconds
lint: install_deps> python -I -m pip install -r requirements/dev.in
lint: commands[0]> python3 setup.py lint
running lint
Yala is running. It may take several seconds...
INFO: Finished isort
INFO: Finished black
INFO: Finished pycodestyle
INFO: Finished pylint
:) No issues found.
[isort] Skipped 3 files
  coverage: OK (46.29=setup[38.50]+cmd[7.79] seconds)
  lint: OK (43.95=setup[38.12]+cmd[5.83] seconds)
  congratulations :) (90.27 seconds)
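
As a quick sanity check on the coverage table above, the 92% total can be reproduced from the per-module misses. The numbers are copied from the table; the script itself is just an illustrative sketch and not part of the repository.

```python
# (statements, missed) for the modules that have uncovered lines,
# copied from the coverage report above.
MISSED = {
    "exceptions.py": (31, 2),
    "kytos_api_helper.py": (76, 15),
    "main.py": (272, 88),
    "managers/flow_builder.py": (159, 2),
    "managers/int.py": (353, 61),
    "proxy_port.py": (24, 3),
}
TOTAL_STMTS = 2137  # TOTAL row of the report

total_missed = sum(miss for _, miss in MISSED.values())
coverage_pct = 100 * (TOTAL_STMTS - total_missed) / TOTAL_STMTS

print(f"missed: {total_missed}, coverage: {coverage_pct:.0f}%")

# Rank modules by how many of the uncovered statements they account for.
for name, (_, miss) in sorted(MISSED.items(), key=lambda kv: -kv[1][1]):
    print(f"{name}: {100 * miss / total_missed:.0f}% of misses")
```

`main.py` and `managers/int.py` together account for most of the uncovered statements, so they are the natural place for additional unit tests.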

End-to-End Tests

N/A yet