Functionality-wise it's been implemented; I'll keep it in draft while I finish the unit tests, and I also need to re-stress test when the INT lab is available again this week.
Summary
See updated changelog file
Local Tests
I explored all failover-related events with dynamic EVCs and measured convergence with iperf3 while a link of the current_path of one EVC went down; the results have been on par with the prior metrics of how mef_eline performs (a few thousand TCP packet retransmissions at 9.5 Gbits/sec):
I also tried to explore with hundreds of EVCs, including static ones, but ended up hitting issue #105; since that also happens with static EVCs, it isn't related to this change. I'll fix it in a subsequent PR.
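For context on how the retransmission numbers quoted below can be collected and averaged, here's a minimal sketch (not the exact lab tooling): the flags mirror the iperf3 command in the quote, and parsing the -J JSON output ("end" -> "sum_sent" -> "retransmits") plus the simple averaging are my assumptions about how the per-run numbers were aggregated.

```python
# Minimal sketch: run a few iperf3 iterations against the EVC endpoint
# and average the sender-side TCP retransmissions of each run.
import json
import statistics
import subprocess


def run_iperf3(server: str, runs: int = 3) -> list[int]:
    """Run iperf3 a few times and collect the TCP retransmissions of each run."""
    results = []
    for _ in range(runs):
        proc = subprocess.run(
            ["iperf3", "-c", server, "-i", "1", "-t", "20", "-b", "10G", "-J"],
            capture_output=True, text=True, check=True,
        )
        report = json.loads(proc.stdout)
        # sender-side TCP retransmissions for the whole run
        results.append(report["end"]["sum_sent"]["retransmits"])
    return results


if __name__ == "__main__":
    retransmissions = run_iperf3("10.22.22.3")
    print(retransmissions, "avg:", round(statistics.mean(retransmissions), 2))
```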
Prior metrics that I shared on Slack
Cool to see telemetry_int in action handling ingress fast failover_path convergence on the INT lab for the first time. I've explored two cases with iperf3 -c 10.22.22.3 -i 1 -t 20 -b 10G:

With one EVC:
- mef_eline TCP packet retransmissions: 1824, 2044, 4017; avg 2628.33
- telemetry_int TCP packet retransmissions: 1423, 1816, 4192; avg 2477.0

When handling ingress failover_path convergence, it adds roughly up to 25 ms of latency for sending the events internally and pushing the extra INT flows, which are sent on the socket with the asyncio TCP transport along with the other concurrent lower-priority mef_eline flows.

With 101 INT EVCs:
- telemetry_int TCP packet retransmissions: 3243

Bottom line so far: the extra new INT flows didn't add much latency to the total convergence; on average it's relatively on par with mef_eline from the data plane network traffic point of view, and the switch has been processing them all relatively quickly too.

Data plane traffic hiccup during the failover for both mef_eline and telemetry_int:
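To make the ~25 ms latency point in the quote above more concrete: on failover, the extra INT flows are pushed right after the event is handled, and they share the asyncio TCP transport with the concurrent lower-priority mef_eline flows. The sketch below is illustrative only, with made-up names (it is not telemetry_int's actual handler, which goes through the controller's event bus and flow_manager); it just shows the "handle failover event, build the extra INT flows, write them on the transport" shape and one way to measure the added handling latency.

```python
# Illustrative sketch only: build_int_flows() and FLOW_MOD_ENDPOINT are made up.
# The real code publishes internal events and relies on flow_manager; this
# collapses it into a direct asyncio TCP write just to show the shape and timing.
import asyncio
import json
import time

FLOW_MOD_ENDPOINT = ("127.0.0.1", 9000)  # placeholder address for the sketch


def build_int_flows(evc_id: str, failover_path: list[str]) -> list[dict]:
    """Build placeholder INT flow mods for each link of the new path."""
    return [
        {"evc_id": evc_id, "link": link, "table_id": 2, "priority": 20000}
        for link in failover_path
    ]


async def on_failover_event(evc_id: str, failover_path: list[str]) -> float:
    """Push the extra INT flows and return the handling latency in seconds."""
    start = time.monotonic()
    _reader, writer = await asyncio.open_connection(*FLOW_MOD_ENDPOINT)
    for flow in build_int_flows(evc_id, failover_path):
        writer.write(json.dumps(flow).encode() + b"\n")
    await writer.drain()  # flushed on the same TCP transport as the other flows
    writer.close()
    await writer.wait_closed()
    return time.monotonic() - start


# Usage (needs something listening on FLOW_MOD_ENDPOINT):
# latency = asyncio.run(on_failover_event("evc-1", ["s1-s2", "s2-s3"]))
# print(f"added handling latency: {latency * 1000:.1f} ms")
```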
Closes #90 Closes #33 Closes #38 Closes #105
Tox is passing locally but failing on Scrutinizer CI (I believe it's a temporary upstream issue, let's see):
End-to-End Tests
N/A yet