kytos-ng / of_core

Kytos Main OpenFlow Network Application (NApp)
MIT License

[Feature] async OF13 handlers and messages prioritization #70

Closed viniarck closed 2 years ago

viniarck commented 2 years ago

Fixes #37, https://github.com/kytos-ng/kytos/issues/224, and https://github.com/kytos-ng/kytos/issues/172

Release notes

This PR implements the changes corresponding to solution a) in https://github.com/kytos-ng/kytos/issues/172:

[diagram: cores_queues_alisten_to]
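
For reference, the prioritization idea can be illustrated with a minimal asyncio sketch (this is not the actual core code; the message types, priority values, and coroutines below are just placeholders):

```python
import asyncio

# Hypothetical priorities: lower value = handled first, so control-plane
# keepalives (echo) get ahead of bulk flow_mod traffic in the same queue.
PRIORITY = {"ofpt_echo_request": 0, "ofpt_packet_in": 1, "ofpt_flow_mod": 2}


async def enqueue(queue: asyncio.PriorityQueue, msg_type: str, payload: bytes):
    """Enqueue an OpenFlow message with its priority."""
    await queue.put((PRIORITY.get(msg_type, 3), msg_type, payload))


async def handler(queue: asyncio.PriorityQueue):
    """Async handler: drains the queue highest-priority first, without
    blocking the event loop between messages."""
    while True:
        _prio, msg_type, payload = await queue.get()
        print(f"handling {msg_type} ({len(payload)} bytes)")
        queue.task_done()


async def main():
    queue: asyncio.PriorityQueue = asyncio.PriorityQueue()
    task = asyncio.create_task(handler(queue))
    # A flow_mod is enqueued first, but the echo_request is handled first.
    await enqueue(queue, "ofpt_flow_mod", b"\x04\x0e\x00\x08")
    await enqueue(queue, "ofpt_echo_request", b"\x04\x02\x00\x08")
    await queue.join()
    task.cancel()


asyncio.run(main())
```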

Results

[screenshot: async_95]

[screenshot: master_95]

[screenshot: 2022-06-21-153211_1275x261_scrot]

```
❯ jq -ncM '{method: "POST", url: "http://localhost:8181/api/kytos/flow_manager/v2/flow_mods/00:00:00:00:00:00:00:01", body: { "force": false, "flows": [ { "priority": 10, "match": { "in_port": 1, "dl_vlan": 100 }, "actions": [ { "action_type": "output", "port": 1 } ] } ] } | @base64, header: {"Content-Type": ["application/json"]}}' | vegeta attack -format=json -rate 250/1s -duration=20s | tee results.bin | vegeta report
Requests      [total, rate, throughput]         5000, 250.05, 241.91
Duration      [total, attack, wait]             20.668s, 19.996s, 672.583ms
Latencies     [min, mean, 50, 90, 95, 99, max]  1.937ms, 325.66ms, 267.111ms, 650.822ms, 832.028ms, 1.462s, 1.679s
Bytes In      [total, mean]                     15000, 3.00
Bytes Out     [total, mean]                     615000, 123.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      202:5000
Error Set:
```

However, notice that high rates of API requests (https://github.com/kytos-ng/kytos/issues/225) can still lead to instability, which can also impact the queues. So, even with the prioritization, things can suffer a bit if users go past the request limit to that point; that part hasn't been fixed here. As soon as we have Flask 2.0 and make more use of async request handlers, it should improve overall (gevent is also an option, but asyncio is more aligned with the platform architecture/goals and better supported upstream, so this will be explored again when the time comes). Adding some rate limiting with Flask should also help to avoid getting to that point.
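
For illustration only, a rough sketch of what per-endpoint rate limiting could look like with Flask-Limiter (not something this PR adds; the route, response body, and limits below are made up):

```python
from flask import Flask, jsonify
from flask_limiter import Limiter
from flask_limiter.util import get_remote_address

app = Flask(__name__)
# Throttle clients per remote address; the limits are arbitrary examples.
limiter = Limiter(key_func=get_remote_address, app=app,
                  default_limits=["1000 per minute"])


@app.route("/api/kytos/flow_manager/v2/flow_mods/<dpid>", methods=["POST"])
@limiter.limit("100 per second")  # reject bursts before they pile up internally
def flow_mods(dpid):
    # Stand-in for the real handler, which accepts the request with HTTP 202.
    return jsonify({"result": "accepted"}), 202
```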

viniarck commented 2 years ago

@viniarck thank you very much for submitting this PR. It looks good to me.

I've executed the tests mentioned in issue kytos-ng/kytos#224 using a lab with 5 Noviflow switches, and I was able to reproduce the same issue observed before. At that point, I set the hello interval on the Noviflow switches to 15s and ran the same stress test as Vinicius, except at 150 req/sec for 60 seconds. Without this PR, the switches disconnected repeatedly during the tests, and they kept disconnecting even after the stress test finished. Once I applied the changes from this PR, only 4 disconnections were observed (probably due to the Flask REST endpoint processing overhead Vinicius mentioned).

I've also executed the end-to-end tests using this branch, and all tests passed without surprises: 166 passed, 19 xfailed, 5 xpassed, 560 warnings in 9988.43s (2:46:28)

Finally, I tested the creation of 800 flows on each of the five switches and let the consistency check routine run for 2 hours without surprises (no false positives, i.e., no invalid inconsistencies were triggered, and no false negatives, i.e., alien and missing flows were correctly detected). I will keep the test running for a longer period, and if anything happens, it will be reported as an issue.
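
For reference, a bulk flow-creation test like the one above could be scripted roughly as follows (a sketch, not the exact procedure used; the DPIDs and VLAN values are placeholders, only the flow_manager endpoint format follows the vegeta example earlier in this thread):

```python
import requests

BASE = "http://localhost:8181/api/kytos/flow_manager/v2/flow_mods"
# Placeholder DPIDs for the five switches.
DPIDS = [f"00:00:00:00:00:00:00:0{i}" for i in range(1, 6)]


def push_flows(dpid: str, n_flows: int = 800) -> None:
    """Post n_flows flow entries to one switch via the flow_manager API."""
    flows = [
        {
            "priority": 10,
            "match": {"in_port": 1, "dl_vlan": 100 + i},
            "actions": [{"action_type": "output", "port": 1}],
        }
        for i in range(n_flows)
    ]
    resp = requests.post(f"{BASE}/{dpid}", json={"force": False, "flows": flows})
    resp.raise_for_status()  # expect HTTP 202, as in the vegeta run above


for dpid in DPIDS:
    push_flows(dpid)
```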

@italovalcy much appreciated, great to hear that it did well under these circumstances. Yes, as the core moves towards more asynchronous primitives it should improve; let's see how that evolves with Flask 2.0 and re-run these tests in the future too. Sure thing, looking forward to hearing more about those long-running results, it's great to have them as well.

viniarck commented 2 years ago

I'll go ahead and merge this PR soon then.