kytos-ng / of_core

Kytos Main OpenFlow Network Application (NApp)
MIT License

[Feature] async OF13 handlers and messages prioritization #70

Closed viniarck closed 2 years ago

viniarck commented 2 years ago

Fixes #37, https://github.com/kytos-ng/kytos/issues/224, and https://github.com/kytos-ng/kytos/issues/172

Release notes

This PR implements the changes corresponding to solution a) in https://github.com/kytos-ng/kytos/issues/172:

[diagram: cores_queues_alisten_to]
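
For reference, the prioritization idea can be illustrated with a minimal asyncio sketch (this is not the actual core code; the message types, priority values, and coroutines below are just placeholders):

```python
import asyncio

# Hypothetical priorities: lower value = handled first, so control-plane
# keepalives (echo) get ahead of bulk flow_mod traffic in the same queue.
PRIORITY = {"ofpt_echo_request": 0, "ofpt_packet_in": 1, "ofpt_flow_mod": 2}


async def enqueue(queue: asyncio.PriorityQueue, msg_type: str, payload: bytes):
    """Enqueue an OpenFlow message with its priority."""
    await queue.put((PRIORITY.get(msg_type, 3), msg_type, payload))


async def handler(queue: asyncio.PriorityQueue):
    """Async handler: drains the queue highest-priority first, without
    blocking the event loop between messages."""
    while True:
        _prio, msg_type, payload = await queue.get()
        print(f"handling {msg_type} ({len(payload)} bytes)")
        queue.task_done()


async def main():
    queue: asyncio.PriorityQueue = asyncio.PriorityQueue()
    task = asyncio.create_task(handler(queue))
    # A flow_mod is enqueued first, but the echo_request is handled first.
    await enqueue(queue, "ofpt_flow_mod", b"\x04\x0e\x00\x08")
    await enqueue(queue, "ofpt_echo_request", b"\x04\x02\x00\x08")
    await queue.join()
    task.cancel()


asyncio.run(main())
```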

Results

[screenshot: async_95]

[screenshot: master_95]

[screenshot: 2022-06-21-153211_1275x261_scrot]

```
❯ jq -ncM '{method: "POST", url: "http://localhost:8181/api/kytos/flow_manager/v2/flow_mods/00:00:00:00:00:00:00:01", body: { "force": false, "flows": [ { "priority": 10, "match": { "in_port": 1, "dl_vlan": 100 }, "actions": [ { "action_type": "output", "port": 1 } ] } ] } | @base64, header: {"Content-Type": ["application/json"]}}' | vegeta attack -format=json -rate 250/1s -duration=20s | tee results.bin | vegeta report
Requests      [total, rate, throughput]         5000, 250.05, 241.91
Duration      [total, attack, wait]             20.668s, 19.996s, 672.583ms
Latencies     [min, mean, 50, 90, 95, 99, max]  1.937ms, 325.66ms, 267.111ms, 650.822ms, 832.028ms, 1.462s, 1.679s
Bytes In      [total, mean]                     15000, 3.00
Bytes Out     [total, mean]                     615000, 123.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      202:5000
Error Set:
```

However, notice that high rates of API requests (https://github.com/kytos-ng/kytos/issues/225) can still lead to instability, which can also impact the queues. So, even with the prioritization, things can suffer a bit if users go past the request limit to that point; that part hasn't been fixed here. As soon as we have Flask 2.0 and make more use of async request handlers, it should improve overall (gevent is also an option, but asyncio is more aligned with the platform architecture/goals and better supported upstream, so this will be explored again when the time comes). Adding some rate limiting with Flask should also help to avoid getting to that point.
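
For illustration only, a rough sketch of what per-endpoint rate limiting could look like with Flask-Limiter (not something this PR adds; the route, response body, and limits below are made up):

```python
from flask import Flask, jsonify
from flask_limiter import Limiter
from flask_limiter.util import get_remote_address

app = Flask(__name__)
# Throttle clients per remote address; the limits are arbitrary examples.
limiter = Limiter(key_func=get_remote_address, app=app,
                  default_limits=["1000 per minute"])


@app.route("/api/kytos/flow_manager/v2/flow_mods/<dpid>", methods=["POST"])
@limiter.limit("100 per second")  # reject bursts before they pile up internally
def flow_mods(dpid):
    # Stand-in for the real handler, which accepts the request with HTTP 202.
    return jsonify({"result": "accepted"}), 202
```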

viniarck commented 2 years ago

@viniarck thank you very much for submitting this PR. It looks good to me.

I've executed the tests mentioned in issue kytos-ng/kytos#224 using a lab with 5 Noviflow switches, and I was able to reproduce the same issue observed before. At that point, I set the hello interval on the Noviflow switches to 15s and ran the same stress test as Vinicius, except at 150 req/sec for 60 seconds. Without this PR, the switches disconnected repeatedly during the tests, and they kept disconnecting even after the stress test finished. Once I applied the changes from this PR, only 4 disconnections were observed (probably due to the Flask REST endpoint processing overhead Vinicius mentioned).

I've also executed the end-to-end tests using this branch, and all tests passed without surprises: 166 passed, 19 xfailed, 5 xpassed, 560 warnings in 9988.43s (2:46:28)

Finally, I tested the creation of 800 flows on each of the five switches and let the consistency check routine run for 2 hours without surprises (no false positives, i.e., no invalid inconsistencies were triggered, and no false negatives, i.e., alien and missing flows were correctly detected). I will keep the test running for a longer period, and if anything happens, it will be reported as an issue.
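
For reference, a bulk flow-creation test like the one above could be scripted roughly as follows (a sketch, not the exact procedure used; the DPIDs and VLAN values are placeholders, only the flow_manager endpoint format follows the vegeta example earlier in this thread):

```python
import requests

BASE = "http://localhost:8181/api/kytos/flow_manager/v2/flow_mods"
# Placeholder DPIDs for the five switches.
DPIDS = [f"00:00:00:00:00:00:00:0{i}" for i in range(1, 6)]


def push_flows(dpid: str, n_flows: int = 800) -> None:
    """Post n_flows flow entries to one switch via the flow_manager API."""
    flows = [
        {
            "priority": 10,
            "match": {"in_port": 1, "dl_vlan": 100 + i},
            "actions": [{"action_type": "output", "port": 1}],
        }
        for i in range(n_flows)
    ]
    resp = requests.post(f"{BASE}/{dpid}", json={"force": False, "flows": flows})
    resp.raise_for_status()  # expect HTTP 202, as in the vegeta run above


for dpid in DPIDS:
    push_flows(dpid)
```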

@italovalcy much appreciated, great to hear that it did well under these circumstances. Yes, as the core moves towards more asynchronous primitives it should improve; let's see how that evolves with Flask 2.0 and re-run these tests in the future too. Sure thing, looking forward to hearing more about those long-running results, it's great to have them as well.

viniarck commented 2 years ago

I'll go ahead and merge this PR soon then.