kytos-ng / kytos

Kytos SDN Platform. Kytos is designed to be easy to install, use, develop and share Network Apps (NApps).
https://kytos-ng.github.io/
MIT License
2 stars 7 forks source link

Add support for rate limiting on core queues #245

Open viniarck opened 1 year ago

viniarck commented 1 year ago

Core queues leveraging priority queues have improved handling higher priority messages accordingly, solving the prioritization problem from the controller perspective, however, switches that are managed by kytos have their own implementation on how to handle messages, so to mitigate entirely starvation on switches as well the controller can't overwhelm them by sending too many messages, that's why rate limiting is needed on core queues (this issue here) and on Flask endpoints (issue #225).

The proposed solution that will be further discussed in this thread in the future must be compatible with the current queues infrastructure that have been delivered on PR #226 and #235. Configuration options and granularity of rate limiting will also be assessed, currently we know that from NApps perspective FlowMods can generate significant spikes 200+ in a few seconds.

Ktmi commented 9 months ago

We could use this package to implement the ratelimiting, then make a subclass of KytosEventBuffer that uses the ratelimiter.

viniarck commented 9 months ago

We could use this package to implement the ratelimiting, then make a subclass of KytosEventBuffer that uses the ratelimiter.

@Ktmi, although ratelimiter has a decent interface and seems relavely well used too, it's not actively maintained since late 2022, including some PRs are stalling to fix support for Python 3.11, which as you know it's on our radar to eventually upgrade too.

Let's research another well-maintained alternative, as long as it supports both sync and asyncio and in-memory then it might be worth considering, maybe limits or something else similar. Also limits has a richer set of rate limiting algos/strategies. It seems very powerful and flexible. Another positive aspect with limits lib is that it's also a dependency of another project, slowapi that allows it to be used with starlette, so in the future we could reuse limits library as a common denominator to also have rate limits on the decorated endpoints if we end up with this dependency.

Also, when solving this problem for now we need to solve it for the msg_out KytosEventBuffer/queue, but it should be modular to be reused on other queues if needed in the future. Implementing at KytosEventBuffer.aget|get is great, we could inject the rate limiters in the constructor with a context of how to identify which "namespace/context" it belongs too when rate limiting (only the priority won't be enough), with aget it'll fit like a glove since it's already an awaited call so the IO suspention, when needed, should happen seamlessly, we'll have to avoid blocking the _queue though. In this interation, we'll need msg_out limit per dpid.

The feature itself isn't too difficult to integrate, the requirements are clear, but for sure it'll require a couple of days to properly stress test with FlowMods and make sure it's working well and stable. With that said, I'll need to keep prioritizing the other major parts on telemetry_int epic on 2023.2 to deliver the the basic functionality, I might have availability to tackle this in the next months. @Ktmi, let me know if I could count on you to research and work on this rate limit feature on msg_out queue for version 2023.2, it's priority_medium, if you think it's doable based on your current capacity, that'd be appreciated.