python-zk / kazoo

Kazoo is a high-level Python library that makes it easier to use Apache Zookeeper.
https://kazoo.readthedocs.io
Apache License 2.0
1.3k stars 386 forks source link

feat(core) implement client request rate limiting #683

Open ceache opened 1 year ago

ceache commented 1 year ago

Fixes #664,

Why is this needed?

This provides a top level control to limit the load a given client can generate on an ensemble in the form of number of requests. In situation with large number of watches, the number of request emitted by the client is otherwise unbounded.

Proposed Changes

Does this PR introduce any breaking change?

No, it is backward compatible with a default of no limit.

ceache commented 1 year ago

@python-zk/maintainers This is a first draft. Let me know if you like the direction and I'll add tests, etc.

Full disclosure, we have had a version of this patch in production for a while (with both gevent and threading handlers). I have not tested the eventlet handler directly.

codecov-commenter commented 1 year ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Comparison is base (55f27b2) 96.76% compared to head (df7c611) 96.75%. Report is 4 commits behind head on master.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #683 +/- ## ========================================== - Coverage 96.76% 96.75% -0.02% ========================================== Files 27 27 Lines 3557 3570 +13 ========================================== + Hits 3442 3454 +12 - Misses 115 116 +1 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

StephenSorriaux commented 1 year ago

I think it is a great idea. Just wondering: from a user experience perspective, will there be any way to know the rate limit has been hit?

ceache commented 1 year ago

No, at least not in this current form.

Sync requests will block, as if the zk server was slow.

This was designed for async requests so that code, possibly in different threads, could simply queue them up knowing that the client would respect the set rate limit. Think of chains of get_children_async | get_async when walking a hierarchy.

We could add logging, it would not be difficult, but I would not want it to be intrusive (i.e. debug?)

I feel like throttling for rate limit is not an "issue", it is a feature.

Does thay make sense?

On Sun, Nov 13, 2022, 12:53 Stephen Sorriaux @.***> wrote:

I think it is a great idea. Just wondering: from a user experience perspective, will there be any way to know the rate limit has been hit?

— Reply to this email directly, view it on GitHub https://github.com/python-zk/kazoo/pull/683#issuecomment-1312785656, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAIFTHQZ6D6LMQI7SJEFGBLWIETI5ANCNFSM6AAAAAAR6Q7REY . You are receiving this because you authored the thread.Message ID: @.***>

StephenSorriaux commented 1 year ago

Yes, that makes a lot of sense, thank you.

I agree with you, rate limit is a feature, but sometimes it can be hard to "understand"/be aware of it if there is no logs telling you "hey, rate limit is currently being triggered" or "hey, rate limit has been triggered N times in the past M seconds" (even if, I agree, there is already a info log at client startup). You're right this should not be intrusive, I know some projects display a "rate limit has been triggered" message once in a while (every 30s or so for instance), but I believe a debug message can work too if you think it is better/useful.

Writing this message, it makes me realize that this lib currently does not provide any metrics (if it would, a number of times rate limit has been triggered would be useful). Maybe it is something we could add to the backlog, but that is not the point of your current proposition that, again, I think is great.

ceache commented 1 year ago

I'll add a log message for now, and move to add some tests.

About metrics, i was talking to opentelemetry folks at KubeCon about exactly that. it looks like it would be possible to use such a framework. I mean, they instrument SQLite3 and requests modules for example

IMHO, of we can pull this off, this would be an ideal way to extract this kind of information, as well as request counts, error counts, connection length and a million other things i have always wanted to know about my zookeeper client but never knew how to ask 😜

On a more serious note, if you like the idea, i can create an issue for this and try to gather more Intel?

On Tue, Nov 15, 2022, 18:41 Stephen Sorriaux @.***> wrote:

Yes, that makes a lot of sense, thank you.

I agree with you, rate limit is a feature, but sometimes it can be hard to "understand"/be aware of it if there is no logs telling you "hey, rate limit is currently being triggered" or "hey, rate limit has been triggered N times in the past M seconds" (even if, I agree, there is already a info log at client startup). You're right this should not be intrusive, I know some projects display a "rate limit has been triggered" message once in a while (every 30s or so for instance), but I believe a debug message can work too if you think it is better/useful.

Writing this message, it makes me realize that this lib currently does not provide any metrics (if it would, a number of times rate limit has been triggered would be useful). Maybe it is something we could add to the backlog, but that is not the point of your current proposition that, again, I think is great.

— Reply to this email directly, view it on GitHub https://github.com/python-zk/kazoo/pull/683#issuecomment-1316031405, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAIFTHTGT7CYMUWYAPK2NWLWIQNS3ANCNFSM6AAAAAAR6Q7REY . You are receiving this because you authored the thread.Message ID: @.***>

StephenSorriaux commented 1 year ago

I'll add a log message for now, and move to add some tests. About metrics, i was talking to opentelemetry folks at KubeCon about exactly that. it looks like it would be possible to use such a framework. I mean, they instrument SQLite3 and requests modules for example IMHO, of we can pull this off, this would be an ideal way to extract this kind of information, as well as request counts, error counts, connection length and a million other things i have always wanted to know about my zookeeper client but never knew how to ask stuck_out_tongue_winking_eye On a more serious note, if you like the idea, i can create an issue for this and try to gather more Intel?

Sorry for my late reply @ceache, busy weeks :(

To be honest, I really like the idea and it would be great if you can get some intel about it, especially on how we should make those metrics available to our users: should we provide some interface so that they can plug whatever client they want? Should we actually provide some integration ourselves (like prometheus, etc.)? I totally feel you on this subject, I also love to get tons of metrics about what's in production!

Buffer0x7cd commented 1 year ago

Hey Folks, any update on the progress of this PR ? Let me know if there is any help needed to complete this feature.

ceache commented 7 months ago

@python-zk/maintainers I think I have implemented all what we discussed. All the more "future looking" ideas discussed here (prometheus, OTL,) would be addressed in future PRs