uber-go / ratelimit

A Go blocking leaky-bucket rate limit implementation
MIT License
4.32k stars 300 forks source link

Fix test approach for detecting issues #93

Open storozhukBM opened 2 years ago

storozhukBM commented 2 years ago

The problem in the previous approach was that mock time literally didn't run between limite.Take() calls, so now.Sub(oldState.last) would always be 0 that can be problematic for some issues detection.

This PR adds time increments between limite.Take() calls in (r *runnerImpl) startTaking. Unfortunately our library for clock mocking github.com/benbjohnson/clock has some races reported here https://github.com/benbjohnson/clock/issues/44 and here https://github.com/benbjohnson/clock/pull/42.

So I decided to use the time mocking approach that Ian Lance Taylor used 19 days ago in https://github.com/golang/time repository here https://github.com/golang/time/commit/579cf78fd858857c0d766e0d63eb2b0ccf29f436

Note: this tests approach detects an error in int64 based limiter implementation that the previous test didn't manage to catch.

codecov[bot] commented 2 years ago

Codecov Report

Merging #93 (31dcb6c) into main (b62b799) will not change coverage. Report is 8 commits behind head on main. The diff coverage is 100.00%.

@@            Coverage Diff            @@
##              main       #93   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            4         4           
  Lines           95        97    +2     
=========================================
+ Hits            95        97    +2     
Files Coverage Δ
ratelimit.go 100.00% <100.00%> (ø)

:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more

rabbbit commented 2 years ago

Thanks!

This looks complicated, it'll take me some time to grok this (sorry) - I would expect maybe some time over the weekend.

It's interesting that https://github.com/benbjohnson/clock/pull/42 was fixed by @abhinav, looks like the clock bit us a few times.

Note: this tests approach detects an error in int64 based limiter implementation that the previous test didn't manage to catch.

@storozhukBM - would you like to partially revert #91? We could add your new implementation to the repo, but not make it the default (yet). This way we could track the (1) clock change (2) bug fix in the same PR eventually. I can also do this, but I thought you might want the addition to be attributed to you in git log :)

storozhukBM commented 2 years ago

would you like to partially revert https://github.com/uber-go/ratelimit/pull/91? We could add your new implementation to the repo, but not make it the default (yet). This way we could track the (1) clock change (2) bug fix in the same PR eventually. I can also do this, but I thought you might want the addition to be attributed to you in git log :)

2 Questions: Should I do the revert in a separate PR or this one? Should I fix the bug in the int64 based implementation right away, since I already have a fix?

rabbbit commented 2 years ago

(1) I'd do a separate diff. This one could be about fixing the fake time. (2) Your call really. I would slightly to push the broken version for now, and then fix it, but happy either way.

tdlr; is that I'd like to play more with the clock implementations/races (or at least understand your implementation - which I currently don't) before landing this PR.

On Mon, Jun 27, 2022 at 5:13 PM Bohdan Storozhuk @.***> wrote:

@storozhukBM https://github.com/storozhukBM - would you like to partially revert #91 https://github.com/uber-go/ratelimit/pull/91? We could add your new implementation to the repo, but not make it the default (yet). This way we could track the (1) clock change (2) bug fix in the same PR eventually. I can also do this, but I thought you might want the addition to be attributed to you in git log :)

2 Questions: Should I do the revert in a separate PR or this one? Should I fix the but in the int64 based implementation right away, since I already have a fix?

— Reply to this email directly, view it on GitHub https://github.com/uber-go/ratelimit/pull/93#issuecomment-1168066295, or unsubscribe https://github.com/notifications/unsubscribe-auth/AACG3UWENEQ5O6BZERYC7YDVRI7TJANCNFSM5Z7XTTVA . You are receiving this because you commented.Message ID: @.***>

storozhukBM commented 2 years ago

@rabbbit

Let's do the following thing:

  1. I'll create a PR with an almost complete revert of https://github.com/uber-go/ratelimit/pull/91, except making int64-based rate limiter a default implementation.
  2. After its merge, I'll rebase this branch with a new master, so this branch will start to fail.
  3. Then, I'll make a second PR that fixes a known bug.
  4. After the fix PR gets merged, I'll rebase this branch again, and tests will show that the bug is fixed.
storozhukBM commented 2 years ago

@rabbbit First step PR https://github.com/uber-go/ratelimit/pull/94

rabbbit commented 2 years ago

Done, #94 merged. sorry for the delay. Thanks!

On Wed, Jun 29, 2022 at 1:29 PM Bohdan Storozhuk @.***> wrote:

@rabbbit https://github.com/rabbbit First step PR #94 https://github.com/uber-go/ratelimit/pull/94

— Reply to this email directly, view it on GitHub https://github.com/uber-go/ratelimit/pull/93#issuecomment-1170463641, or unsubscribe https://github.com/notifications/unsubscribe-auth/AACG3URGCAFOGW4QFOJ636TVRSWZLANCNFSM5Z7XTTVA . You are receiving this because you were mentioned.Message ID: @.***>

storozhukBM commented 2 years ago

@rabbbit

Now we can see that new test approach detects issues with atomic_int64 implementation.

storozhukBM commented 2 years ago

@rabbbit After merging with the recent main branch, all tests pass, except TestInitial for mutex rate limiter for some reason. I'll investigate what happens with it when I have time in the next few days.

rabbbit commented 2 years ago

It's possible that mutex behaviour was broken, but we didn't detect it because of the broken clock :|

On Wed, Jul 13, 2022 at 4:15 PM Bohdan Storozhuk @.***> wrote:

@rabbbit https://github.com/rabbbit After merging with the recent main branch, all tests pass, except TestInitial for mutex rate limiter for some reason. I'll investigate what happens with it when I have time in the next few days.

— Reply to this email directly, view it on GitHub https://github.com/uber-go/ratelimit/pull/93#issuecomment-1183771468, or unsubscribe https://github.com/notifications/unsubscribe-auth/AACG3UQFR7DHSNS6T3LYLADVT5EXLANCNFSM5Z7XTTVA . You are receiving this because you were mentioned.Message ID: @.***>

storozhukBM commented 2 years ago

@rabbbit

It's possible that mutex behaviour was broken, but we didn't detect it because of the broken clock :|

No, the use of advanceToTimer approach removes the issue. I think just advance(time.Duration) is not sufficient with this testTime implementation. I'll figure it out.

storozhukBM commented 2 years ago

@rabbbit I think I addressed all your comments, so please take another look

storozhukBM commented 2 years ago

I've been wondering about some simplified fake clock library - given the amount of issues we've seen it seems it must be possible to make it simpler -__-

If you take a look at the libraries we have used in the past, like https://github.com/benbjohnson/clock/ it is MUCH more complex, has concurrency bugs, and `time. Sleep's as well here

I researched every library that I was able to find on Github, and this approach from golang/time is the most straightforward I've seen.

One more consideration that we should take into account is that we can find a lib or implementation that will pass all our tests but is going to detect the real bug. After we change the time library, we should use mutation testing at least manually introduce some bugs into the existing rate limiter implementations that we have and make sure that our tests with the new clock library detect the issue.

storozhukBM commented 2 years ago

Hey @rabbbit, what do you think? Should we move forward with this PR?

rabbbit commented 1 year ago

We spoke about this briefly in #101 and #90 - I'm not sure about merging this:

For now I'll put this as "on-hold". This means that the library is effectively in code-freeze, as our test clock implementation is buggy.

storozhukBM commented 1 year ago

@rabbbit can we find someone form Uber to make a depth review?

storozhukBM commented 4 months ago

We maybe can get something in standard library for this https://github.com/golang/go/issues/67434