test: Add test for global rate limiting with load balancing

philipgough commented 11 months ago

Draft PR to replicate the behaviour/bug seen in #208

thrawn01 commented 9 months ago

What are we testing here? GPRC's load balancing or the ability for gubernator to distribute the hits to the rest of the cluster?

In reference to unit testing vs functional. If we can have a unit test which tests the ahem functionality without violating public interfaces, then we can do so. Any test which violates the public interface is suspect as it limits your ability to refactor in the future. Functional Tests > Unit Tests.

You can read my thoughts on this here https://wippler.dev/posts/Testing-private-methods

thrawn01 commented 9 months ago

ah, I didn't realize this PR was related to #208 👍

philipgough commented 9 months ago

Yep @thrawn01 - I just created this to reproduce the exact issue I reported in #208 for clarity!

miparnisari commented 9 months ago

@thrawn01

What are we testing here? GPRC's load balancing or the ability for gubernator to distribute the hits to the rest of the cluster?

I didn't write the test but the goal should be to assert that the gossiping of the rate limiting from owner to peers is correct in that the status field is correct.

Functional Tests > Unit Tests.

I don't know what you meant by this (and i can't access the URL you provided on my work laptop) but if a bug happens in a confined place (which is the case here), you should not have to write a functional test to demonstrate it. The test should be laser-focused on what is broken. Functional tests cover a lot more code.

That being said, I don't know the codebase enough to say whether a unit test makes sense here.

thrawn01 commented 9 months ago

@miparnisari re functional testing.

I understand that most developers are taught to write unit tests and thus are surprised when they encounter a project which is mostly functional tests. This is a common issue when we on boarding new developers at Mailgun. Please have confidence when making changes to algorithms.go that those functions are well tested. A coverage run shows 81% of statements in algorithms.go are tested with 1,195 calls during the functional test suite run. I'm not claiming the current testing level is adequate, (we could improve coverage on Behavior_DURATION_IS_GREGORIAN for instance) only that functional testing allows developers to freely change the internals of a system while having high confidence they didn't break functionality.

Baliedge commented 8 months ago

@philipgough Thank you very much for your contribution. I was able to reproduce the error using your test. And since the PR has a merge conflict and it's been some time since created, in a separate branch I updated from master and retried the test to see if recent changes from https://github.com/mailgun/gubernator/pull/219 had addressed your issue.

I found some improvement. Sometimes the test passes:

=== RUN   TestGlobalRateLimitsWithLoadBalancing
--- PASS: TestGlobalRateLimitsWithLoadBalancing (5.00s)
PASS
ok      github.com/mailgun/gubernator/v2        6.074s

Other times I get 2 failed assertions:

=== RUN   TestGlobalRateLimitsWithLoadBalancing
    functional_test.go:1060:
                Error Trace:    /Users/spoulson/src/gubernator2/functional_test.go:1060
                                                        /Users/spoulson/src/gubernator2/functional_test.go:1070
                Error:          Not equal:
                                expected: 0
                                actual  : 1
                Test:           TestGlobalRateLimitsWithLoadBalancing
                Messages:       1
    functional_test.go:1060:
                Error Trace:    /Users/spoulson/src/gubernator2/functional_test.go:1060
                                                        /Users/spoulson/src/gubernator2/functional_test.go:1071
                Error:          Not equal:
                                expected: 0
                                actual  : 1
                Test:           TestGlobalRateLimitsWithLoadBalancing
                Messages:       2
--- FAIL: TestGlobalRateLimitsWithLoadBalancing (5.01s)

But this is better than the original code that failed 5 assertions. We're on the right path.

You can refer to my branch to cherry-pick my commits if needed: https://github.com/mailgun/gubernator/compare/master...Baliedge/global-lb

Baliedge commented 8 months ago

Closed in favor of https://github.com/mailgun/gubernator/pull/224. I applied the edits I mentioned above and some other adjustments. Test passes every time.

mailgun / gubernator

test: Add test for global rate limiting with load balancing #207