tumblr / k8s-sidecar-injector

Kubernetes sidecar injection service
Apache License 2.0
345 stars 75 forks source link

pkg/coalescer/coalescer_test.go: Fix data race #18

Closed while1malloc0 closed 5 years ago

while1malloc0 commented 5 years ago

This commit fixes a data race that occurs in the coalescer test when the actualEvents variable is read from in the test assertion before the last event has been written to it.

What and why?

The file pkg/coalescer/coalescer_test.go contains a data race that can occasionally fail a test run. This can be verified with go test -race ./pkg/coalescer.

Reviewers

Required reviewers: @byxorna

yahoocla commented 5 years ago

Thank you for submitting this pull request, however I do not see a valid CLA on file for you. Before we can merge this request please visit https://yahoocla.herokuapp.com/ and agree to the terms. Thanks! :smile:

byxorna commented 5 years ago

@while1malloc0 thanks for the PR! Great catch - mind signing the CLA the bot posted above? Once that is complete, I can merge this.

Out of curiosity, did you run into this due to a flaky run of the unit tests?

while1malloc0 commented 5 years ago

@byxorna Thanks for the awesome project!

I was unable to reproduce a failing test case for the data race on my laptop, but it showed up a few times in CI. The test would fail because actualEvents was one less than it should have been, which seems to point to this data race.

This was found as part of some work I was doing on a fork of this project for my employer. We needed the functionality described in https://github.com/tumblr/k8s-sidecar-injector/issues/17. As such, I'll have to get sign off on that CLA. I'll update here once I have that.

while1malloc0 commented 5 years ago

~I just signed the CLA. Do I need to push to kick off another build?~ edit: it looks like the build updated automatically.

Also, I'm starting to doubt my solution here, and maybe a reviewer can sanity check me. I think that there still might be a data race between the read here: https://github.com/tumblr/k8s-sidecar-injector/pull/18/files#diff-94510f2b305f33933564c2e9ae1e2af7R71 and the read from the stop channel here https://github.com/tumblr/k8s-sidecar-injector/pull/18/files#diff-94510f2b305f33933564c2e9ae1e2af7R54. The race detector seems happy about this solution, but it feels like this solution relies on the read from select occurring before the two instructions between stopCh <- struct{}{} finish.

Maybe a better approach would be to use a separate sync.WaitGroup (or reuse the existing one) in the reading goroutine. I think that would require Coalesce to close the output channel though, since it's read-only to the caller. It seems like it's safe to call close(output) at this line https://github.com/tumblr/k8s-sidecar-injector/blob/master/pkg/coalescer/coalescer.go#L48, since we've sent the final event.

while1malloc0 commented 5 years ago

You can use the existing waitgroup to make this code race free, as well. wg.Add(1) at L46, and then add a wg.Done() at L58 (and remove your stop channel). That way we will block wg.Wait() until all goroutines have indicated they are done.

I really like the idea of that approach so that there's less extra code needed, but when I add that the test never finishes. It seems like output == nil is never true.

byxorna commented 5 years ago

You can use the existing waitgroup to make this code race free, as well. wg.Add(1) at L46, and then add a wg.Done() at L58 (and remove your stop channel). That way we will block wg.Wait() until all goroutines have indicated they are done.

I really like the idea of that approach so that there's less extra code needed, but when I add that the test never finishes. It seems like output == nil is never true.

Yea, great point. I think something like this will cover us:

for {
  select {
    case v, ok:=<-output:
      if !ok {
        break
      }
      actualEvents++
    }
}
while1malloc0 commented 5 years ago

Won't that have the same problem? Since output is never being closed, that ok won't ever fire and we'll block forever reading on the channel?

byxorna commented 5 years ago

Won't that have the same problem? Since output is never being closed, that ok won't ever fire and we'll block forever reading on the channel?

Ah yes! So, now that i reread it, we don't actually need to make sure the goroutine that consumes output needs to be closed - our expectation is that we await the input routine, wait for the debounce period, and then assert the number of things coming out of the event coalescer is what we expect. To clean up, adding a close(output) after time.Sleep(debounceDuration) should be a way to ensure that goroutine is terminated, but we dont need to imo. Thoughts?

byxorna commented 5 years ago

heya @while1malloc0, just bumping this thread. Any thoughts on my comment above?

while1malloc0 commented 5 years ago

Hey @byxorna, sorry for letting this fall of my radar for a bit.

To clean up, adding close(output) after time.Sleep...

I don't think you can close output from the test, since it's a receive-only channel. Since the line stop <-struct{}{} is synchronous, I think this solution should work unless you think there's a better way to do this.

byxorna commented 5 years ago

@while1malloc0 Ok, that sounds good. I have no objections to how this is solved right now. Are you happy with this solution? If so, we can loop in a few more sets of eyes and get this merged :)

while1malloc0 commented 5 years ago

Yeah, I'm happy with this. Feel free to merge at your convenience.

byxorna commented 5 years ago

@while1malloc0 thanks! merging now - the build tagged :latest will be out in ~20min