Proposal: new decorador FlakeBackoff for more fine-grained control over flaked tests

vgarvardt commented 7 months ago

Ginkgo supports FlakeAttempts decorator that helps a lot with the flaky tests. But some tests are "flaky" by design, e.g. async communication e2e tests - test publishes something into message broker and then requests for the status update in one of the services that should consume the published event. For those kinds of tests flakiness is expected and it makes sense to give tested services some time to perform an action. Right now the test for this kind of scenario looks like this:

import(
  "context"

  g "github.com/onsi/ginkgo/v2"
  om "github.com/onsi/gomega"
  "github.com/vgarvardt/backoff"
)

var _ = g.Describe("Test some async functionality", g.Ordered, func() {
  g.BeforeAll(func() {
    // setup kafka connection
  })

  g.AfterAll(func(ctx context.Context) {
    // tear down kafka connection
  })

  g.It("Trigger some action by publishing an event", func(ctx context.Context) {
    err := kafka.Publish(ctx, ...)
    om.Expect(err).ShouldNot(om.HaveOccurred())
  })

  var attemptEnsureStatus int
  g.It("Ensure that published event did its job", g.FlakeAttempts(5), func(ctx context.Context) {
    time.Sleep(backoff.DefaultExponential.Backoff(attemptEnsureStatus))
    attemptEnsureStatus++

    res, err := httpClient.Do(ctx, ...)
    om.Expect(err).ShouldNot(om.HaveOccurred())
    om.Expect(res.Status).To(om.Equal("desired-status"))
  })
})

A node that checks the result of the async functionality gives asserts some time before checking the result because real-time is not expected here. And it also waits between attempts for the same reason.

My proposal is to add new decorador FlakeBackoff that would extend the behavior of the node runner when FlakeAttempts decorator is set. It may receive an interface similar to the one I'm using that is simply an extraction from grpc-go internal backoff.

With the decorator the flaky node above will look like:

  g.It("Ensure that published event did its job", g.FlakeAttempts(5), g.FlakeBackoff(backoff.DefaultExponential), func(ctx context.Context) {
    res, err := httpClient.Do(ctx, ...)
    om.Expect(err).ShouldNot(om.HaveOccurred())
    om.Expect(res.Status).To(om.Equal("desired-status"))
  })

Implementation-wise I can contribute the implementation if the proposal is accepted.

onsi commented 7 months ago

hey @vgarvardt Gomega provides first class support for this via Eventually. In fact, I would not recommend using FlakeAttempts for this usecase. FlakeAttempts is intended to signify "this test should pass all the time but something unknown/non-deterministic (e.g. resource contention, bugs) is causing it to fail periodically."

Eventually is intended to convey normal asynchronous distributed system behavior and it is designed to allow you to give the system time to reach the desired state. Here's how you use it:

If("does the job after being published", func(ctx context.Context) {
    Expect(kafka.Publish(ctx, ...)).To(Succeed())
    Eventually(httpClient.Do).WithContext(ctx).WithArguments(...).Should(HaveStatus(http.StatusOK))
})

onsi commented 7 months ago

(The docs cover how you can specify different timeouts and retry intervals)

vgarvardt commented 7 months ago

@onsi Eventually looks exactly what I need. Missed it in the docs. Thank you!

onsi / ginkgo

Proposal: new decorador FlakeBackoff for more fine-grained control over flaked tests #1314