nunnatsa / ginkgolinter

golang linter for ginkgo and gomega
MIT License
27 stars 6 forks source link

[Enhancement] detect incorrect global assertions in `Eventually` #127

Open aecay opened 11 months ago

aecay commented 11 months ago

Is your feature request related to a problem? Please describe.

There are two ways of passing functions to gomega's Eventually. One is to pass a function that returns values, and assert on the return values. The other is to pass a function that does assertions itself (link). In the latter case, the function should take g Gomega as an argument and call g.Expect in order to assert.

Failing to do this will assert against the "global" gomega object -- which will only pass (or fail) the first time and will not continue to poll the assertions. This is a bug, and it can be quite timing dependent at runtime (e.g. if your local dev machine is fast enough that the async assertion passes the first time, but the machine where CI runs is slower and so the first call to the assertion fails).

Describe the solution you'd like

We can catch this in the linter, by looking for code of the form:

Eventually(func () {
    // ...
    Expect(...) // (or Ω) <-- this is what the linter should alert on
}).Should(...)

This should even be auto-fixable (if the func is 0 arity at least) by adding the g Gomega argument and converting all the calls to Expect to g.Expect.

I don't know if the linter can see the source of a function that is passed by name to Eventually (like Eventually(someFuncThatMightCallExpect)) -- if it can, we can find erroneous global calls to Expect in that function as well. But even catching the error when the function is a literal, as in the earlier example, would be helpful.

Describe alternatives you've considered

I don't know of any other potential way to prevent or detect this error before runtime.

Additional context

n/a

nunnatsa commented 11 months ago

Thanks you so much for your suggestion, @aecay!

This idea was discussed before (#85).

Using the general Expect is a valid usage for real use-cases - for cases we don't wan't to retry, but to exit with error immediately, e.g. in case of a non-recoverable error. The ginkgolinter has no way to distinguish between these cases.

I tend to close the issue. WDYT?

aecay commented 11 months ago

Apologies, I missed the prior discussion.

I have never needed to bail out early from an Eventually in the way that you describe. But I can see how the code you describe would have that effect.

But gomega provides StopTrying(...).Now() to accomplish the same thing: link. I would tend to say that if you need to bail out early, you should use StopTrying since it is more explicit, and can't be mistaken for a bug (calling Expect when you should call g.Expect).

I guess this is getting into stylistic territory (rather than strictly finding bugs), since global Expect is equally valid as a way of signaling an early exit. But IMO it's worthwhile to provide a way for the linter to catch global Expect bugs, even if it requires stylistic concessions elsewhere in the code. Potentially the check could not be enabled by default though, to avoid annoyingly triggering lint errors in codebases that use another style. WDYT?

pohly commented 2 days ago

I was about to file an issue about this when I found that there is already one (okay, two...). This misuse of an assertion inside a gomega.Eventually and gomega.Consistently is by far the most common mistake that developers make in Kubernetes. The expectation clearly is that a failed assertion is caught and triggers a retry.

Instead, the test gets marked as failed and ginkgo.Fail raises a panic. I am not sure whether gomega.Eventually/Consistently catches that and then retries. Does it?

I see two solutions:

Let's discuss here instead of splitting the discussion. cc @onsi

The linter could annotate functions which call ginkgo.Fail. This would cover Gomega and Kubernetes' e2e/framework.ExpectNoError. It then needs to detect functions that get called by gomega.Eventually/Consistently and then check those whether any function annotated as "calls ginkgo.Fail" is called. Non-trivial... If it can be made to work, developers still need to fix their code.

ginkgo.Fail could check for "called by gomega.Eventually/Consistently" during unwinding. Then instead of marking the test as failed, it could panic with enough information for gomega.Eventually/Consistently to capture a nice failure explanation in case that all retries fail. Perhaps a simple string is enough; alternatively, a type implementing a well-known AssertionFailed() string method could be used. Then Gomega can do type assertion with a locally defined interface to check for this special panic, without depending on Ginkgo source code.

jackfrancis commented 1 day ago

+1, as a dev would love to be able to more easily identify that a test failure was caused by incorrect Eventually usage.

onsi commented 1 day ago

This should really get fixed in Ginkgo/Gomega. The strict separation was a cute decision 11 years ago but this pain point is pretty terrible.

Gomega captures the panic. I bet there's a way to annotate it so that (a) Gomega knows it came from Ginkgo and (b) can extract a callback from the panic object to tell Ginkgo to ignore the failure.

I'll dig into it - though I'm not confident about timelines. Let me see if I can make time for it soon...

nunnatsa commented 1 day ago

I don't think the problem is the implementation, at least not in the linter. This is not trivial, but doable.

The main problem is that the linter, for several reasons, does not know how to read the developer mind, yet. ;)

Global Expects within async gomega methods, are often done intentionally, in order to bail out in case of unrecoverable failures.

The gomega current solution is to replace the assertion (Expect...To), with regular golang code, and call StopTrying(...).Now() if needed. This is hard to make this in the linter, mostly because the test may use any matcher.

WDYT about adding something to gomega, so we'll could have the option to bail out, but using the existing matchers. e.g.

Eventually(func(g Gomega){
    ...
    ExpectOrExit(vm.Status).ToNot(Equal("Failed"))
    ...
    g.Expect(somethingElse).To(BeSomethingElse())
    ...
}).WithPolling(...).WithTimeout(...).Should(Succeed())

or maybe

Eventually(func(g Gomega){
    ...
    Expect(vm.Status).ToNotOrExit(Equal("Failed"), "VM must not be in Failed status")
    ...
    g.Expect(somethingElse).To(BeSomethingElse())
    ...
}).WithPolling(...).WithTimeout(...).Should(Succeed())
pohly commented 22 hours ago

Global Expects within async gomega methods, are often done intentionally, in order to bail out in case of unrecoverable failures.

But that doesn't work, does it? The async Gomega methods keep trying and eventually time out. Only then is the test really done and has failed.

What I and @onsi suggested would change that slightly: if such an assertion fails only once, the next retry may succeed and the test passes. This is a admittedly a change of behavior, but as the former behavior is basically undefined and/or falls under "don't do this" I think it's okay.

nunnatsa commented 18 hours ago

But that doesn't work, does it? The async Gomega methods keep trying and eventually time out. Only then is the test really done and has failed.

Yes it is working. Using the global omega bails out early, and sometimes this is what we want.

nunnatsa commented 8 hours ago

@pohly here is an example:

var _ = Describe("global Gomega vs. gomega var", func() {
    It("should bail out immediately", func() {
        Eventually(func() error {
            Expect(2).ToNot(Equal(2))
            return nil
        }).WithTimeout(10 * time.Second).WithPolling(time.Second).Should(Succeed())
    })

    It("should fail after 10 seconds", func() {
        Eventually(func(g Gomega) {
            g.Expect(2).ToNot(Equal(2))
        }).WithTimeout(10 * time.Second).WithPolling(time.Second).Should(Succeed())
    })
})

When running these tests, I can see the timing in the output:

• [FAILED] [0.000 seconds] <===== global gomega
global Gomega vs. gomega var [It] should bail out immediately
.../main_test.go:13

  [FAILED] Expected
      <int>: 2
  not to equal
      <int>: 2
  In [It] at: .../main_test.go:15 @ 11/25/24 08:11:39.646
------------------------------
• [FAILED] [10.000 seconds] <===== gomega var
global Gomega vs. gomega var [It] should fail after 10 seconds
.../main_test.go:20

  [FAILED] Timed out after 10.000s.
  The function passed to Eventually failed at .../main_test.go:22 with:
  Expected
      <int>: 2
  not to equal
      <int>: 2
  In [It] at: .../main_test.go:23 @ 11/25/24 08:11:49.646
pohly commented 6 hours ago

• [FAILED] [0.000 seconds] <===== global gomega

@onsi said that gomega.Eventually would catch the panic and I assumed that it would then retry, but your are right. It just passes "normal" panics through, so this aborts immediately:

https://github.com/onsi/gomega/blob/4c964c6b4e8d0ccab972a25aff74ed28940979e5/internal/async_assertion.go#L320-L322

So as much as I would like gomega.Expect inside gomega.Eventually/Consistently to work as expected (sic!) by most Kubernetes developers, it would be a breaking change for others. Perhaps make it configurable in Gomega?

pohly commented 4 hours ago

If a linter check can be implemented, then it could be kept disabled by default. If enabled, it becomes a per-project policy that global asserts should be avoided due to the ambiguity, with either StopPollingNow, func(g Gomega), or moving assertions to the Eventually matcher as replacement.

nunnatsa commented 4 hours ago

I guess it's doable, but not so simple with the current code of the linter.

It's a bit challenge to understand the scope of a gomega assertion expression, to understand if it's in a function deceleration that is an argument of a async-gomega expression. And this would be the easy part. because sometimes these functions are declared elsewhere, even in a different package.

Anyway, if someone have some time for it, then some help is welcome.

I have an initial idea how to approach it - not sure it will work.

pohly commented 4 hours ago

You probably know best how to do this :sweat_smile:

If you don't get to it, then please ping the issue again and I'll check whether I can make some time for it.

onsi commented 1 hour ago

hmm. sorry everyone, I was clearly misremembering. the existing behavior is surprising and unfortunate to me and I would consider it a bug. it's a conversation that's out of scope on this linter issue. in retrospect a better design would have been to avoid the func(g Gomega) pattern and instead capture Gomega failures (really any invocation of ginkgo.Fail) and retry - and encourage the user to explicitly use StopTrying to abort Eventually.

Given that some folks are relying on the current behavior I don't think there's a straightforward way to change things without breaking backward compatibility. I could add a configuration but now we're just layering on complexity instead of fixing the underlying issue. The other alternative is to build a roadmap for Gomega 2.0 and put this breaking change on it.

@pohly if you'd like to explore making this configurable further please open an issue on Gomega.

pohly commented 59 minutes ago

@onsi: I agree with your assessment. Let's wait with the configuration option because even if it existed, I would have the problem that I can't be sure whether it is safe to enable without the linter check and going through all code locations that it points out.