Open aecay opened 11 months ago
Thanks you so much for your suggestion, @aecay!
This idea was discussed before (#85).
Using the general Expect
is a valid usage for real use-cases - for cases we don't wan't to retry, but to exit with error immediately, e.g. in case of a non-recoverable error. The ginkgolinter has no way to distinguish between these cases.
I tend to close the issue. WDYT?
Apologies, I missed the prior discussion.
I have never needed to bail out early from an Eventually
in the way that you describe. But I can see how the code you describe would have that effect.
But gomega provides StopTrying(...).Now()
to accomplish the same thing: link. I would tend to say that if you need to bail out early, you should use StopTrying
since it is more explicit, and can't be mistaken for a bug (calling Expect
when you should call g.Expect
).
I guess this is getting into stylistic territory (rather than strictly finding bugs), since global Expect
is equally valid as a way of signaling an early exit. But IMO it's worthwhile to provide a way for the linter to catch global Expect bugs, even if it requires stylistic concessions elsewhere in the code. Potentially the check could not be enabled by default though, to avoid annoyingly triggering lint errors in codebases that use another style. WDYT?
I was about to file an issue about this when I found that there is already one (okay, two...). This misuse of an assertion inside a gomega.Eventually
and gomega.Consistently
is by far the most common mistake that developers make in Kubernetes. The expectation clearly is that a failed assertion is caught and triggers a retry.
Instead, the test gets marked as failed and ginkgo.Fail
raises a panic. I am not sure whether gomega.Eventually/Consistently
catches that and then retries. Does it?
I see two solutions:
Let's discuss here instead of splitting the discussion. cc @onsi
The linter could annotate functions which call ginkgo.Fail
. This would cover Gomega and Kubernetes' e2e/framework.ExpectNoError
. It then needs to detect functions that get called by gomega.Eventually/Consistently
and then check those whether any function annotated as "calls ginkgo.Fail
" is called. Non-trivial... If it can be made to work, developers still need to fix their code.
ginkgo.Fail
could check for "called by gomega.Eventually/Consistently
" during unwinding. Then instead of marking the test as failed, it could panic with enough information for gomega.Eventually/Consistently
to capture a nice failure explanation in case that all retries fail. Perhaps a simple string is enough; alternatively, a type implementing a well-known AssertionFailed() string
method could be used. Then Gomega can do type assertion with a locally defined interface to check for this special panic, without depending on Ginkgo source code.
+1, as a dev would love to be able to more easily identify that a test failure was caused by incorrect Eventually
usage.
This should really get fixed in Ginkgo/Gomega. The strict separation was a cute decision 11 years ago but this pain point is pretty terrible.
Gomega captures the panic. I bet there's a way to annotate it so that (a) Gomega knows it came from Ginkgo and (b) can extract a callback from the panic object to tell Ginkgo to ignore the failure.
I'll dig into it - though I'm not confident about timelines. Let me see if I can make time for it soon...
I don't think the problem is the implementation, at least not in the linter. This is not trivial, but doable.
The main problem is that the linter, for several reasons, does not know how to read the developer mind, yet. ;)
Global Expect
s within async gomega methods, are often done intentionally, in order to bail out in case of unrecoverable failures.
The gomega current solution is to replace the assertion (Expect...To), with regular golang code, and call StopTrying(...).Now()
if needed. This is hard to make this in the linter, mostly because the test may use any matcher.
WDYT about adding something to gomega, so we'll could have the option to bail out, but using the existing matchers. e.g.
Eventually(func(g Gomega){
...
ExpectOrExit(vm.Status).ToNot(Equal("Failed"))
...
g.Expect(somethingElse).To(BeSomethingElse())
...
}).WithPolling(...).WithTimeout(...).Should(Succeed())
or maybe
Eventually(func(g Gomega){
...
Expect(vm.Status).ToNotOrExit(Equal("Failed"), "VM must not be in Failed status")
...
g.Expect(somethingElse).To(BeSomethingElse())
...
}).WithPolling(...).WithTimeout(...).Should(Succeed())
Global Expects within async gomega methods, are often done intentionally, in order to bail out in case of unrecoverable failures.
But that doesn't work, does it? The async Gomega methods keep trying and eventually time out. Only then is the test really done and has failed.
What I and @onsi suggested would change that slightly: if such an assertion fails only once, the next retry may succeed and the test passes. This is a admittedly a change of behavior, but as the former behavior is basically undefined and/or falls under "don't do this" I think it's okay.
But that doesn't work, does it? The async Gomega methods keep trying and eventually time out. Only then is the test really done and has failed.
Yes it is working. Using the global omega bails out early, and sometimes this is what we want.
@pohly here is an example:
var _ = Describe("global Gomega vs. gomega var", func() {
It("should bail out immediately", func() {
Eventually(func() error {
Expect(2).ToNot(Equal(2))
return nil
}).WithTimeout(10 * time.Second).WithPolling(time.Second).Should(Succeed())
})
It("should fail after 10 seconds", func() {
Eventually(func(g Gomega) {
g.Expect(2).ToNot(Equal(2))
}).WithTimeout(10 * time.Second).WithPolling(time.Second).Should(Succeed())
})
})
When running these tests, I can see the timing in the output:
• [FAILED] [0.000 seconds] <===== global gomega
global Gomega vs. gomega var [It] should bail out immediately
.../main_test.go:13
[FAILED] Expected
<int>: 2
not to equal
<int>: 2
In [It] at: .../main_test.go:15 @ 11/25/24 08:11:39.646
------------------------------
• [FAILED] [10.000 seconds] <===== gomega var
global Gomega vs. gomega var [It] should fail after 10 seconds
.../main_test.go:20
[FAILED] Timed out after 10.000s.
The function passed to Eventually failed at .../main_test.go:22 with:
Expected
<int>: 2
not to equal
<int>: 2
In [It] at: .../main_test.go:23 @ 11/25/24 08:11:49.646
• [FAILED] [0.000 seconds] <===== global gomega
@onsi said that gomega.Eventually
would catch the panic and I assumed that it would then retry, but your are right. It just passes "normal" panics through, so this aborts immediately:
So as much as I would like gomega.Expect
inside gomega.Eventually/Consistently
to work as expected (sic!) by most Kubernetes developers, it would be a breaking change for others. Perhaps make it configurable in Gomega?
If a linter check can be implemented, then it could be kept disabled by default. If enabled, it becomes a per-project policy that global asserts should be avoided due to the ambiguity, with either StopPollingNow
, func(g Gomega)
, or moving assertions to the Eventually
matcher as replacement.
I guess it's doable, but not so simple with the current code of the linter.
It's a bit challenge to understand the scope of a gomega assertion expression, to understand if it's in a function deceleration that is an argument of a async-gomega expression. And this would be the easy part. because sometimes these functions are declared elsewhere, even in a different package.
Anyway, if someone have some time for it, then some help is welcome.
I have an initial idea how to approach it - not sure it will work.
You probably know best how to do this :sweat_smile:
If you don't get to it, then please ping the issue again and I'll check whether I can make some time for it.
hmm. sorry everyone, I was clearly misremembering. the existing behavior is surprising and unfortunate to me and I would consider it a bug. it's a conversation that's out of scope on this linter issue. in retrospect a better design would have been to avoid the func(g Gomega)
pattern and instead capture Gomega failures (really any invocation of ginkgo.Fail
) and retry - and encourage the user to explicitly use StopTrying
to abort Eventually
.
Given that some folks are relying on the current behavior I don't think there's a straightforward way to change things without breaking backward compatibility. I could add a configuration but now we're just layering on complexity instead of fixing the underlying issue. The other alternative is to build a roadmap for Gomega 2.0 and put this breaking change on it.
@pohly if you'd like to explore making this configurable further please open an issue on Gomega.
@onsi: I agree with your assessment. Let's wait with the configuration option because even if it existed, I would have the problem that I can't be sure whether it is safe to enable without the linter check and going through all code locations that it points out.
Is your feature request related to a problem? Please describe.
There are two ways of passing functions to gomega's
Eventually
. One is to pass a function that returns values, and assert on the return values. The other is to pass a function that does assertions itself (link). In the latter case, the function should takeg Gomega
as an argument and callg.Expect
in order to assert.Failing to do this will assert against the "global" gomega object -- which will only pass (or fail) the first time and will not continue to poll the assertions. This is a bug, and it can be quite timing dependent at runtime (e.g. if your local dev machine is fast enough that the async assertion passes the first time, but the machine where CI runs is slower and so the first call to the assertion fails).
Describe the solution you'd like
We can catch this in the linter, by looking for code of the form:
This should even be auto-fixable (if the
func
is 0 arity at least) by adding theg Gomega
argument and converting all the calls toExpect
tog.Expect
.I don't know if the linter can see the source of a function that is passed by name to Eventually (like
Eventually(someFuncThatMightCallExpect)
) -- if it can, we can find erroneous global calls toExpect
in that function as well. But even catching the error when the function is a literal, as in the earlier example, would be helpful.Describe alternatives you've considered
I don't know of any other potential way to prevent or detect this error before runtime.
Additional context
n/a