onsi / gomega

Ginkgo's Preferred Matcher Library
http://onsi.github.io/gomega/
MIT License
2.16k stars 281 forks source link

intermediate progress reports for asynchronous assertions (Eventually, Consistently, ...) #574

Open pohly opened 2 years ago

pohly commented 2 years ago

Suppose I have a complex Eventually assertion which can block for a long time. When running a test manually, I have to wait for the timeout before there is any indication of what went wrong. If nothing goes wrong, it's not obvious that the test was stuck in Eventually.

Example:

    gomega.Eventually(func(g gomega.Gomega) {
        var notRegistered []string
        for hostname, plugin := range d.Hosts {
            if !plugin.IsRegistered() {
                notRegistered = append(notRegistered, hostname)
            }
        }
        sort.Strings(notRegistered)
        g.Expect(notRegistered).To(gomega.BeEmpty(), "hosts where the plugin has not been registered yet")
    }).WithTimeout(time.Minute).WithPolling(time.Second).Should(gomega.Succeed())

One possible solution would be to wrap this in ginkgo.By and extend ginkgo.By so that it produces output at regular intervals, similar to the warning about slow tests:

     ginkgo.By("waiting for plugin registration", func() {
           gomega.Eventually...
     })

Possible output:

STEP: waiting for plugin registration <time>
STEP still running: waiting for plugin registration <time>
...

The interval for when to print the "still running" part would have to be configurable, similar to --slow-spec-threshold: --slow-step-threshold?

That may be a useful enhancement by itself, but doesn't address the "why does it not complete" part. For that, gomega itself would have to have a way to report the intermediate failure. For the special case above, a wrapper around the callback can dump failures at regular intervals, but that is not a generic solution.

pohly commented 2 years ago

See https://github.com/kubernetes/kubernetes/pull/112013 for an implementation outside of ginkgo and gomega.

onsi commented 2 years ago

hey @pohly

first off:

If nothing goes wrong, it's not obvious that the test was stuck in Eventually.

sounds like perhaps the timeout is too high? Is this the sort of case where "usually this passes in 3 seconds, but sometimes it takes 50+ seconds"? If so I'd honestly suggest dropping down the timeout (or making it configurable depending on location (e.g. longer on CI, shorter when running locally) or context (e.g. longer if I'm running lots of specs in parallel and straining a target cluster; shorter if I'm just focusing one spec). In general failing faster seems better to me than adding reporting verbosity.

With that said, I can imagine doing what you're asking for like this:

gomega.Eventually(func() {...}).WithTimeout(...).WithPolling(...).CallIfStillFailing(func(failure string) {}, 5*time.Second).Should(...)

You would then need to supply your own

func(failure string) {
    GinkgoWriter.Println(failure)
}

the reason being that Gomega doesn't know about Ginkgo/GinkgoWriter/By - so you would have to provide that bridge code yourself.

I hesitate to add this complexity, though. You could accomplish something similar by adding a GinkgoWriter.Println() in the polled function, though I appreciate it will be hard to get the error message that way.

pohly commented 2 years ago

sounds like perhaps the timeout is too high?

We have to have very high timeouts in Kubernetes E2E testing because sometimes an operation simply takes longer on a busy cluster. If we make the timeouts too close to what the duration normally is, then we get too many test flakes. I agree that it would be nicer to have better control over the timeouts, but we have too many developers who write tests and not enough reviewers who can check that the tests are written with such issues in mind.

I hesitate to add this complexity, though. You could accomplish something similar by adding a GinkgoWriter.Println() in the polled function, though I appreciate it will be hard to get the error message that way.

Putting it into the called function means writing extra code, which most developers won't do. We need something very simple.

What do you think of the proposal in https://github.com/kubernetes/kubernetes/pull/112013?