knative / client

Knative developer experience, docs, reference Knative CLI implementation
Apache License 2.0
354 stars 261 forks source link

Flaky E2E test: e2e/TestDomain - create domain with TLS #1873

Closed dsimansk closed 8 months ago

dsimansk commented 1 year ago

The e2e tests are intermittently failing on e2e/TestDomain. In a specific https test case. When the URL should contain https but is empty. It's most likely a timing issue with empty value being checked prematurely.

There's a wait loop to wait for URL to be populated before asserting it. It might be not working as expected.
https://github.com/knative/client/blob/2d0415a7951b99130ccb514fa0e3c5343338225c/test/e2e/domain_mapping_test.go#L136C1-L144C3

Error log: https://prow.knative.dev/view/gs/knative-prow/logs/continuous_client_main_periodic/1710944161479266304#

/kind good-first-issue

xiangpingjiang commented 1 year ago

/assign

dsimansk commented 1 year ago

The original issue is not fixed. We need to further investigate & verify the wait loop is working as expected and maybe fix ti.

@xiangpingjiang I'll remove your previous assignment. But feel free to assign again and do a second pass if you are still interested.

xiangpingjiang commented 1 year ago

The original issue is not fixed. We need to further investigate & verify the wait loop is working as expected and maybe fix ti.

@xiangpingjiang I'll remove your previous assignment. But feel free to assign again and do a second pass if you are still interested.

@dsimansk Did you find that e2e/TestDomain not pass after https://github.com/knative/client/pull/1888 merged?

dsimansk commented 1 year ago

The original issue is not fixed. We need to further investigate & verify the wait loop is working as expected and maybe fix ti. @xiangpingjiang I'll remove your previous assignment. But feel free to assign again and do a second pass if you are still interested.

@dsimansk Did you find that e2e/TestDomain not pass after #1888 merged?

Yes. E.g.: https://prow.knative.dev/view/gs/knative-prow/pr-logs/pull/knative_client/1901/integration-tests-latest-release_client_main/1729841947746504704

xiangpingjiang commented 1 year ago

/assign

dsimansk commented 1 year ago

The suspicious part in test log is the following. See age: field is 1s, URL is still empty, but the test is already failed with missing string error.

There's still possibility to a short timeout (e.g. 3 - 5s) between create and verify operations. But I'd rather check first if the wait loop is working as expected.

domain_mapping_test.go:153: assertion failed: 
        Actual output: Name:       newdomain.com
        Namespace:  kne2etests21
        Age:        1s

        URL:  

        Reference:     
          APIVersion:  serving.knative.dev/v1
          Kind:        Service
          Name:        hello

        Conditions:  
          OK TYPE    AGE REASON

        Missing strings: https://newdomain.com/
Ipppoooo commented 11 months ago

It could be a timing related issue. This may work better for the for loop

func domainDescribe(r *test.KnRunResultCollector, domainName string, tls bool) {
    k := test.NewKubectl(r.KnTest().Kn().Namespace())

    // Wait for Domain Mapping URL to be populated
    var url string
    timeout := time.After(30 * time.Second) // Adjust the timeout as needed

    for {
        select {
        case <-time.After(time.Second):
            out, err := k.Run("get", "domainmapping", domainName, "-o=jsonpath='{.status.url}'")
            assert.NilError(r.T(), err)

            if len(out) > 0 {
                url = out
                break
            }
        case <-timeout:
            assert.Fail(r.T(), "Timed out waiting for the domain mapping URL to be populated")
            return
        }
    }

    out := r.KnTest().Kn().Run("domain", "describe", domainName)
    r.AssertNoError(out)

    if tls {
        url = "https://" + domainName
    } else {
        url = "http://" + domainName
    }

    assert.Assert(r.T(), util.ContainsAll(out.Stdout, "Name", "Namespace", "URL", "Service", url))
}

By introducing this more robust waiting mechanism, you can mitigate timing issues and reduce the likelihood of intermittent test failures.

rhuss commented 10 months ago

Thanks @Ipppoooo , i think its worth a try. Maybe don't retry every second as this will cause 30 calls if this is a non-recoverable error. Maybe use e.g. 5seconds or an exponential backoff. Fancy to send in a PR ?

xiangpingjiang commented 10 months ago

hello @rhuss I think the reason maybe the break condition in for loop is len(out) > 0, but the assert condition is util.ContainsAll(out.Stdout, "Name", "Namespace", "URL", "Service", url).

It's different conditions.

What do you think?