[Flaking Test] [sig-api-machinery] CustomResourceConversionWebhook [Privileged:ClusterAdmin] should be able to convert a non homogeneous list of CRs [Conformance]

pacoxu commented 4 months ago

Which jobs are flaking?

https://storage.googleapis.com/k8s-triage/index.html?test=should%20be%20able%20to%20convert%20a%20non%20homogeneous%20list%20of%20CRs&xjob=calico

ci-kubernetes-e2e-capz-master-windows
ci-kubernetes-cloud-provider-kind-conformance-parallel
ci-kubernetes-e2e-kubeadm-kinder-rootless-latest
ci-kubernetes-e2e-capz-master-windows-hyperv

Which tests are flaking?

[sig-api-machinery] CustomResourceConversionWebhook [Privileged:ClusterAdmin] should be able to convert a non homogeneous list of CRs [Conformance]

Since when has it been flaking?

The k8s-triage dashboard on storage.googleapis.com (linked above) shows that it has been flaking for a long period.

Testgrid link

https://testgrid.k8s.io/sig-release-master-informing#capz-windows-master

Reason for failure (if possible)

https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/ci-kubernetes-e2e-capz-master-windows/1766944844321656832

STEP: Verifying the service has paired with the endpoint - k8s.io/kubernetes/test/e2e/apimachinery/crd_conversion_webhook.go:345 @ 03/10/24 22:08:00.114
I0310 22:08:01.114971 2463 util.go:427] Waiting for amount of service:e2e-test-crd-conversion-webhook endpoints to be 1
< Exit [BeforeEach] [sig-api-machinery] CustomResourceConversionWebhook [Privileged:ClusterAdmin] - k8s.io/kubernetes/test/e2e/apimachinery/crd_conversion_webhook.go:126 @ 03/10/24 22:08:01.317 (8.596s)
> Enter [It] should be able to convert a non homogeneous list of CRs [Conformance] - k8s.io/kubernetes/test/e2e/apimachinery/crd_conversion_webhook.go:177 @ 03/10/24 22:08:01.317
I0310 22:08:01.317904 2463 util.go:506] >>> kubeConfig: /home/prow/go/src/sigs.k8s.io/windows-testing/capz/capz-conf-y1unem.kubeconfig
I0310 22:08:31.630253 2463 crd_conversion_webhook.go:501] error waiting for conversion to succeed during setup: Post "https://capz-conf-y1unem-4d42d29d.uksouth.cloudapp.azure.com:6443/apis/stable.example.com/v2/namespaces/crd-webhook-5821/e2e-test-crd-webhook-7762-crds": context deadline exceeded
I0310 22:08:31.630417 2463 crd_conversion_webhook.go:486] Unexpected error: 
    <context.deadlineExceededError>: 
    context deadline exceeded

        {}
[FAILED] context deadline exceeded
In [It] at: k8s.io/kubernetes/test/e2e/apimachinery/crd_conversion_webhook.go:486 @ 03/10/24 22:08:31.63
< Exit [It] should be able to convert a non homogeneous list of CRs [Conformance] - k8s.io/kubernetes/test/e2e/apimachinery/crd_conversion_webhook.go:177 @ 03/10/24 22:08:31.63 (30.313s)

{ failed [FAILED] context deadline exceeded
In [It] at: k8s.io/kubernetes/test/e2e/apimachinery/crd_conversion_webhook.go:486 @ 03/10/24 22:08:31.63
}

Anything else we need to know?

A possibly related issue: https://github.com/kubernetes/kubernetes/issues/93705

Relevant SIG(s)

/sig api-machinery
/sig windows

I saw this on a Windows CI board, but it may not be Windows-specific. Adding the SIG for triage.

jsturtevant commented 4 months ago

I looked into this today for sig-windows. It appears the test fails when it cannot reach the API server. I could find only one instance of it failing in the main sig-windows job. The Hyper-V jobs have known networking issues, which explains the failures there.

It may be a timing issue, since we are hitting this block: https://github.com/kubernetes/kubernetes/blob/634fc1b4836b3a500e0d715d71633ff67690526a/test/e2e/apimachinery/crd_conversion_webhook.go#L499-L502

jsturtevant commented 4 months ago

Looking at the other non-Windows failures, it seems to occur mostly in runs with many test failures where the API server is not reachable. As an example:

https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/ci-kubernetes-local-e2e/1764995137261277184

jsturtevant commented 4 months ago

The call to Create got stuck, so only a single call was made instead of many over the 30-second timeout. See the logs:

I0310 22:08:01.317904 2463 util.go:506] >>> kubeConfig: /home/prow/go/src/sigs.k8s.io/windows-testing/capz/capz-conf-y1unem.kubeconfig
I0310 22:08:31.630253 2463 crd_conversion_webhook.go:501] error waiting for conversion to succeed during setup: Post "https://capz-conf-y1unem-4d42d29d.uksouth.cloudapp.azure.com:6443/apis/stable.example.com/v2/namespaces/crd-webhook-5821/e2e-test-crd-webhook-7762-crds": context deadline exceeded
I0310 22:08:31.630417 2463 crd_conversion_webhook.go:486] Unexpected error: 
    <context.deadlineExceededError>: 
    context deadline exceeded

        {}

Relevant code: https://github.com/kubernetes/kubernetes/blob/634fc1b4836b3a500e0d715d71633ff67690526a/test/e2e/apimachinery/crd_conversion_webhook.go#L497. The whole block timed out.

jiahuif commented 4 months ago

/assign @jsturtevant

Could you continue working on this issue? Thank you.

/triage accepted
