kubernetes / kubernetes

[Flaking Test] [sig-api-machinery] CustomResourceConversionWebhook [Privileged:ClusterAdmin] should be able to convert a non homogeneous list of CRs [Conformance] #123851

Open pacoxu opened 4 months ago

pacoxu commented 4 months ago

Which jobs are flaking?


Which tests are flaking?

Since when has it been flaking?

The results on storage.googleapis.com show that it has been flaking for a long time.

Testgrid link


Reason for failure (if possible)


STEP: Verifying the service has paired with the endpoint - k8s.io/kubernetes/test/e2e/apimachinery/crd_conversion_webhook.go:345 @ 03/10/24 22:08:00.114
I0310 22:08:01.114971 2463 util.go:427] Waiting for amount of service:e2e-test-crd-conversion-webhook endpoints to be 1
< Exit [BeforeEach] [sig-api-machinery] CustomResourceConversionWebhook [Privileged:ClusterAdmin] - k8s.io/kubernetes/test/e2e/apimachinery/crd_conversion_webhook.go:126 @ 03/10/24 22:08:01.317 (8.596s)
> Enter [It] should be able to convert a non homogeneous list of CRs [Conformance] - k8s.io/kubernetes/test/e2e/apimachinery/crd_conversion_webhook.go:177 @ 03/10/24 22:08:01.317
I0310 22:08:01.317904 2463 util.go:506] >>> kubeConfig: /home/prow/go/src/sigs.k8s.io/windows-testing/capz/capz-conf-y1unem.kubeconfig
I0310 22:08:31.630253 2463 crd_conversion_webhook.go:501] error waiting for conversion to succeed during setup: Post "https://capz-conf-y1unem-4d42d29d.uksouth.cloudapp.azure.com:6443/apis/stable.example.com/v2/namespaces/crd-webhook-5821/e2e-test-crd-webhook-7762-crds": context deadline exceeded
I0310 22:08:31.630417 2463 crd_conversion_webhook.go:486] Unexpected error: 
    context deadline exceeded

[FAILED] context deadline exceeded
In [It] at: k8s.io/kubernetes/test/e2e/apimachinery/crd_conversion_webhook.go:486 @ 03/10/24 22:08:31.63
< Exit [It] should be able to convert a non homogeneous list of CRs [Conformance] - k8s.io/kubernetes/test/e2e/apimachinery/crd_conversion_webhook.go:177 @ 03/10/24 22:08:31.63 (30.313s)

{ failed [FAILED] context deadline exceeded
In [It] at: k8s.io/kubernetes/test/e2e/apimachinery/crd_conversion_webhook.go:486 @ 03/10/24 22:08:31.63

Anything else we need to know?

An issue that may be related: https://github.com/kubernetes/kubernetes/issues/93705

Relevant SIG(s)

/sig api-machinery
/sig windows

I saw this on a Windows CI board, but it may not be Windows-related. Adding the SIG for triage.

jsturtevant commented 4 months ago

I looked into this today for sig-windows. It appears the test fails when it cannot reach the API server. I can find only one instance of it failing in the sig-windows main job; the Hyper-V jobs have known networking issues, which explains their failures.

It may be a timing issue, since we are hitting this block: https://github.com/kubernetes/kubernetes/blob/634fc1b4836b3a500e0d715d71633ff67690526a/test/e2e/apimachinery/crd_conversion_webhook.go#L499-L502

jsturtevant commented 4 months ago

Looking at the other non-Windows failures, the flake seems to occur mostly in runs with many test failures where the API server is not reachable. As an example:


jsturtevant commented 4 months ago

The call to Create got stuck, so the test made only a single call instead of many over the 30-second timeout. See the logs:

I0310 22:08:01.317904 2463 util.go:506] >>> kubeConfig: /home/prow/go/src/sigs.k8s.io/windows-testing/capz/capz-conf-y1unem.kubeconfig
I0310 22:08:31.630253 2463 crd_conversion_webhook.go:501] error waiting for conversion to succeed during setup: Post "https://capz-conf-y1unem-4d42d29d.uksouth.cloudapp.azure.com:6443/apis/stable.example.com/v2/namespaces/crd-webhook-5821/e2e-test-crd-webhook-7762-crds": context deadline exceeded
I0310 22:08:31.630417 2463 crd_conversion_webhook.go:486] Unexpected error: 
    context deadline exceeded


The stuck call is at https://github.com/kubernetes/kubernetes/blob/634fc1b4836b3a500e0d715d71633ff67690526a/test/e2e/apimachinery/crd_conversion_webhook.go#L497, and the whole block timed out.

jiahuif commented 4 months ago

/assign @jsturtevant

Could you continue working on this issue? Thank you.

/triage accepted