kyma-project / kyma

Kyma is an opinionated set of Kubernetes-based modular building blocks, including all necessary capabilities to develop and run enterprise-grade cloud-native applications.
https://kyma-project.io
Apache License 2.0

Failing periodic kyma-upgrade-gardener-kyma2-to-main-reconciler-main job #14587

Closed Sawthis closed 2 years ago

Sawthis commented 2 years ago

Description

The periodic kyma-upgrade-gardener-kyma2-to-main-reconciler-main job failed 2 times in the last 24h, counting from 14.06.2022 2PM.

Also, https://status.build.kyma-project.io/?job=kyma-upgrade-gardener-kyma2-minor-versions is failing often for different reasons, mostly related to timeouts.

nachtmaar commented 2 years ago

Example failures:

  Upgrade test preparation
[ERROR] Error: Wait for VirtualService commerce-mock timeout (20000 ms)
    1) CommerceMock test fixture should be ready

  0 passing (2m)
  1 failing

  1) Upgrade test preparation
       CommerceMock test fixture should be ready:
     Error: Wait for VirtualService commerce-mock timeout (20000 ms)
      at Timeout._onTimeout (utils/index.js:337:20)
      at listOnTimeout (internal/timers.js:557:17)
      at processTimers (internal/timers.js:500:7)
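
For readers unfamiliar with this kind of failure: the error comes from a wait helper that gives up after a fixed time. The following is a minimal sketch of such a helper, assuming a simple polling approach. The names `waitFor` and `sleep` are hypothetical, and this is not the actual code from utils/index.js (whose stack trace suggests a timer-based rejection rather than polling).

```javascript
// Hypothetical helper for illustration only; not the code from utils/index.js.

// Resolve after the given number of milliseconds.
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Poll `check` until it returns a truthy value or `timeoutMs` has elapsed.
// `name` is only used to build the error message.
async function waitFor(name, check, timeoutMs = 20000, intervalMs = 1000) {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    if (await check()) {
      return;
    }
    if (Date.now() >= deadline) {
      throw new Error(`Wait for ${name} timeout (${timeoutMs} ms)`);
    }
    await sleep(intervalMs);
  }
}
```

With a helper like this, a slow or unreachable cluster API shows up as a timeout in the test fixture rather than as an error in the component that is actually at fault, which matches the symptoms reported in this issue.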

nachtmaar commented 2 years ago

The problem occurs on multiple pipelines.

jakobmoellerdev commented 2 years ago

Hey there, what's the status of this ticket? https://storage.googleapis.com/kyma-prow-logs/logs/kyma-upgrade-gardener-kyma2-to-main-reconciler-main/1541300480955650048/build-log.txt still shows failures.

a-thaler commented 2 years ago

Added https://status.build.kyma-project.io/?job=kyma-upgrade-gardener-kyma2-minor-versions to the description, as it shows the same symptoms.

veichtj commented 2 years ago

In some of the last failing runs, there is something wrong with the patching of Istio's MutatingWebhook. I'm going to have a look at this.

veichtj commented 2 years ago

Update: the last 8 runs (one failed with a Gardener provisioning timeout) did not fail because of the Istio-related problem. Overall, it does not look like this is related to Istio itself, but rather to other connection issues, as cluster-essentials also seems to fail. We will continue to look into it today and post updates here.

kasiakepka commented 2 years ago

Hi, I've raised this issue with Gardener. https://github.tools.sap/kubernetes-live/issues-live/issues/1917 Feel free to update it.

kasiakepka commented 2 years ago

Hi, I've managed to capture some logs from the Gardener cluster. ProwJob Logs: https://status.build.kyma-project.io/view/gs/kyma-prow-logs/logs/kyma-upgrade-gardener-kyma2-to-main-reconciler-main/1551839829124190208

Your Action is required
This error is flagged as user error which indicates that no Gardener operator action is required. Please read the error message carefully and take action.
A misconfigured webhook prevents Gardener from performing operations. Please resolve this as this can lead to required actions not beeing performed which will eventually turn the cluster into an error state.
[Best practises](https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/#best-practices-and-warnings)
MutatingWebhookConfiguration "istio-sidecar-injector" is problematic: webhook "auto.sidecar-injector.istio.io" with failurePolicy "Fail" and 10s timeout might prevent worker nodes from properly joining the shoot cluster
Last Error DaemonSet "kube-system/kube-proxy-cpu-worker-v1.23.8" is unhealthy: too many unavailable pods found (1/2)
Your hibernation schedule may not have any effect: MutatingWebhookConfiguration "istio-sidecar-injector" is problematic: webhook "auto.sidecar-injector.istio.io" with failurePolicy "Fail" and 10s timeout might prevent worker nodes from properly joining the shoot cluster
Maintenance precondition check failed. Gardener may be unable to perform required actions during maintenance: MutatingWebhookConfiguration "istio-sidecar-injector" is problematic: webhook "auto.sidecar-injector.istio.io" with failurePolicy "Fail" and 10s timeout might prevent worker nodes from properly joining the shoot cluster
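
The warning above concerns a webhook that fails closed (failurePolicy "Fail"). As a rough illustration, the sketch below lists such webhooks from a MutatingWebhookConfiguration object that has already been parsed into plain JavaScript (for example from `kubectl get mutatingwebhookconfiguration istio-sidecar-injector -o json`). The function name `findFailClosedWebhooks` is hypothetical, and the check is a deliberate simplification: it is not Gardener's actual logic, which also takes into account which resources and namespaces the webhook applies to.

```javascript
// Hypothetical helper for illustration; a simplified check, not Gardener's logic.
// `config` is a parsed admissionregistration.k8s.io/v1 MutatingWebhookConfiguration.
function findFailClosedWebhooks(config) {
  return (config.webhooks || [])
    // Keep only webhooks that reject requests when the webhook is unreachable.
    .filter((wh) => wh.failurePolicy === "Fail")
    .map((wh) => ({
      configuration: config.metadata.name,
      webhook: wh.name,
      // 10 seconds is the default timeout in the v1 API when none is set.
      timeoutSeconds: wh.timeoutSeconds ?? 10,
    }));
}

// Example input mirroring the values named in the Gardener message.
const example = {
  metadata: { name: "istio-sidecar-injector" },
  webhooks: [
    { name: "auto.sidecar-injector.istio.io", failurePolicy: "Fail", timeoutSeconds: 10 },
  ],
};

for (const hit of findFailClosedWebhooks(example)) {
  console.log(
    `${hit.configuration}: webhook "${hit.webhook}" fails closed with a ${hit.timeoutSeconds}s timeout`
  );
}
```

A result from this check only indicates where to look. Whether a given webhook can actually block worker nodes from joining depends on its rules and selectors.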

kasiakepka commented 2 years ago

Errors similar to "Error: Wait for VirtualService commerce-mock timeout (20000 ms)" were observed on many pipelines. Goats managed to fix it, or at least improve the situation, with: https://github.com/kyma-project/kyma/issues/15113