kyma-project / infrastructure-manager

Runtime is Failed even if Gardener Shoot is "in progress" #346

Closed piotrmiskiewicz closed 2 months ago

piotrmiskiewicz commented 3 months ago

Description

The Runtime is in the Failed state:

status:
  conditions:
  - lastTransitionTime: "2024-08-16T13:06:19Z"
    message: 'Gardener API create error: shoots.core.gardener.cloud "keb-x-01" already
      exists'
    reason: GardenerErr
    status: "False"
    type: Provisioned
  state: Failed

However, this shoot name had not been used before.

Expected result

Runtime with state "Ready"

Actual result

The Runtime is set to Failed even though the Gardener shoot operation is still in progress.

Steps to reproduce

Troubleshooting

piotrmiskiewicz commented 3 months ago

The Runtime spec:


  security:
    administrators:
    - admin1@test.com
    - admin2@test.com
    networking:
      filter:
        egress:
          enabled: false
        ingress:
          enabled: false
  shoot:
    controlPlane:
      highAvailability:
        failureTolerance:
          type: node
    kubernetes:
      kubeAPIServer:
        oidcConfig:
          clientID: 9bd05ed7-a930-44e6-8c79-e6defeb7dec9
          groupsClaim: groups
          issuerURL: https://kymatest.accounts400.ondemand.com
          signingAlgs:
          - RS256
          usernameClaim: sub
          usernamePrefix: '-'
      version: "1.29"
    name: keb-x-01
    networking:
      nodes: 10.250.0.0/22
      pods: 10.96.0.0/13
      services: 10.104.0.0/13
      type: calico
    platformRegion: platform-region
    provider:
      type: aws
      workers:
      - machine:
          image:
            name: gardenlinux
            version: 1312.3.0
          type: m6i.large
        maxSurge: 1
        maxUnavailable: 0
        maximum: 5
        minimum: 4
        name: cpu-worker-0
        volume:
          size: 50Gi
          type: gp2
        zones:
        - eu-west-2b
    purpose: production
    region: eu-west-2
    secretBindingName: sap-aws-skr-dev-cust-00002-kyma-integration

Disper commented 3 months ago

~There is currently a shoot with this name on DEV: https://dashboard.garden.canary.k8s.ondemand.com/namespace/garden-kyma-dev/shoots/keb-x-01. @piotrmiskiewicz, did you check that no shoot with this name existed in https://dashboard.garden.canary.k8s.ondemand.com/namespace/garden-kyma-dev/shoots/ when the error occurred?~

More examples of the "shoot name already exists" error: https://github.com/kyma-project/infrastructure-manager/issues/332#issuecomment-2296151273

jaroslaw-pieszka commented 3 months ago

Further attempts fail with these messages:

    message: 'Gardener API create error: shoots.core.gardener.cloud "bd0e0e5" already
      exists'
    message: 'Gardener API create error: shoots.core.gardener.cloud "ad0e0e5" already
      exists'

The first attempt used a shoot name generated during E2E tests. The second changed the shoot name in a previously created Runtime CR, so it is extremely likely that the shoot name had not been used before.

Disper commented 3 months ago

Logs from DEV related to shoot bd0e0e5:

Z    ERROR   reqID 53158     Failed to create new gardener Shoot     {"error": "Shoot.core.gardener.cloud \"bd0e0e5\" is invalid: spec.networking.type: Required value: networking type must be provided"}
2024-08-19T09:27:17Z    DEBUG   events  Gardener API create error: Shoot.core.gardener.cloud "bd0e0e5" is invalid: spec.networking.type: Required value: networking type must be provided: kcp-system/72a29ca1-65c7-4a5a-8191-a260adbc2de2  {"type": "Warning", "object": {"kind":"Runtime","namespace":"kcp-system","name":"72a29ca1-65c7-4a5a-8191-a260adbc2de2","uid":"7295b4b4-ab35-473c-8ae2-0c20d2e932a6","apiVersion":"infrastructuremanager.kyma-project.io/v1","resourceVersion":"4052677301"}, "reason": "GardenerErr"}
2024-08-19T09:27:36Z    ERROR   reqID 53162     Failed to create new gardener Shoot     {"error": "Shoot.core.gardener.cloud \"bd0e0e5\" is invalid: spec.networking.type: Required value: networking type must be provided"}
2024-08-19T09:27:55Z    ERROR   reqID 53166     Failed to create new gardener Shoot     {"error": "Shoot.core.gardener.cloud \"bd0e0e5\" is invalid: spec.networking.type: Required value: networking type must be provided"}
2024-08-19T09:28:12Z    ERROR   reqID 53170     Failed to create new gardener Shoot     {"error": "Shoot.core.gardener.cloud \"bd0e0e5\" is invalid: spec.networking.type: Required value: networking type must be provided"}
2024-08-19T09:28:32Z    ERROR   reqID 53174     Failed to create new gardener Shoot     {"error": "Shoot.core.gardener.cloud \"bd0e0e5\" is invalid: spec.networking.type: Required value: networking type must be provided"}
2024-08-19T09:28:55Z    ERROR   reqID 53178     Failed to create new gardener Shoot     {"error": "Shoot.core.gardener.cloud \"bd0e0e5\" is invalid: spec.networking.type: Required value: networking type must be provided"}
2024-08-19T09:29:17Z    ERROR   reqID 53182     Failed to create new gardener Shoot     {"error": "Shoot.core.gardener.cloud \"bd0e0e5\" is invalid: spec.networking.type: Required value: networking type must be provided"}
2024-08-19T09:29:36Z    ERROR   reqID 53186     Failed to create new gardener Shoot     {"error": "Shoot.core.gardener.cloud \"bd0e0e5\" is invalid: spec.networking.type: Required value: networking type must be provided"}
2024-08-19T09:29:59Z    ERROR   reqID 53190     Failed to create new gardener Shoot     {"error": "Shoot.core.gardener.cloud \"bd0e0e5\" is invalid: spec.networking.type: Required value: networking type must be provided"}
2024-08-19T09:30:21Z    ERROR   reqID 53194     Failed to create new gardener Shoot     {"error": "Shoot.core.gardener.cloud \"bd0e0e5\" is invalid: spec.networking.type: Required value: networking type must be provided"}
2024-08-19T09:30:41Z    ERROR   reqID 53198     Failed to create new gardener Shoot     {"error": "Shoot.core.gardener.cloud \"bd0e0e5\" is invalid: spec.networking.type: Required value: networking type must be provided"}
2024-08-19T09:31:03Z    ERROR   reqID 53202     Failed to create new gardener Shoot     {"error": "Shoot.core.gardener.cloud \"bd0e0e5\" is invalid: spec.networking.type: Required value: networking type must be provided"}
2024-08-19T09:31:21Z    ERROR   reqID 53206     Failed to create new gardener Shoot     {"error": "Shoot.core.gardener.cloud \"bd0e0e5\" is invalid: spec.networking.type: Required value: networking type must be provided"}
2024-08-19T09:31:41Z    ERROR   reqID 53210     Failed to create new gardener Shoot     {"error": "Shoot.core.gardener.cloud \"bd0e0e5\" is invalid: spec.networking.type: Required value: networking type must be provided"}
2024-08-19T09:32:01Z    ERROR   reqID 53214     Failed to create new gardener Shoot     {"error": "Shoot.core.gardener.cloud \"bd0e0e5\" is invalid: spec.networking.type: Required value: networking type must be provided"}
2024-08-19T09:32:21Z    ERROR   reqID 53218     Failed to create new gardener Shoot     {"error": "Shoot.core.gardener.cloud \"bd0e0e5\" is invalid: spec.networking.type: Required value: networking type must be provided"}
2024-08-19T09:32:41Z    ERROR   reqID 53222     Failed to create new gardener Shoot     {"error": "Shoot.core.gardener.cloud \"bd0e0e5\" is invalid: spec.networking.type: Required value: networking type must be provided"}
2024-08-19T09:33:02Z    ERROR   reqID 53226     Failed to create new gardener Shoot     {"error": "Shoot.core.gardener.cloud \"bd0e0e5\" is invalid: spec.networking.type: Required value: networking type must be provided"}
2024-08-19T09:33:13Z    INFO    reqID 53227     Gardener shoot for runtime initialised successfully     {"Name": "bd0e0e5", "Namespace": "garden-kyma-dev"}
2024-08-19T09:33:35Z    ERROR   reqID 53231     Failed to create new gardener Shoot     {"error": "shoots.core.gardener.cloud \"bd0e0e5\" already exists"}
2024-08-19T09:33:35Z    DEBUG   events  Gardener API create error: shoots.core.gardener.cloud "bd0e0e5" already exists: kcp-system/72a29ca1-65c7-4a5a-8191-a260adbc2de2     {"type": "Warning", "object": {"kind":"Runtime","namespace":"kcp-system","name":"72a29ca1-65c7-4a5a-8191-a260adbc2de2","uid":"7295b4b4-ab35-473c-8ae2-0c20d2e932a6","apiVersion":"infrastructuremanager.kyma-project.io/v1","resourceVersion":"4052686206"}, "reason": "GardenerErr"}
2024-08-19T09:34:25Z    ERROR   reqID 53239     Failed to create new gardener Shoot     {"error": "shoots.core.gardener.cloud \"bd0e0e5\" already exists"}
2024-08-19T09:34:25Z    DEBUG   events  Gardener API create error: shoots.core.gardener.cloud "bd0e0e5" already exists: kcp-system/72a29ca1-65c7-4a5a-8191-a260adbc2de2     {"type": "Warning", "object": {"kind":"Runtime","namespace":"kcp-system","name":"72a29ca1-65c7-4a5a-8191-a260adbc2de2","uid":"7295b4b4-ab35-473c-8ae2-0c20d2e932a6","apiVersion":"infrastructuremanager.kyma-project.io/v1","resourceVersion":"4052687325"}, "reason": "GardenerErr"}
2024-08-19T09:34:52Z    ERROR   reqID 53243     Failed to create new gardener Shoot     {"error": "shoots.core.gardener.cloud \"bd0e0e5\" already exists"}
2024-08-19T09:35:13Z    ERROR   reqID 53247     Failed to create new gardener Shoot     {"error": "shoots.core.gardener.cloud \"bd0e0e5\" already exists"}
2024-08-19T09:35:34Z    ERROR   reqID 53251     Failed to create new gardener Shoot     {"error": "shoots.core.gardener.cloud \"bd0e0e5\" already exists"}
2024-08-19T09:35:57Z    ERROR   reqID 53255     Failed to create new gardener Shoot     {"error": "shoots.core.gardener.cloud \"bd0e0e5\" already exists"}
2024-08-19T09:36:23Z    ERROR   reqID 53259     Failed to create new gardener Shoot     {"error": "shoots.core.gardener.cloud \"bd0e0e5\" already exists"}
2024-08-19T09:36:49Z    ERROR   reqID 53263     Failed to create new gardener Shoot     {"error": "shoots.core.gardener.cloud \"bd0e0e5\" already exists"}
2024-08-19T09:37:10Z    ERROR   reqID 53267     Failed to create new gardener Shoot     {"error": "shoots.core.gardener.cloud \"bd0e0e5\" already exists"}
2024-08-19T09:37:29Z    ERROR   reqID 53271     Failed to create new gardener Shoot     {"error": "shoots.core.gardener.cloud \"bd0e0e5\" already exists"}
2024-08-19T09:37:53Z    ERROR   reqID 53275     Failed to create new gardener Shoot     {"error": "shoots.core.gardener.cloud \"bd0e0e5\" already exists"}
2024-08-19T09:38:13Z    ERROR   reqID 53279     Failed to create new gardener Shoot     {"error": "shoots.core.gardener.cloud \"bd0e0e5\" already exists"}
2024-08-19T09:38:34Z    ERROR   reqID 53283     Failed to create new gardener Shoot     {"error": "shoots.core.gardener.cloud \"bd0e0e5\" already exists"}
2024-08-19T09:38:52Z    ERROR   reqID 53287     Failed to create new gardener Shoot     {"error": "shoots.core.gardener.cloud \"bd0e0e5\" already exists"}
2024-08-19T09:39:09Z    ERROR   reqID 53291     Failed to create new gardener Shoot     {"error": "shoots.core.gardener.cloud \"bd0e0e5\" already exists"}
2024-08-19T09:39:28Z    ERROR   reqID 53295     Failed to create new gardener Shoot     {"error": "shoots.core.gardener.cloud \"bd0e0e5\" already exists"}
2024-08-19T09:39:46Z    ERROR   reqID 53299     Failed to create new gardener Shoot     {"error": "shoots.core.gardener.cloud \"bd0e0e5\" already exists"}
2024-08-19T09:40:08Z    ERROR   reqID 53303     Failed to create new gardener Shoot     {"error": "shoots.core.gardener.cloud \"bd0e0e5\" already exists"}
2024-08-19T09:40:27Z    ERROR   reqID 53307     Failed to create new gardener Shoot     {"error": "shoots.core.gardener.cloud \"bd0e0e5\" already exists"}
2024-08-19T09:40:44Z    ERROR   reqID 53311     Failed to create new gardener Shoot     {"error": "shoots.core.gardener.cloud \"bd0e0e5\" already exists"}
2024-08-19T09:41:02Z    ERROR   reqID 53315     Failed to create new gardener Shoot     {"error": "shoots.core.gardener.cloud \"bd0e0e5\" already exists"}
2024-08-19T09:41:19Z    ERROR   reqID 53319     Failed to create new gardener Shoot     {"error": "shoots.core.gardener.cloud \"bd0e0e5\" already exists"}
2024-08-19T09:41:38Z    ERROR   reqID 53323     Failed to create new gardener Shoot     {"error": "shoots.core.gardener.cloud \"bd0e0e5\" already exists"}
2024-08-19T09:41:56Z    ERROR   reqID 53327     Failed to create new gardener Shoot     {"error": "shoots.core.gardener.cloud \"bd0e0e5\" already exists"}

  1. It looks like the cluster initially fails to be created because Gardener rejects the request: spec.networking.type is reported as a required value that was not provided.
  2. I suspect this was a temporary hiccup, as the cluster is eventually created and is in a healthy state: https://dashboard.garden.canary.k8s.ondemand.com/namespace/garden-kyma-dev/shoots/bd0e0e5
  3. I have not yet analysed the state machine, but it could be that KIM reconciles the initial error and tries to create the cluster again, although by then it has actually been created. That results in a valid "already exists" response from Gardener and in invalid KIM behaviour: setting the state to error.
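
The suspected behaviour in point 3 can be illustrated with a minimal Go sketch. It is not KIM's actual code: `ErrAlreadyExists`, `ShootCreator`, `fakeGardener` and `ensureShoot` are hypothetical names introduced only for this example. A real controller would more likely check the Kubernetes API error with `apierrors.IsAlreadyExists` from `k8s.io/apimachinery/pkg/api/errors`. The idea is that a repeated create which reports "already exists" is treated as "creation is under way" rather than as a failure.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrAlreadyExists is a hypothetical sentinel standing in for the Kubernetes
// "already exists" API error returned by Gardener.
var ErrAlreadyExists = errors.New("already exists")

// ShootCreator is a hypothetical, minimal view of a Gardener shoot client.
type ShootCreator interface {
	Create(name string) error
}

// fakeGardener simulates what the logs show: the first create for a name
// succeeds, every later create for the same name reports "already exists".
type fakeGardener struct{ shoots map[string]bool }

func (g *fakeGardener) Create(name string) error {
	if g.shoots[name] {
		return fmt.Errorf("shoots.core.gardener.cloud %q: %w", name, ErrAlreadyExists)
	}
	g.shoots[name] = true
	return nil
}

// ensureShoot returns the Runtime state to set after a create attempt.
// "already exists" means the shoot is present, so the Runtime stays Pending
// instead of being moved to Failed. Any other error is still a failure.
func ensureShoot(c ShootCreator, name string) (string, error) {
	err := c.Create(name)
	switch {
	case err == nil, errors.Is(err, ErrAlreadyExists):
		return "Pending", nil
	default:
		return "Failed", err
	}
}

func main() {
	g := &fakeGardener{shoots: map[string]bool{}}
	// Two reconcile passes for the same shoot name, as in the logs above.
	for i := 0; i < 2; i++ {
		state, err := ensureShoot(g, "bd0e0e5")
		fmt.Println(state, err)
	}
}
```

With this handling, the second reconcile pass no longer flips the Runtime to Failed. Whether the eventual fix in KIM took this form is not stated in this thread.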

jaroslaw-pieszka commented 3 months ago

After manually changing the shoot name again, the state was Pending for a few seconds, and then we got:

  conditions:
  - lastTransitionTime: "2024-08-19T10:38:33Z"
    message: 'Gardener API create error: shoots.core.gardener.cloud "badcafe00" already
      exists'
    reason: GardenerErr
    status: "False"
    type: Provisioned

akgalwas commented 3 months ago

There is definitely a regression. To sum up, the "already exists" error occurs in the following cases:

  1. A new Runtime CR is created and its spec.shoot.name points to a shoot that already exists on Gardener.
  2. spec.shoot.name is changed on an existing Runtime CR.

The problem is a blocker for migration, as KIM will not be able to take over existing runtimes created by the Provisioner.
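
Both cases can be pictured with a small lookup-before-create sketch in Go. This is an assumption about one possible approach, not a description of how KIM works: `Shoot`, `fakeGardener` and `reconcile` are hypothetical, the Gardener API is replaced by an in-memory map, and the version "1.28" of the pre-existing shoot is made up for illustration.

```go
package main

import "fmt"

// Shoot is a hypothetical, stripped-down stand-in for the Gardener Shoot resource.
type Shoot struct {
	Name    string
	Version string
}

// fakeGardener is an in-memory stand-in for the Gardener API (assumption).
type fakeGardener struct {
	shoots  map[string]Shoot
	creates int
	patches int
}

// reconcile looks the shoot up first and creates it only when it is missing.
// An existing shoot is taken over and patched to the desired spec, so no
// "already exists" error can be produced by a blind create.
func (g *fakeGardener) reconcile(desired Shoot) string {
	if _, found := g.shoots[desired.Name]; !found {
		g.shoots[desired.Name] = desired
		g.creates++
		return "created"
	}
	g.shoots[desired.Name] = desired
	g.patches++
	return "patched"
}

func main() {
	// Case 1: the shoot already exists, for example created by the Provisioner.
	g := &fakeGardener{shoots: map[string]Shoot{
		"keb-x-01": {Name: "keb-x-01", Version: "1.28"},
	}}
	fmt.Println(g.reconcile(Shoot{Name: "keb-x-01", Version: "1.29"}))

	// Case 2: spec.shoot.name is changed to a name Gardener has not seen yet,
	// and the controller reconciles the same Runtime twice.
	fmt.Println(g.reconcile(Shoot{Name: "badcafe00", Version: "1.29"}))
	fmt.Println(g.reconcile(Shoot{Name: "badcafe00", Version: "1.29"}))
}
```

In this sketch the takeover in case 1 becomes a patch, and the repeated reconcile in case 2 results in one create followed by a patch.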

Disper commented 2 months ago

@piotrmiskiewicz, @jaroslaw-pieszka have you encountered this issue recently?

Disper commented 2 months ago

I've heard from @piotrmiskiewicz that recent runtime creations were successful, so I'm closing the ticket. Please re-open it if you encounter the issue again.