open-horizon / anax

Horizon agent control system
https://open-horizon.github.io/docs/anax/docs/
Apache License 2.0

Bug: Installing a Kube Deployment with ImagePullSecrets fails multiple times before succeeding due to the order in which YAMLs are applied #4081

Closed scwhaley closed 3 months ago

scwhaley commented 3 months ago

Describe the bug.

A kube deployment configuration has an OperatorYamlArchive, which contains all of the kube resource YAML files to install. The order in which the kube_worker applies them during Install() is determined by the type of each kube resource.

Currently the code does not specifically handle Secret resources (such as ImagePullSecrets). They are treated as K8S_UNSTRUCTURED_TYPE and are therefore installed last.

As a result, a Deployment resource in the OperatorYamlArchive may be installed before the ImagePullSecret it depends on. The deployment then fails because it cannot pull its image until the secret exists, which may be too late. The agreement is terminated and has to be retried multiple times until the ImagePullSecret happens to exist before the deployment times out.

Describe the steps to reproduce the behavior.

Create a cluster OH service with a Deployment YAML that depends on an ImagePullSecret YAML to pull its image. Trigger the install to the cluster. Watch the install fail multiple times before eventually succeeding because of the race condition. The deployment's kube events show the image pull failing multiple times per install attempt before the image pull secret is found near the end of the attempt.

Expected behavior.

Any Secret resource should be installed before any other kube resource that may depend on it (such as a Deployment).
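
A minimal sketch of the expected ordering, written as standalone Go rather than against the actual anax code. The `resource` type, the `installPriority` table, and the `sortForInstall` helper are all hypothetical names invented for illustration; the real kube_worker may group and order resources quite differently. The only point shown is that giving Secret its own slot ahead of Deployment removes the race, while kinds without an explicit slot still go last.

```go
package main

import (
	"fmt"
	"sort"
)

// resource is a hypothetical stand-in for one parsed YAML file
// from the OperatorYamlArchive.
type resource struct {
	Kind string
	Name string
}

// installPriority is an assumed ordering, not anax's actual table.
// Lower values are installed first. Secret is placed ahead of
// Deployment so that image pull secrets exist before pods start.
var installPriority = map[string]int{
	"Namespace":      0,
	"Secret":         1,
	"ServiceAccount": 2,
	"Role":           3,
	"RoleBinding":    4,
	"Deployment":     5,
}

// priorityOf returns the install slot for a kind. Kinds without an
// explicit slot are installed after every known kind.
func priorityOf(kind string) int {
	if p, ok := installPriority[kind]; ok {
		return p
	}
	return len(installPriority)
}

// sortForInstall returns a copy of in, ordered by install priority.
// The sort is stable, so resources of the same kind keep their
// original relative order.
func sortForInstall(in []resource) []resource {
	out := make([]resource, len(in))
	copy(out, in)
	sort.SliceStable(out, func(i, j int) bool {
		return priorityOf(out[i].Kind) < priorityOf(out[j].Kind)
	})
	return out
}

func main() {
	archive := []resource{
		{Kind: "Deployment", Name: "my-operator"},
		{Kind: "MyCustomResource", Name: "example"},
		{Kind: "Secret", Name: "registry-creds"},
	}
	for _, r := range sortForInstall(archive) {
		fmt.Printf("%s/%s\n", r.Kind, r.Name)
	}
}
```

With the sample archive above, the Secret is listed first, then the Deployment, then the custom resource that has no explicit slot.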

Screenshots.

No response

Operating Environment

Ubuntu 20.04, local k3d cluster deployment

Additional Information

No response