kubeflow / kfctl

kfctl is a CLI for deploying and managing Kubeflow
Apache License 2.0

Optimize Kfctl manifest deletion order #385

Closed Jeffwan closed 4 years ago

Jeffwan commented 4 years ago

This issue comes from https://github.com/kubeflow/manifests/issues/1421.

The problem is that when we run kfctl delete -f kfdef.yaml, namespace deletion times out. Deleting a namespace deletes every resource in it, with no control over the order. The Kubernetes API server calls the registered APIServices to clean up resources. However, if an APIService's backend pod has already been deleted, the cleanup gets stuck and the namespace stays in Terminating. An APIService in this state looks like this:

v1beta1.webhook.cert-manager.io        cert-manager/cert-manager-webhook   False (ServiceNotFound)   3h17m
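To spot this condition before (or during) a delete, one can look for APIServices whose AVAILABLE column is not True in kubectl get apiservice output. The Go sketch below assumes the default four columns (NAME, SERVICE, AVAILABLE, AGE); the helper name unavailableAPIServices is hypothetical and not part of kfctl.

```go
package main

import (
	"fmt"
	"strings"
)

// unavailableAPIServices is a hypothetical helper: it scans
// `kubectl get apiservice` output (columns NAME SERVICE AVAILABLE AGE)
// and returns the names of APIServices that are not available.
func unavailableAPIServices(output string) []string {
	var names []string
	for _, line := range strings.Split(output, "\n") {
		fields := strings.Fields(line)
		// Skip blank lines, the header row, and malformed rows.
		if len(fields) < 4 || fields[0] == "NAME" {
			continue
		}
		// AVAILABLE is the third column. Values look like "True" or
		// "False (ServiceNotFound)", so only the first word is checked.
		if fields[2] != "True" {
			names = append(names, fields[0])
		}
	}
	return names
}

func main() {
	out := `NAME                                   SERVICE                             AVAILABLE                 AGE
v1.apps                                Local                               True                      3h20m
v1beta1.webhook.cert-manager.io        cert-manager/cert-manager-webhook   False (ServiceNotFound)   3h17m`
	fmt.Println(unavailableAPIServices(out))
}
```

Run against the sample above, it reports only v1beta1.webhook.cert-manager.io, the APIService whose backend Service is gone.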

We already delete kustomize applications in reverse order, which is good: it gives us some control over ordering across applications. For example, istio and cert-manager are installed first and Kubeflow afterwards, so on deletion Kubeflow goes first. https://github.com/kubeflow/kfctl/blob/e7f548d4cee2ba4a7865f6f5ca3fa5ef1ca730ef/pkg/kfapp/kustomize/kustomize.go#L397

However, there is no ordering within each application. If the Namespace happens to be deleted first, deletion gets stuck until the 5-minute timeout expires.
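The gap can be modeled with a small Go sketch. The application type and deletionPlan function are hypothetical stand-ins, not kfctl's actual types. The outer loop walks applications in reverse install order, but the inner loop keeps whatever order the manifests happen to have, so a Namespace can be deleted before the webhook Deployment that backs an APIService.

```go
package main

import "fmt"

// application is a hypothetical stand-in for one kustomize application in a
// KfDef; Resources lists its objects in the order the manifests emit them.
type application struct {
	Name      string
	Resources []string
}

// deletionPlan walks applications in reverse install order (the existing
// cross-application ordering) but keeps each application's resources in
// their original, unsorted order (the missing within-application ordering).
func deletionPlan(apps []application) []string {
	var plan []string
	for i := len(apps) - 1; i >= 0; i-- {
		for _, r := range apps[i].Resources {
			plan = append(plan, apps[i].Name+": "+r)
		}
	}
	return plan
}

func main() {
	// Illustrative install order: cert-manager before knative.
	apps := []application{
		{Name: "cert-manager", Resources: []string{"Namespace/cert-manager", "Deployment/cert-manager-webhook"}},
		{Name: "knative", Resources: []string{"Namespace/knative-serving", "Deployment/controller"}},
	}
	for _, step := range deletionPlan(apps) {
		fmt.Println(step)
	}
}
```

In this plan knative is handled before cert-manager, matching the logs below, yet each Namespace still comes before its own Deployment.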

Examples:

INFO[0202] Deleting application knative                  filename="kustomize/kustomize.go:401"
INFO[0202] Deleting Kind 'Namespace' in APIVersion 'v1' with name 'knative-serving' in namespace ''  filename="utils/k8utils.go:540"
WARN[0513] error evaluating kustomization manifest for knative: Timed out waiting for resource /knative-serving to be deleted. Error deleted resource is not cleaned up yet  filename="kustomize/kustomize.go:430"

...

INFO[0545] Deleting application cert-manager             filename="kustomize/kustomize.go:401"
INFO[0545] Deleting Kind 'Namespace' in APIVersion 'v1' with name 'cert-manager' in namespace ''  filename="utils/k8utils.go:540"
WARN[0857] error evaluating kustomization manifest for cert-manager: Timed out waiting for resource /cert-manager to be deleted. Error deleted resource is not cleaned up yet  filename="kustomize/kustomize.go:430"

Here knative-serving is stuck for about five minutes (0202 to 0513), cert-manager then fails the same way, and eventually, when base/namespace is deleted, kfctl fails with:

Error: couldn't delete KfApp:  (kubeflow.error): Code 500 with message: kfApp Delete failed for kustomize:  (kubeflow.error): Code 500 with message: error deleting kustomize manifests: [error evaluating kustomization manifest for knative: Timed out waiting for resource /knative-serving to be deleted. Error deleted resource is not cleaned up yet, error evaluating kustomization manifest for cert-manager: Timed out waiting for resource /cert-manager to be deleted. Error deleted resource is not cleaned up yet]
Usage:
  kfctl delete [flags]

Flags:
      --delete_storage   Set if you want to delete app's storage cluster used for mlpipeline.
  -f, --file string      The local config file of KfDef.
      --force-deletion   force-deletion output default is false
  -h, --help             help for delete
  -V, --verbose          verbose output default is false

kfctl exited with error: couldn't delete KfApp:  (kubeflow.error): Code 500 with message: kfApp Delete failed for kustomize:  (kubeflow.error): Code 500 with message: error deleting kustomize manifests: [error evaluating kustomization manifest for knative: Timed out waiting for resource /knative-serving to be deleted. Error deleted resource is not cleaned up yet, error evaluating kustomization manifest for cert-manager: Timed out waiting for resource /cert-manager to be delet

The proposal is to order the resources by Kind before calling DeleteResource within a single application:

https://github.com/kubeflow/kfctl/blob/e7f548d4cee2ba4a7865f6f5ca3fa5ef1ca730ef/pkg/kfapp/kustomize/kustomize.go#L426

The easiest fix is to delete the Namespace last. A more elegant approach would be to order all resources by Kind.
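Both options can be sketched in Go with a stable sort, so resources of equal rank keep their manifest order. The resource type, namespaceLast, sortByKind, and the deletePriority table are hypothetical, not kfctl code. The ranking is an illustrative assumption: APIServices and webhook configurations go first so they are unregistered while their backends still run, then workloads, then config and RBAC, then CRDs, with Namespaces last.

```go
package main

import (
	"fmt"
	"sort"
)

// resource is a hypothetical, minimal view of an object in a kustomize build.
type resource struct {
	Kind string
	Name string
}

// namespaceLast is the "easiest way": move every Namespace to the end and
// keep all other resources in their original relative order.
func namespaceLast(rs []resource) {
	sort.SliceStable(rs, func(i, j int) bool {
		return rs[i].Kind != "Namespace" && rs[j].Kind == "Namespace"
	})
}

// deletePriority is an illustrative Kind ranking (an assumption, not kfctl's).
// Lower ranks are deleted first.
var deletePriority = map[string]int{
	"APIService":                     0,
	"MutatingWebhookConfiguration":   0,
	"ValidatingWebhookConfiguration": 0,
	"Deployment":                     1,
	"StatefulSet":                    1,
	"DaemonSet":                      1,
	"Service":                        2,
	"ConfigMap":                      3,
	"Secret":                         3,
	"ServiceAccount":                 3,
	"RoleBinding":                    3,
	"ClusterRoleBinding":             3,
	"CustomResourceDefinition":       4,
	"Namespace":                      5,
}

// rank returns a Kind's deletion rank; unknown kinds default to the middle.
func rank(kind string) int {
	if r, ok := deletePriority[kind]; ok {
		return r
	}
	return 3
}

// sortByKind is the "more elegant way": a stable sort on the ranking above.
func sortByKind(rs []resource) {
	sort.SliceStable(rs, func(i, j int) bool {
		return rank(rs[i].Kind) < rank(rs[j].Kind)
	})
}

func main() {
	rs := []resource{
		{"Namespace", "cert-manager"},
		{"Deployment", "cert-manager-webhook"},
		{"APIService", "v1beta1.webhook.cert-manager.io"},
		{"CustomResourceDefinition", "certificates.cert-manager.io"},
	}
	sortByKind(rs)
	for _, r := range rs {
		fmt.Printf("%s/%s\n", r.Kind, r.Name)
	}
}
```

A stable sort keeps the change minimal: it only reorders across ranks, so the existing manifest order still decides ties, and the Namespace is always deleted after everything it contains.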

issue-label-bot[bot] commented 4 years ago

Issue-Label Bot is automatically applying the labels:

Label Probability
area/kfctl 0.98
kind/feature 0.73
