kubeflow / testing

Test infrastructure and tooling for Kubeflow.
Apache License 2.0

Blueprint autodeployments are failing #668

Closed jlewi closed 4 years ago

jlewi commented 4 years ago

Autodeployments of blueprints are failing.

The Tekton dashboard shows the failing pipeline runs: https://kf-ci-v1.endpoints.kubeflow-ci.cloud.goog/tekton/#/namespaces/auto-deploy/pipelineruns?labelSelector=tekton.dev%2Fpipeline%3Ddeploy-gcp-blueprint

The error is:

INFO|2020-05-15T17:30:12|/workspace/testing-repo/py/kubeflow/testing/util.py|72| kpt pkg get https://github.com/kubeflow/manifests.git@master ./upstream/manifests
INFO|2020-05-15T17:30:13|/workspace/testing-repo/py/kubeflow/testing/util.py|72| fetching package / from https://github.com/kubeflow/manifests to upstream/manifests
INFO|2020-05-15T17:30:20|/workspace/testing-repo/py/kubeflow/testing/util.py|72| Error: upstream/manifests/aws/aws-istio-authz-adaptor/overlays/application/application.yaml: yaml: line 25: did not find expected '-' indicator
INFO|2020-05-15T17:30:20|/workspace/testing-repo/py/kubeflow/testing/util.py|72| Makefile:36: recipe for target 'get-pkg' failed
INFO|2020-05-15T17:30:20|/workspace/testing-repo/py/kubeflow/testing/util.py|72| make: *** [get-pkg] Error 1
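
To find the offending file quickly in a long pipeline log, the kpt error line can be picked apart with a regular expression. This is a minimal sketch, not part of the kubeflow/testing tooling: the names `YAML_ERROR`, `YamlError`, and `find_yaml_error` are hypothetical, and the pattern assumes the error keeps the `Error: <path>: yaml: line <N>: <message>` shape seen above.

```python
import re
from typing import NamedTuple, Optional

# Assumed shape of the parse error surfaced in the log above:
#   Error: <path>: yaml: line <N>: <message>
YAML_ERROR = re.compile(
    r"Error: (?P<path>\S+): yaml: line (?P<line>\d+): (?P<message>.+)$"
)


class YamlError(NamedTuple):
    """Hypothetical container for the pieces of a YAML parse error."""
    path: str
    line: int
    message: str


def find_yaml_error(log_line: str) -> Optional[YamlError]:
    """Return the YAML error embedded in a log line, or None if there is none."""
    match = YAML_ERROR.search(log_line)
    if match is None:
        return None
    return YamlError(match["path"], int(match["line"]), match["message"].strip())


# The failing line copied from the log above.
log_line = (
    "INFO|2020-05-15T17:30:20|/workspace/testing-repo/py/kubeflow/testing/util.py|72| "
    "Error: upstream/manifests/aws/aws-istio-authz-adaptor/overlays/application/application.yaml: "
    "yaml: line 25: did not find expected '-' indicator"
)
error = find_yaml_error(log_line)
print(error)
```

Lines that do not carry a YAML parse error, such as the `make: *** [get-pkg] Error 1` line, return `None`, so the helper can be applied to every line of the log.
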
issue-label-bot[bot] commented 4 years ago

Issue-Label Bot is automatically applying the labels:

Label Probability
kind/bug 0.95
area/engprod 0.57

jlewi commented 4 years ago

It looks like the problem is the application resource. https://github.com/kubeflow/manifests/blob/d82342c88cb943635d6842db3153f9909181a067/aws/nvidia-device-plugin/overlays/application/application.yaml#L23

The YAML is invalid: the indentation of the owners field is wrong.

It looks like the validation I'm writing to fix kubeflow/manifests#1174 is catching this.

@Jeffwan I will fix the YAML as part of kubeflow/manifests#1174 so no need to worry about this unless you need a fix sooner. Should have a PR this weekend or Monday.

Jeffwan commented 4 years ago

Thanks @jlewi for the fix. I think I carelessly introduced an indentation issue in https://github.com/kubeflow/manifests/pull/1162

jlewi commented 4 years ago

@Jeffwan not a problem

jlewi commented 4 years ago

Auto-deployed blueprints are still failing, this time with an RBAC issue:

Error from server (Forbidden): error when creating ".build/gcp_config/iam.cnrm.cloud.google.com_v1beta1_iamserviceaccount_kf-vbp-0520-61c-vm.yaml": iamserviceaccounts.iam.cnrm.cloud.google.com is forbidden: User "kf-ci-v1-user@kubeflow-ci.iam.gserviceaccount.com" cannot create resource "iamserviceaccounts" in API group "iam.cnrm.cloud.google.com" in the namespace "kubeflow-ci-deployment": requires one of ["container.thirdPartyObjects.create"] permission(s)
issue-label-bot[bot] commented 4 years ago

Issue-Label Bot is automatically applying the labels:

Label Probability
platform/gcp 0.68

jlewi commented 4 years ago

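The kubeflow-ci-deployment namespace is missing. The commands below create it and bind the CI user to the cnrm-admin cluster role in that namespace: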

kubectl --context=kf-ci-deployment-management create namespace kubeflow-ci-deployment
kubectl --context=kf-ci-deployment-management -n kubeflow-ci-deployment create rolebinding kf-ci-v1-cnrm-admin --user=kf-ci-v1-user@kubeflow-ci.iam.gserviceaccount.com --clusterrole=cnrm-admin
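
One way to confirm that the role binding took effect is `kubectl auth can-i` with the `--as` impersonation flag. The sketch below only builds the command line and does not run it. `can_i_command` is a hypothetical helper, and it assumes the caller is allowed to impersonate the CI user.

```python
import shlex
from typing import List


def can_i_command(context: str, namespace: str, user: str,
                  verb: str, resource: str) -> List[str]:
    """Build (but do not run) a `kubectl auth can-i` check for an impersonated user."""
    return [
        "kubectl", f"--context={context}", "-n", namespace,
        "auth", "can-i", verb, resource, f"--as={user}",
    ]


# Values taken from the Forbidden error and the commands above.
cmd = can_i_command(
    context="kf-ci-deployment-management",
    namespace="kubeflow-ci-deployment",
    user="kf-ci-v1-user@kubeflow-ci.iam.gserviceaccount.com",
    verb="create",
    resource="iamserviceaccounts.iam.cnrm.cloud.google.com",
)
print(shlex.join(cmd))
```

When the printed command is run against the management cluster, `kubectl auth can-i` answers `yes` or `no`. A `yes` would indicate the binding grants the permission the Forbidden error complained about.
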
jlewi commented 4 years ago

Fixed https://kf-vbp-0520-ba7.endpoints.kubeflow-ci-deployment.cloud.goog/?ns=kubeflow-jlewi