kontena / pharos-cluster

Pharos - The Kubernetes Distribution
https://k8spharos.dev/
Apache License 2.0

Possible race condition on CRD POST => APIResources GET #529

Closed SpComb closed 5 years ago

SpComb commented 6 years ago

The broken error handling is tracked in https://github.com/kontena/k8s-client/issues/16, but the underlying problem appears to be that GET /apis/certmanager.k8s.io/v1alpha1 omits the Issuer kind from the APIResourceList immediately after the POST /apis/apiextensions.k8s.io/v1beta1/customresourcedefinitions that creates it. The requests are not pipelined, but they happen within <100ms of each other.

This is with a single-master setup.
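
One way to work around the stale APIResourceList would be to poll it until the new kind shows up before creating resources of that kind. The following is a minimal sketch, not the k8s-client API: `wait_for_kind` is a hypothetical helper, and the block passed to it is assumed to return the raw JSON body of GET /apis/certmanager.k8s.io/v1alpha1 (or any other group/version).

```ruby
require 'json'

# Hypothetical helper: polls until `kind` is listed in the APIResourceList.
# The block must return the JSON body of GET /apis/<group>/<version> as a String.
# Returns true once the kind is listed, false if the timeout expires first.
def wait_for_kind(kind, timeout: 10.0, interval: 0.2)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    list = JSON.parse(yield)
    resources = list['resources'] || []
    return true if resources.any? { |r| r['kind'] == kind }
    return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline

    sleep(interval)
  end
end
```

A caller would invoke this after the CRD POST returns 201 and only create the Issuer resource once it returns true. The timeout and interval defaults are arbitrary choices for illustration.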

==> Enabling addon cert-manager
I, [2018-08-14T11:31:36.323237 #1]  INFO -- K8s::Transport<https://master.terom-pharos-dev.kontena.works:6443>: [GET /apis/certmanager.k8s.io/v1alpha1] => HTTP [404] in 0.050s
I, [2018-08-14T11:31:36.486642 #1]  INFO -- K8s::Transport<https://master.terom-pharos-dev.kontena.works:6443>: [GET /apis/apiextensions.k8s.io/v1beta1/customresourcedefinitions/certificates.certmanager.k8s.io, GET /apis/apiextensions.k8s.io/v1beta1/customresourcedefinitions/clusterissuers.certmanager.k8s.io, GET /apis/rbac.authorization.k8s.io/v1beta1/clusterrolebindings/cert-manager, GET /apis/rbac.authorization.k8s.io/v1beta1/clusterroles/cert-manager, GET /apis/extensions/v1beta1/namespaces/kube-system/deployments/cert-manager, GET /apis/apiextensions.k8s.io/v1beta1/customresourcedefinitions/issuers.certmanager.k8s.io, GET /api/v1/namespaces/kube-system/serviceaccounts/cert-manager] => HTTP [404, 404, 404, 404, 404, 404, 404] in 0.161s
I, [2018-08-14T11:31:36.486867 #1]  INFO -- Pharos::Kube::Stack<cert-manager>: Create resource apiextensions.k8s.io/v1beta1:CustomResourceDefinition/certificates.certmanager.k8s.io in namespace  with checksum=c9702757db022541ecaf0c0993f06a4c
I, [2018-08-14T11:31:36.561896 #1]  INFO -- K8s::Transport<https://master.terom-pharos-dev.kontena.works:6443>: POST /apis/apiextensions.k8s.io/v1beta1/customresourcedefinitions <K8s::Resource> => HTTP 201: <K8s::Resource> in 0.074s
I, [2018-08-14T11:31:36.562118 #1]  INFO -- Pharos::Kube::Stack<cert-manager>: Create resource apiextensions.k8s.io/v1beta1:CustomResourceDefinition/clusterissuers.certmanager.k8s.io in namespace  with checksum=c9702757db022541ecaf0c0993f06a4c
I, [2018-08-14T11:31:36.634936 #1]  INFO -- K8s::Transport<https://master.terom-pharos-dev.kontena.works:6443>: POST /apis/apiextensions.k8s.io/v1beta1/customresourcedefinitions <K8s::Resource> => HTTP 201: <K8s::Resource> in 0.072s
I, [2018-08-14T11:31:36.635937 #1]  INFO -- Pharos::Kube::Stack<cert-manager>: Create resource rbac.authorization.k8s.io/v1beta1:ClusterRoleBinding/cert-manager in namespace  with checksum=c9702757db022541ecaf0c0993f06a4c
I, [2018-08-14T11:31:36.708062 #1]  INFO -- K8s::Transport<https://master.terom-pharos-dev.kontena.works:6443>: POST /apis/rbac.authorization.k8s.io/v1beta1/clusterrolebindings <K8s::Resource> => HTTP 201: <K8s::Resource> in 0.070s
I, [2018-08-14T11:31:36.708302 #1]  INFO -- Pharos::Kube::Stack<cert-manager>: Create resource rbac.authorization.k8s.io/v1beta1:ClusterRole/cert-manager in namespace  with checksum=c9702757db022541ecaf0c0993f06a4c
I, [2018-08-14T11:31:36.767376 #1]  INFO -- K8s::Transport<https://master.terom-pharos-dev.kontena.works:6443>: POST /apis/rbac.authorization.k8s.io/v1beta1/clusterroles <K8s::Resource> => HTTP 201: <K8s::Resource> in 0.058s
I, [2018-08-14T11:31:36.768148 #1]  INFO -- Pharos::Kube::Stack<cert-manager>: Create resource extensions/v1beta1:Deployment/cert-manager in namespace kube-system with checksum=c9702757db022541ecaf0c0993f06a4c
I, [2018-08-14T11:31:36.835164 #1]  INFO -- K8s::Transport<https://master.terom-pharos-dev.kontena.works:6443>: POST /apis/extensions/v1beta1/namespaces/kube-system/deployments <K8s::Resource> => HTTP 201: <K8s::Resource> in 0.066s
I, [2018-08-14T11:31:36.836051 #1]  INFO -- Pharos::Kube::Stack<cert-manager>: Create resource apiextensions.k8s.io/v1beta1:CustomResourceDefinition/issuers.certmanager.k8s.io in namespace  with checksum=c9702757db022541ecaf0c0993f06a4c
I, [2018-08-14T11:31:36.957826 #1]  INFO -- K8s::Transport<https://master.terom-pharos-dev.kontena.works:6443>: POST /apis/apiextensions.k8s.io/v1beta1/customresourcedefinitions <K8s::Resource> => HTTP 201: <K8s::Resource> in 0.117s
I, [2018-08-14T11:31:36.959008 #1]  INFO -- Pharos::Kube::Stack<cert-manager>: Create resource certmanager.k8s.io/v1alpha1:Issuer/letsencrypt in namespace default with checksum=c9702757db022541ecaf0c0993f06a4c
I, [2018-08-14T11:31:37.013708 #1]  INFO -- K8s::Transport<https://master.terom-pharos-dev.kontena.works:6443>: GET /apis/certmanager.k8s.io/v1alpha1 => HTTP 200: <K8s::API::MetaV1::APIResourceList> in 0.052s
undefined method `kind' for nil:NilClass
/usr/local/bundle/gems/k8s-client-0.3.1/lib/k8s/api_client.rb:78:in `client_for_resource'
/usr/local/bundle/gems/k8s-client-0.3.1/lib/k8s/client.rb:107:in `client_for_resource'
/usr/local/bundle/gems/k8s-client-0.3.1/lib/k8s/client.rb:113:in `create_resource'
/usr/local/bundle/gems/k8s-client-0.3.1/lib/k8s/stack.rb:86:in `block in apply'
/usr/local/bundle/gems/k8s-client-0.3.1/lib/k8s/stack.rb:74:in `map'
/usr/local/bundle/gems/k8s-client-0.3.1/lib/k8s/stack.rb:74:in `apply'
/app/lib/pharos/addon.rb:205:in `apply_resources'
/app/lib/pharos/addon.rb:171:in `apply_install'
/app/lib/pharos/addon.rb:157:in `apply'
/app/lib/pharos/cluster_manager.rb:126:in `block in apply_addons'
/app/lib/pharos/addon_manager.rb:74:in `block in each'
/app/lib/pharos/addon_manager.rb:86:in `block in with_enabled_addons'
/app/lib/pharos/addon_manager.rb:83:in `each'
/app/lib/pharos/addon_manager.rb:83:in `with_enabled_addons'
/app/lib/pharos/addon_manager.rb:70:in `each'
/app/lib/pharos/cluster_manager.rb:123:in `apply_addons'
/app/lib/pharos/up_command.rb:107:in `configure'
/app/lib/pharos/up_command.rb:44:in `block in execute'
/app/lib/pharos/up_command.rb:43:in `chdir'
/app/lib/pharos/up_command.rb:43:in `execute'
/usr/local/bundle/gems/clamp-1.2.1/lib/clamp/command.rb:63:in `run'
/usr/local/bundle/gems/clamp-1.2.1/lib/clamp/subcommand/execution.rb:11:in `execute'
/usr/local/bundle/gems/clamp-1.2.1/lib/clamp/command.rb:63:in `run'
/usr/local/bundle/gems/clamp-1.2.1/lib/clamp/command.rb:132:in `run'
/app/lib/pharos/root_command.rb:16:in `run'
bin/pharos-cluster:12:in `<main>'
SpComb commented 6 years ago

Here's actual evidence of the race condition: the response to GET /apis/certmanager.k8s.io/v1alpha1 is missing the Issuer kind whose CRD was just POSTed:

==> Enabling addon cert-manager
I, [2018-08-16T12:13:01.836638 #1]  INFO -- K8s::Transport<https://master.terom-pharos-dev.kontena.works:6443>: [GET /apis/certmanager.k8s.io/v1alpha1] => HTTP [404] in 0.042s
I, [2018-08-16T12:13:02.145918 #1]  INFO -- K8s::Transport<https://master.terom-pharos-dev.kontena.works:6443>: [GET /apis/apiextensions.k8s.io/v1beta1/customresourcedefinitions/certificates.certmanager.k8s.io, GET /apis/apiextensions.k8s.io/v1beta1/customresourcedefinitions/clusterissuers.certmanager.k8s.io, GET /apis/rbac.authorization.k8s.io/v1beta1/clusterrolebindings/cert-manager, GET /apis/rbac.authorization.k8s.io/v1beta1/clusterroles/cert-manager, GET /apis/extensions/v1beta1/namespaces/kube-system/deployments/cert-manager, GET /apis/apiextensions.k8s.io/v1beta1/customresourcedefinitions/issuers.certmanager.k8s.io, GET /api/v1/namespaces/kube-system/serviceaccounts/cert-manager] => HTTP [404, 404, 404, 404, 404, 404, 404] in 0.301s
I, [2018-08-16T12:13:02.146619 #1]  INFO -- Pharos::Kube::Stack<cert-manager>: Create resource apiextensions.k8s.io/v1beta1:CustomResourceDefinition/certificates.certmanager.k8s.io in namespace  with checksum=d1dcbc0bb233f7ca8b235bdc88c693ce
I, [2018-08-16T12:13:02.234613 #1]  INFO -- K8s::Transport<https://master.terom-pharos-dev.kontena.works:6443>: POST /apis/apiextensions.k8s.io/v1beta1/customresourcedefinitions <K8s::Resource> => HTTP 201: <K8s::Resource> in 0.085s
D, [2018-08-16T12:13:02.234877 #1] DEBUG -- K8s::Transport<https://master.terom-pharos-dev.kontena.works:6443>: Request: {"apiVersion":"apiextensions.k8s.io/v1beta1","kind":"CustomResourceDefinition","metadata":{"name":"certificates.certmanager.k8s.io","labels":{"pharos.kontena.io/stack":"cert-manager"},"annotations":{"pharos.kontena.io/stack-checksum":"d1dcbc0bb233f7ca8b235bdc88c693ce"}},"spec":{"group":"certmanager.k8s.io","version":"v1alpha1","names":{"kind":"Certificate","plural":"certificates"},"scope":"Namespaced"}}
D, [2018-08-16T12:13:02.235004 #1] DEBUG -- K8s::Transport<https://master.terom-pharos-dev.kontena.works:6443>: Response: {"kind":"CustomResourceDefinition","apiVersion":"apiextensions.k8s.io/v1beta1","metadata":{"name":"certificates.certmanager.k8s.io","selfLink":"/apis/apiextensions.k8s.io/v1beta1/customresourcedefinitions/certificates.certmanager.k8s.io","uid":"b8e9facc-a14d-11e8-b195-76d458035df6","resourceVersion":"913","generation":1,"creationTimestamp":"2018-08-16T12:13:02Z","labels":{"pharos.kontena.io/stack":"cert-manager"},"annotations":{"pharos.kontena.io/stack-checksum":"d1dcbc0bb233f7ca8b235bdc88c693ce"}},"spec":{"group":"certmanager.k8s.io","version":"v1alpha1","names":{"plural":"certificates","singular":"certificate","kind":"Certificate","listKind":"CertificateList"},"scope":"Namespaced","versions":[{"name":"v1alpha1","served":true,"storage":true}],"additionalPrinterColumns":[{"name":"Age","type":"date","description":"CreationTimestamp is a timestamp representing the server time when this object was created. It is not guaranteed to be set in happens-before order across separate operations. Clients may not set this value. It is represented in RFC3339 form and is in UTC.\n\nPopulated by the system. Read-only. Null for lists. More info: https://git.k8s.io/community/contributors/devel/api-conventions.md#metadata","JSONPath":".metadata.creationTimestamp"}]},"status":{"conditions":null,"acceptedNames":{"plural":"","kind":""},"storedVersions":["v1alpha1"]}}

I, [2018-08-16T12:13:02.235993 #1]  INFO -- Pharos::Kube::Stack<cert-manager>: Create resource apiextensions.k8s.io/v1beta1:CustomResourceDefinition/clusterissuers.certmanager.k8s.io in namespace  with checksum=d1dcbc0bb233f7ca8b235bdc88c693ce
I, [2018-08-16T12:13:02.340007 #1]  INFO -- K8s::Transport<https://master.terom-pharos-dev.kontena.works:6443>: POST /apis/apiextensions.k8s.io/v1beta1/customresourcedefinitions <K8s::Resource> => HTTP 201: <K8s::Resource> in 0.101s
D, [2018-08-16T12:13:02.340259 #1] DEBUG -- K8s::Transport<https://master.terom-pharos-dev.kontena.works:6443>: Request: {"apiVersion":"apiextensions.k8s.io/v1beta1","kind":"CustomResourceDefinition","metadata":{"name":"clusterissuers.certmanager.k8s.io","labels":{"pharos.kontena.io/stack":"cert-manager"},"annotations":{"pharos.kontena.io/stack-checksum":"d1dcbc0bb233f7ca8b235bdc88c693ce"}},"spec":{"group":"certmanager.k8s.io","version":"v1alpha1","names":{"kind":"ClusterIssuer","plural":"clusterissuers"},"scope":"Cluster"}}
D, [2018-08-16T12:13:02.340615 #1] DEBUG -- K8s::Transport<https://master.terom-pharos-dev.kontena.works:6443>: Response: {"kind":"CustomResourceDefinition","apiVersion":"apiextensions.k8s.io/v1beta1","metadata":{"name":"clusterissuers.certmanager.k8s.io","selfLink":"/apis/apiextensions.k8s.io/v1beta1/customresourcedefinitions/clusterissuers.certmanager.k8s.io","uid":"b8f7965b-a14d-11e8-b195-76d458035df6","resourceVersion":"919","generation":1,"creationTimestamp":"2018-08-16T12:13:02Z","labels":{"pharos.kontena.io/stack":"cert-manager"},"annotations":{"pharos.kontena.io/stack-checksum":"d1dcbc0bb233f7ca8b235bdc88c693ce"}},"spec":{"group":"certmanager.k8s.io","version":"v1alpha1","names":{"plural":"clusterissuers","singular":"clusterissuer","kind":"ClusterIssuer","listKind":"ClusterIssuerList"},"scope":"Cluster","versions":[{"name":"v1alpha1","served":true,"storage":true}],"additionalPrinterColumns":[{"name":"Age","type":"date","description":"CreationTimestamp is a timestamp representing the server time when this object was created. It is not guaranteed to be set in happens-before order across separate operations. Clients may not set this value. It is represented in RFC3339 form and is in UTC.\n\nPopulated by the system. Read-only. Null for lists. More info: https://git.k8s.io/community/contributors/devel/api-conventions.md#metadata","JSONPath":".metadata.creationTimestamp"}]},"status":{"conditions":null,"acceptedNames":{"plural":"","kind":""},"storedVersions":["v1alpha1"]}}

I, [2018-08-16T12:13:02.341594 #1]  INFO -- Pharos::Kube::Stack<cert-manager>: Create resource rbac.authorization.k8s.io/v1beta1:ClusterRoleBinding/cert-manager in namespace  with checksum=d1dcbc0bb233f7ca8b235bdc88c693ce
I, [2018-08-16T12:13:02.448264 #1]  INFO -- K8s::Transport<https://master.terom-pharos-dev.kontena.works:6443>: POST /apis/rbac.authorization.k8s.io/v1beta1/clusterrolebindings <K8s::Resource> => HTTP 201: <K8s::Resource> in 0.104s
D, [2018-08-16T12:13:02.448618 #1] DEBUG -- K8s::Transport<https://master.terom-pharos-dev.kontena.works:6443>: Request: {"apiVersion":"rbac.authorization.k8s.io/v1beta1","kind":"ClusterRoleBinding","metadata":{"name":"cert-manager","labels":{"pharos.kontena.io/stack":"cert-manager"},"annotations":{"pharos.kontena.io/stack-checksum":"d1dcbc0bb233f7ca8b235bdc88c693ce"}},"roleRef":{"apiGroup":"rbac.authorization.k8s.io","kind":"ClusterRole","name":"cert-manager"},"subjects":[{"name":"cert-manager","namespace":"kube-system","kind":"ServiceAccount"}]}
D, [2018-08-16T12:13:02.448781 #1] DEBUG -- K8s::Transport<https://master.terom-pharos-dev.kontena.works:6443>: Response: {"kind":"ClusterRoleBinding","apiVersion":"rbac.authorization.k8s.io/v1beta1","metadata":{"name":"cert-manager","selfLink":"/apis/rbac.authorization.k8s.io/v1beta1/clusterrolebindings/cert-manager","uid":"b90a2c11-a14d-11e8-b195-76d458035df6","resourceVersion":"924","creationTimestamp":"2018-08-16T12:13:02Z","labels":{"pharos.kontena.io/stack":"cert-manager"},"annotations":{"pharos.kontena.io/stack-checksum":"d1dcbc0bb233f7ca8b235bdc88c693ce"}},"subjects":[{"kind":"ServiceAccount","name":"cert-manager","namespace":"kube-system"}],"roleRef":{"apiGroup":"rbac.authorization.k8s.io","kind":"ClusterRole","name":"cert-manager"}}

I, [2018-08-16T12:13:02.449702 #1]  INFO -- Pharos::Kube::Stack<cert-manager>: Create resource rbac.authorization.k8s.io/v1beta1:ClusterRole/cert-manager in namespace  with checksum=d1dcbc0bb233f7ca8b235bdc88c693ce
I, [2018-08-16T12:13:02.522768 #1]  INFO -- K8s::Transport<https://master.terom-pharos-dev.kontena.works:6443>: POST /apis/rbac.authorization.k8s.io/v1beta1/clusterroles <K8s::Resource> => HTTP 201: <K8s::Resource> in 0.070s
D, [2018-08-16T12:13:02.522988 #1] DEBUG -- K8s::Transport<https://master.terom-pharos-dev.kontena.works:6443>: Request: {"apiVersion":"rbac.authorization.k8s.io/v1beta1","kind":"ClusterRole","metadata":{"name":"cert-manager","labels":{"pharos.kontena.io/stack":"cert-manager"},"annotations":{"pharos.kontena.io/stack-checksum":"d1dcbc0bb233f7ca8b235bdc88c693ce"}},"rules":[{"apiGroups":["certmanager.k8s.io"],"resources":["certificates","issuers","clusterissuers"],"verbs":["*"]},{"apiGroups":[""],"resources":["secrets","events","endpoints","services","pods"],"verbs":["*"]},{"apiGroups":["extensions"],"resources":["ingresses"],"verbs":["*"]}]}
D, [2018-08-16T12:13:02.523107 #1] DEBUG -- K8s::Transport<https://master.terom-pharos-dev.kontena.works:6443>: Response: {"kind":"ClusterRole","apiVersion":"rbac.authorization.k8s.io/v1beta1","metadata":{"name":"cert-manager","selfLink":"/apis/rbac.authorization.k8s.io/v1beta1/clusterroles/cert-manager","uid":"b917fd61-a14d-11e8-b195-76d458035df6","resourceVersion":"926","creationTimestamp":"2018-08-16T12:13:02Z","labels":{"pharos.kontena.io/stack":"cert-manager"},"annotations":{"pharos.kontena.io/stack-checksum":"d1dcbc0bb233f7ca8b235bdc88c693ce"}},"rules":[{"verbs":["*"],"apiGroups":["certmanager.k8s.io"],"resources":["certificates","issuers","clusterissuers"]},{"verbs":["*"],"apiGroups":[""],"resources":["secrets","events","endpoints","services","pods"]},{"verbs":["*"],"apiGroups":["extensions"],"resources":["ingresses"]}]}

I, [2018-08-16T12:13:02.523790 #1]  INFO -- Pharos::Kube::Stack<cert-manager>: Create resource extensions/v1beta1:Deployment/cert-manager in namespace kube-system with checksum=d1dcbc0bb233f7ca8b235bdc88c693ce
I, [2018-08-16T12:13:02.930862 #1]  INFO -- K8s::Transport<https://master.terom-pharos-dev.kontena.works:6443>: POST /apis/extensions/v1beta1/namespaces/kube-system/deployments <K8s::Resource> => HTTP 201: <K8s::Resource> in 0.404s
D, [2018-08-16T12:13:02.931074 #1] DEBUG -- K8s::Transport<https://master.terom-pharos-dev.kontena.works:6443>: Request: {"apiVersion":"extensions/v1beta1","kind":"Deployment","metadata":{"name":"cert-manager","namespace":"kube-system","labels":{"pharos.kontena.io/stack":"cert-manager"},"annotations":{"pharos.kontena.io/stack-checksum":"d1dcbc0bb233f7ca8b235bdc88c693ce"}},"spec":{"replicas":1,"template":{"metadata":{"labels":{"app":"cert-manager"}},"spec":{"serviceAccountName":"cert-manager","containers":[{"name":"cert-manager","image":"quay.io/jetstack/cert-manager-controller:v0.2.3","imagePullPolicy":"IfNotPresent","resources":{"requests":{"cpu":"10m","memory":"32Mi"}}},{"name":"ingress-shim","image":"quay.io/jetstack/cert-manager-ingress-shim:v0.2.3","imagePullPolicy":"IfNotPresent","resources":{"requests":{"cpu":"10m","memory":"32Mi"}}}]}}}}
D, [2018-08-16T12:13:02.931216 #1] DEBUG -- K8s::Transport<https://master.terom-pharos-dev.kontena.works:6443>: Response: {"kind":"Deployment","apiVersion":"extensions/v1beta1","metadata":{"name":"cert-manager","namespace":"kube-system","selfLink":"/apis/extensions/v1beta1/namespaces/kube-system/deployments/cert-manager","uid":"b9245a0e-a14d-11e8-b195-76d458035df6","resourceVersion":"928","generation":1,"creationTimestamp":"2018-08-16T12:13:02Z","labels":{"pharos.kontena.io/stack":"cert-manager"},"annotations":{"pharos.kontena.io/stack-checksum":"d1dcbc0bb233f7ca8b235bdc88c693ce"}},"spec":{"replicas":1,"selector":{"matchLabels":{"app":"cert-manager"}},"template":{"metadata":{"creationTimestamp":null,"labels":{"app":"cert-manager"}},"spec":{"containers":[{"name":"cert-manager","image":"quay.io/jetstack/cert-manager-controller:v0.2.3","resources":{"requests":{"cpu":"10m","memory":"32Mi"}},"terminationMessagePath":"/dev/termination-log","terminationMessagePolicy":"File","imagePullPolicy":"IfNotPresent"},{"name":"ingress-shim","image":"quay.io/jetstack/cert-manager-ingress-shim:v0.2.3","resources":{"requests":{"cpu":"10m","memory":"32Mi"}},"terminationMessagePath":"/dev/termination-log","terminationMessagePolicy":"File","imagePullPolicy":"IfNotPresent"}],"restartPolicy":"Always","terminationGracePeriodSeconds":30,"dnsPolicy":"ClusterFirst","serviceAccountName":"cert-manager","serviceAccount":"cert-manager","securityContext":{},"schedulerName":"default-scheduler"}},"strategy":{"type":"RollingUpdate","rollingUpdate":{"maxUnavailable":1,"maxSurge":1}},"revisionHistoryLimit":10,"progressDeadlineSeconds":600},"status":{}}

I, [2018-08-16T12:13:02.931951 #1]  INFO -- Pharos::Kube::Stack<cert-manager>: Create resource apiextensions.k8s.io/v1beta1:CustomResourceDefinition/issuers.certmanager.k8s.io in namespace  with checksum=d1dcbc0bb233f7ca8b235bdc88c693ce
I, [2018-08-16T12:13:03.059715 #1]  INFO -- K8s::Transport<https://master.terom-pharos-dev.kontena.works:6443>: POST /apis/apiextensions.k8s.io/v1beta1/customresourcedefinitions <K8s::Resource> => HTTP 201: <K8s::Resource> in 0.125s
D, [2018-08-16T12:13:03.059932 #1] DEBUG -- K8s::Transport<https://master.terom-pharos-dev.kontena.works:6443>: Request: {"apiVersion":"apiextensions.k8s.io/v1beta1","kind":"CustomResourceDefinition","metadata":{"name":"issuers.certmanager.k8s.io","labels":{"pharos.kontena.io/stack":"cert-manager"},"annotations":{"pharos.kontena.io/stack-checksum":"d1dcbc0bb233f7ca8b235bdc88c693ce"}},"spec":{"group":"certmanager.k8s.io","version":"v1alpha1","names":{"kind":"Issuer","plural":"issuers"},"scope":"Namespaced"}}
D, [2018-08-16T12:13:03.060062 #1] DEBUG -- K8s::Transport<https://master.terom-pharos-dev.kontena.works:6443>: Response: {"kind":"CustomResourceDefinition","apiVersion":"apiextensions.k8s.io/v1beta1","metadata":{"name":"issuers.certmanager.k8s.io","selfLink":"/apis/apiextensions.k8s.io/v1beta1/customresourcedefinitions/issuers.certmanager.k8s.io","uid":"b961a71d-a14d-11e8-b195-76d458035df6","resourceVersion":"933","generation":1,"creationTimestamp":"2018-08-16T12:13:02Z","labels":{"pharos.kontena.io/stack":"cert-manager"},"annotations":{"pharos.kontena.io/stack-checksum":"d1dcbc0bb233f7ca8b235bdc88c693ce"}},"spec":{"group":"certmanager.k8s.io","version":"v1alpha1","names":{"plural":"issuers","singular":"issuer","kind":"Issuer","listKind":"IssuerList"},"scope":"Namespaced","versions":[{"name":"v1alpha1","served":true,"storage":true}],"additionalPrinterColumns":[{"name":"Age","type":"date","description":"CreationTimestamp is a timestamp representing the server time when this object was created. It is not guaranteed to be set in happens-before order across separate operations. Clients may not set this value. It is represented in RFC3339 form and is in UTC.\n\nPopulated by the system. Read-only. Null for lists. More info: https://git.k8s.io/community/contributors/devel/api-conventions.md#metadata","JSONPath":".metadata.creationTimestamp"}]},"status":{"conditions":null,"acceptedNames":{"plural":"","kind":""},"storedVersions":["v1alpha1"]}}

I, [2018-08-16T12:13:03.061395 #1]  INFO -- Pharos::Kube::Stack<cert-manager>: Create resource certmanager.k8s.io/v1alpha1:Issuer/letsencrypt in namespace default with checksum=d1dcbc0bb233f7ca8b235bdc88c693ce
I, [2018-08-16T12:13:03.107916 #1]  INFO -- K8s::Transport<https://master.terom-pharos-dev.kontena.works:6443>: GET /apis/certmanager.k8s.io/v1alpha1 => HTTP 200: <K8s::API::MetaV1::APIResourceList> in 0.044s
D, [2018-08-16T12:13:03.108128 #1] DEBUG -- K8s::Transport<https://master.terom-pharos-dev.kontena.works:6443>: Response: {"kind":"APIResourceList","apiVersion":"v1","groupVersion":"certmanager.k8s.io/v1alpha1","resources":[{"name":"clusterissuers","singularName":"clusterissuer","namespaced":false,"kind":"ClusterIssuer","verbs":["delete","deletecollection","get","list","patch","create","update","watch"]},{"name":"certificates","singularName":"certificate","namespaced":true,"kind":"Certificate","verbs":["delete","deletecollection","get","list","patch","create","update","watch"]}]}

undefined method `kind' for nil:NilClass
/usr/local/bundle/gems/k8s-client-0.3.1/lib/k8s/api_client.rb:78:in `client_for_resource'
/usr/local/bundle/gems/k8s-client-0.3.1/lib/k8s/client.rb:107:in `client_for_resource'
/usr/local/bundle/gems/k8s-client-0.3.1/lib/k8s/client.rb:113:in `create_resource'
/usr/local/bundle/gems/k8s-client-0.3.1/lib/k8s/stack.rb:86:in `block in apply'
/usr/local/bundle/gems/k8s-client-0.3.1/lib/k8s/stack.rb:74:in `map'
/usr/local/bundle/gems/k8s-client-0.3.1/lib/k8s/stack.rb:74:in `apply'
/app/lib/pharos/addon.rb:205:in `apply_resources'
/app/lib/pharos/addon.rb:171:in `apply_install'
/app/lib/pharos/addon.rb:157:in `apply'
/app/lib/pharos/cluster_manager.rb:127:in `block in apply_addons'
/app/lib/pharos/addon_manager.rb:74:in `block in each'
/app/lib/pharos/addon_manager.rb:86:in `block in with_enabled_addons'
/app/lib/pharos/addon_manager.rb:83:in `each'
/app/lib/pharos/addon_manager.rb:83:in `with_enabled_addons'
/app/lib/pharos/addon_manager.rb:70:in `each'
/app/lib/pharos/cluster_manager.rb:124:in `apply_addons'
/app/lib/pharos/up_command.rb:107:in `configure'
/app/lib/pharos/up_command.rb:44:in `block in execute'
/app/lib/pharos/up_command.rb:43:in `chdir'
/app/lib/pharos/up_command.rb:43:in `execute'
/usr/local/bundle/gems/clamp-1.2.1/lib/clamp/command.rb:63:in `run'
/usr/local/bundle/gems/clamp-1.2.1/lib/clamp/subcommand/execution.rb:11:in `execute'
/usr/local/bundle/gems/clamp-1.2.1/lib/clamp/command.rb:63:in `run'
/usr/local/bundle/gems/clamp-1.2.1/lib/clamp/command.rb:132:in `run'
/app/lib/pharos/root_command.rb:16:in `run'
bin/pharos-cluster:12:in `<main>'

Note that the fix for https://github.com/kontena/k8s-client/issues/16 has not yet landed in pharos-cluster; it is pending another patch release. Even with that fix, the addons also need retry logic.

SpComb commented 6 years ago

This race condition seems to be getting more common for me, and the resulting K8s::Error::UnknownResource currently isn't retried because it is raised within an addon, not a phase:

==> Enabling addon cert-manager
I, [2018-08-22T08:02:10.874543 #1]  INFO -- K8s::Transport<https://master.terom-pharos-dev.kontena.works:6443>: [GET /apis/certmanager.k8s.io/v1alpha1] => HTTP [404] in 0.054s
I, [2018-08-22T08:02:11.003675 #1]  INFO -- K8s::Transport<https://master.terom-pharos-dev.kontena.works:6443>: [GET /apis/apiextensions.k8s.io/v1beta1/customresourcedefinitions/certificates.certmanager.k8s.io, GET /apis/apiextensions.k8s.io/v1beta1/customresourcedefinitions/clusterissuers.certmanager.k8s.io, GET /apis/rbac.authorization.k8s.io/v1beta1/clusterrolebindings/cert-manager, GET /apis/rbac.authorization.k8s.io/v1beta1/clusterroles/cert-manager, GET /apis/extensions/v1beta1/namespaces/kube-system/deployments/cert-manager, GET /apis/apiextensions.k8s.io/v1beta1/customresourcedefinitions/issuers.certmanager.k8s.io, GET /api/v1/namespaces/kube-system/serviceaccounts/cert-manager] => HTTP [404, 404, 404, 404, 404, 404, 404] in 0.114s
I, [2018-08-22T08:02:11.004334 #1]  INFO -- Pharos::Kube::Stack<cert-manager>: Create resource apiextensions.k8s.io/v1beta1:CustomResourceDefinition/certificates.certmanager.k8s.io in namespace  with checksum=81caae86c57ac192eaf80570354a4c2c
I, [2018-08-22T08:02:11.068150 #1]  INFO -- K8s::Transport<https://master.terom-pharos-dev.kontena.works:6443>: POST /apis/apiextensions.k8s.io/v1beta1/customresourcedefinitions <K8s::Resource> => HTTP 201: <K8s::Resource> in 0.061s
I, [2018-08-22T08:02:11.070109 #1]  INFO -- Pharos::Kube::Stack<cert-manager>: Create resource apiextensions.k8s.io/v1beta1:CustomResourceDefinition/clusterissuers.certmanager.k8s.io in namespace  with checksum=81caae86c57ac192eaf80570354a4c2c
I, [2018-08-22T08:02:11.149896 #1]  INFO -- K8s::Transport<https://master.terom-pharos-dev.kontena.works:6443>: POST /apis/apiextensions.k8s.io/v1beta1/customresourcedefinitions <K8s::Resource> => HTTP 201: <K8s::Resource> in 0.076s
I, [2018-08-22T08:02:11.150779 #1]  INFO -- Pharos::Kube::Stack<cert-manager>: Create resource rbac.authorization.k8s.io/v1beta1:ClusterRoleBinding/cert-manager in namespace  with checksum=81caae86c57ac192eaf80570354a4c2c
I, [2018-08-22T08:02:11.219933 #1]  INFO -- K8s::Transport<https://master.terom-pharos-dev.kontena.works:6443>: POST /apis/rbac.authorization.k8s.io/v1beta1/clusterrolebindings <K8s::Resource> => HTTP 201: <K8s::Resource> in 0.066s
I, [2018-08-22T08:02:11.223306 #1]  INFO -- Pharos::Kube::Stack<cert-manager>: Create resource rbac.authorization.k8s.io/v1beta1:ClusterRole/cert-manager in namespace  with checksum=81caae86c57ac192eaf80570354a4c2c
I, [2018-08-22T08:02:11.293033 #1]  INFO -- K8s::Transport<https://master.terom-pharos-dev.kontena.works:6443>: POST /apis/rbac.authorization.k8s.io/v1beta1/clusterroles <K8s::Resource> => HTTP 201: <K8s::Resource> in 0.063s
I, [2018-08-22T08:02:11.294725 #1]  INFO -- Pharos::Kube::Stack<cert-manager>: Create resource extensions/v1beta1:Deployment/cert-manager in namespace kube-system with checksum=81caae86c57ac192eaf80570354a4c2c
I, [2018-08-22T08:02:11.364914 #1]  INFO -- K8s::Transport<https://master.terom-pharos-dev.kontena.works:6443>: POST /apis/extensions/v1beta1/namespaces/kube-system/deployments <K8s::Resource> => HTTP 201: <K8s::Resource> in 0.062s
I, [2018-08-22T08:02:11.366127 #1]  INFO -- Pharos::Kube::Stack<cert-manager>: Create resource apiextensions.k8s.io/v1beta1:CustomResourceDefinition/issuers.certmanager.k8s.io in namespace  with checksum=81caae86c57ac192eaf80570354a4c2c
I, [2018-08-22T08:02:11.465852 #1]  INFO -- K8s::Transport<https://master.terom-pharos-dev.kontena.works:6443>: POST /apis/apiextensions.k8s.io/v1beta1/customresourcedefinitions <K8s::Resource> => HTTP 201: <K8s::Resource> in 0.097s
I, [2018-08-22T08:02:11.467105 #1]  INFO -- Pharos::Kube::Stack<cert-manager>: Create resource certmanager.k8s.io/v1alpha1:Issuer/letsencrypt in namespace default with checksum=81caae86c57ac192eaf80570354a4c2c
I, [2018-08-22T08:02:11.517103 #1]  INFO -- K8s::Transport<https://master.terom-pharos-dev.kontena.works:6443>: GET /apis/certmanager.k8s.io/v1alpha1 => HTTP 200: <K8s::API::MetaV1::APIResourceList> in 0.047s
Unknown resource kind=Issuer for certmanager.k8s.io/v1alpha1
/usr/local/bundle/gems/k8s-client-0.3.3/lib/k8s/api_client.rb:79:in `client_for_resource'
/usr/local/bundle/gems/k8s-client-0.3.3/lib/k8s/client.rb:110:in `client_for_resource'
/usr/local/bundle/gems/k8s-client-0.3.3/lib/k8s/client.rb:116:in `create_resource'
/usr/local/bundle/gems/k8s-client-0.3.3/lib/k8s/stack.rb:86:in `block in apply'
/usr/local/bundle/gems/k8s-client-0.3.3/lib/k8s/stack.rb:74:in `map'
/usr/local/bundle/gems/k8s-client-0.3.3/lib/k8s/stack.rb:74:in `apply'
/app/lib/pharos/addon.rb:200:in `apply_resources'
/app/lib/pharos/addon.rb:171:in `apply_install'
/app/lib/pharos/addon.rb:157:in `apply'
/app/lib/pharos/cluster_manager.rb:129:in `block in apply_addons'
/app/lib/pharos/addon_manager.rb:79:in `block in each'
/app/lib/pharos/addon_manager.rb:91:in `block in with_enabled_addons'
/app/lib/pharos/addon_manager.rb:88:in `each'
/app/lib/pharos/addon_manager.rb:88:in `with_enabled_addons'
/app/lib/pharos/addon_manager.rb:75:in `each'
/app/lib/pharos/cluster_manager.rb:126:in `apply_addons'
/app/lib/pharos/up_command.rb:108:in `configure'
/app/lib/pharos/up_command.rb:44:in `block in execute'
/app/lib/pharos/up_command.rb:43:in `chdir'
/app/lib/pharos/up_command.rb:43:in `execute'
/usr/local/bundle/gems/clamp-1.2.1/lib/clamp/command.rb:63:in `run'
/usr/local/bundle/gems/clamp-1.2.1/lib/clamp/subcommand/execution.rb:11:in `execute'
/usr/local/bundle/gems/clamp-1.2.1/lib/clamp/command.rb:63:in `run'
/usr/local/bundle/gems/clamp-1.2.1/lib/clamp/command.rb:132:in `run'
/app/lib/pharos/root_command.rb:18:in `run'
bin/pharos-cluster:12:in `<main>'

The fix would be to retry addons on K8s::Error.
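
A retry wrapper around the addon apply step could look roughly like the sketch below. It is an illustration only: `with_retry` and `UnknownResourceError` are hypothetical names, the latter standing in for K8s::Error::UnknownResource so that the example runs without the k8s-client gem, and the attempt count and delay are arbitrary.

```ruby
# Hypothetical stand-in for K8s::Error::UnknownResource (assumption, so the
# sketch is runnable without the k8s-client gem).
class UnknownResourceError < StandardError; end

# Runs the block, retrying when it raises one of `errors`.
# Sleeps delay * attempt_number seconds between attempts and re-raises the
# last error once `attempts` tries have been used up.
def with_retry(errors: [UnknownResourceError], attempts: 5, delay: 0.5)
  tries = 0
  begin
    tries += 1
    yield tries
  rescue *errors
    raise if tries >= attempts

    sleep(delay * tries)
    retry
  end
end
```

The stack apply call would go inside the block. Because the earlier resources already exist on a retry, the wrapped operation is assumed to be safe to repeat.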

jnummelin commented 5 years ago

Fixed in #628.