kinvolk / lokomotive

🪦 DISCONTINUED Further Lokomotive development has been discontinued. Lokomotive is a 100% open-source, easy to use and secure Kubernetes distribution from the volks at Kinvolk
https://kinvolk.io/lokomotive-kubernetes/
Apache License 2.0

Make linkerd webhook to use `failurePolicy` as `Fail` #721

Closed surajssd closed 4 years ago

surajssd commented 4 years ago

Right now, lokoctl's installation method has a drawback: it installs the webhook configurations first, which blocks creation of the webhook pod itself and causes a deadlock. You then start seeing errors like these:

Events from the linkerd namespace:

linkerd       14m         Warning   FailedCreate              replicaset/linkerd-controller-7c7d9bf56              Error creating: Internal error occurred: failed calling webhook "linkerd-proxy-injector.linkerd.io": Post https://linkerd-proxy-injector.linkerd.svc:443/?timeout=30s: dial tcp 10.3.192.109:443: connect: connection refused
linkerd       36m         Normal    ScalingReplicaSet         deployment/linkerd-controller                        Scaled up replica set linkerd-controller-7c7d9bf56 to 2
linkerd       14m         Warning   FailedCreate              replicaset/linkerd-destination-84b94d4df5            Error creating: Internal error occurred: failed calling webhook "linkerd-proxy-injector.linkerd.io": Post https://linkerd-proxy-injector.linkerd.svc:443/?timeout=30s: dial tcp 10.3.192.109:443: connect: connection refused
linkerd       36m         Normal    ScalingReplicaSet         deployment/linkerd-destination                       Scaled up replica set linkerd-destination-84b94d4df5 to 2
linkerd       14m         Warning   FailedCreate              replicaset/linkerd-grafana-8648658f4b                Error creating: Internal error occurred: failed calling webhook "linkerd-proxy-injector.linkerd.io": Post https://linkerd-proxy-injector.linkerd.svc:443/?timeout=30s: dial tcp 10.3.192.109:443: connect: connection refused
linkerd       36m         Normal    ScalingReplicaSet         deployment/linkerd-grafana                           Scaled up replica set linkerd-grafana-8648658f4b to 1
linkerd       14m         Warning   FailedCreate              replicaset/linkerd-identity-6c7b88bcf8               Error creating: Internal error occurred: failed calling webhook "linkerd-proxy-injector.linkerd.io": Post https://linkerd-proxy-injector.linkerd.svc:443/?timeout=30s: dial tcp 10.3.192.109:443: connect: connection refused
linkerd       36m         Normal    ScalingReplicaSet         deployment/linkerd-identity                          Scaled up replica set linkerd-identity-6c7b88bcf8 to 2
linkerd       14m         Warning   FailedCreate              replicaset/linkerd-prometheus-748b48c8c9             Error creating: Internal error occurred: failed calling webhook "linkerd-proxy-injector.linkerd.io": Post https://linkerd-proxy-injector.linkerd.svc:443/?timeout=30s: dial tcp 10.3.192.109:443: connect: connection refused
linkerd       36m         Normal    ScalingReplicaSet         deployment/linkerd-prometheus                        Scaled up replica set linkerd-prometheus-748b48c8c9 to 1
linkerd       14m         Warning   FailedCreate              replicaset/linkerd-proxy-injector-6b979484f          Error creating: Internal error occurred: failed calling webhook "linkerd-proxy-injector.linkerd.io": Post https://linkerd-proxy-injector.linkerd.svc:443/?timeout=30s: dial tcp 10.3.192.109:443: connect: connection refused
linkerd       36m         Normal    ScalingReplicaSet         deployment/linkerd-proxy-injector                    Scaled up replica set linkerd-proxy-injector-6b979484f to 2
linkerd       14m         Warning   FailedCreate              replicaset/linkerd-sp-validator-6896bc59c4           Error creating: Internal error occurred: failed calling webhook "linkerd-proxy-injector.linkerd.io": Post https://linkerd-proxy-injector.linkerd.svc:443/?timeout=30s: dial tcp 10.3.192.109:443: connect: connection refused
linkerd       36m         Normal    ScalingReplicaSet         deployment/linkerd-sp-validator                      Scaled up replica set linkerd-sp-validator-6896bc59c4 to 2
linkerd       14m         Warning   FailedCreate              replicaset/linkerd-tap-8f79d6c5c                     Error creating: Internal error occurred: failed calling webhook "linkerd-proxy-injector.linkerd.io": Post https://linkerd-proxy-injector.linkerd.svc:443/?timeout=30s: dial tcp 10.3.192.109:443: connect: connection refused
linkerd       36m         Normal    ScalingReplicaSet         deployment/linkerd-tap                               Scaled up replica set linkerd-tap-8f79d6c5c to 2
linkerd       14m         Warning   FailedCreate              replicaset/linkerd-web-8558657b69                    Error creating: Internal error occurred: failed calling webhook "linkerd-proxy-injector.linkerd.io": Post https://linkerd-proxy-injector.linkerd.svc:443/?timeout=30s: dial tcp 10.3.192.109:443: connect: connection refused
linkerd       36m         Normal    ScalingReplicaSet         deployment/linkerd-web                               Scaled up replica set linkerd-web-8558657b6  

These are the apiserver errors:

I0707 11:56:57.236263       1 trace.go:116] Trace[1183028621]: "Create" url:/api/v1/namespaces/linkerd/pods,user-agent:kube-controller-manager/v1.18.3 (linux/amd64) kubernetes/2e7996e/system:serviceaccount:kube-system:replicaset-controller,client:10.0.4.35 (started: 2020-07-07 11:56:56.213473697 +0000 UTC m=+2383.280605006) (total time: 1.02276867s):
I0707 11:56:57.363868       1 trace.go:116] Trace[1840209148]: "Call mutating webhook" configuration:linkerd-proxy-injector-webhook-config,webhook:linkerd-proxy-injector.linkerd.io,resource:/v1, Resource=pods,subresource:,operation:CREATE,UID:301278b9-e5bc-411b-a77f-46c1718a9a51 (started: 2020-07-07 11:56:56.344687474 +0000 UTC m=+2383.411818776) (total time: 1.019130746s):
W0707 11:56:57.363946       1 dispatcher.go:181] Failed calling webhook, failing closed linkerd-proxy-injector.linkerd.io: failed calling webhook "linkerd-proxy-injector.linkerd.io": Post https://linkerd-proxy-injector.linkerd.svc:443/?timeout=30s: dial tcp 10.3.192.109:443: connect: connection refused

To work around this problem, the linkerd component currently sets the webhook's failurePolicy to Ignore. We should fix the underlying problem in lokoctl and change the failurePolicy to Fail.
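To make the deadlock concrete, here is a minimal sketch of the admission decision as a toy model. It is not lokoctl or Kubernetes code: the `Webhook` class and the `admit_pod` and `install` helpers are hypothetical names invented for illustration, and the model assumes a reachable webhook always approves the pod.

```python
from dataclasses import dataclass


@dataclass
class Webhook:
    """Toy model of a mutating webhook configuration (hypothetical type)."""
    name: str
    failure_policy: str  # "Fail" or "Ignore", mirroring the failurePolicy field
    backend_ready: bool  # True once a pod is serving the webhook endpoint


def admit_pod(webhook: Webhook) -> bool:
    """Return True if a pod CREATE request would be admitted in this model."""
    if webhook.backend_ready:
        # Assumption: a reachable webhook approves the request.
        return True
    # Webhook unreachable ("connection refused"): failurePolicy decides.
    return webhook.failure_policy == "Ignore"


def install(webhook_first: bool, failure_policy: str) -> bool:
    """Simulate installing the injector; return True if its pod can be created."""
    hook = Webhook(
        name="linkerd-proxy-injector.linkerd.io",
        failure_policy=failure_policy,
        backend_ready=False,  # the injector pod does not exist yet
    )
    if not webhook_first:
        # No webhook configuration is registered yet, so nothing intercepts the pod.
        return True
    return admit_pod(hook)


if __name__ == "__main__":
    print(install(webhook_first=True, failure_policy="Fail"))    # pod blocked by its own webhook
    print(install(webhook_first=True, failure_policy="Ignore"))  # current workaround
    print(install(webhook_first=False, failure_policy="Fail"))   # ordering fixed
```

In this model, `Fail` only works once the installation order no longer registers the webhook before its backing pod can start, which is why the fix belongs in lokoctl rather than in the policy itself.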

surajssd commented 4 years ago

Another thing to look at is the way lokoctl creates namespaces. Currently, lokoctl drops all the namespace configs from the chart and creates the namespace imperatively. This is a lossy operation: if a chart ships a namespace config with information such as labels or annotations, all of that is lost.

So try creating the namespace for linkerd from the namespace config shipped in the chart. This might break the deadlock that lokoctl currently introduces. Similar issue: https://github.com/kinvolk/lokomotive/issues/647.

invidian commented 4 years ago

Perhaps we should drop only the release namespace, and not all namespace objects, when sanitizing the charts.
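A rough sketch of what that narrower sanitizing step could look like, assuming the chart's manifests have already been parsed into plain dictionaries. The function name `drop_release_namespace` is hypothetical and is not taken from lokoctl.

```python
def drop_release_namespace(manifests, release_namespace):
    """Remove only the Namespace object matching the release namespace.

    Other Namespace objects, and every non-Namespace object, are kept
    with their labels and annotations intact.
    """
    kept = []
    for manifest in manifests:
        is_namespace = manifest.get("kind") == "Namespace"
        name = manifest.get("metadata", {}).get("name")
        if is_namespace and name == release_namespace:
            continue  # this one is created separately, so skip it
        kept.append(manifest)
    return kept


if __name__ == "__main__":
    chart = [
        {"kind": "Namespace", "metadata": {"name": "linkerd"}},
        {"kind": "Namespace", "metadata": {"name": "other", "labels": {"a": "b"}}},
        {"kind": "Deployment", "metadata": {"name": "linkerd-controller"}},
    ]
    for manifest in drop_release_namespace(chart, "linkerd"):
        print(manifest["kind"], manifest["metadata"]["name"])
```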

ipochi commented 4 years ago

With Helm's install.CreateNamespace = true, Helm builds a namespace object and creates it on the fly, rather than creating it from the release namespace manifest present in the chart.

If install.CreateNamespace = true and the chart contains a release namespace manifest, Helm throws a "resource already exists" error.
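The conflict described here can be pictured with a small simulation. This is a toy model of the behaviour reported in this comment, not Helm's actual code: `install_release`, `AlreadyExists`, and the in-memory `cluster` set are all hypothetical.

```python
class AlreadyExists(Exception):
    """Raised when an object with the same kind and name is created twice."""


def install_release(chart_manifests, release_namespace, create_namespace):
    """Apply chart manifests to an empty toy cluster and return its contents."""
    cluster = set()
    if create_namespace:
        # The namespace is generated on the fly, ignoring any chart manifest.
        cluster.add(("Namespace", release_namespace))
    for manifest in chart_manifests:
        key = (manifest["kind"], manifest["metadata"]["name"])
        if key in cluster:
            raise AlreadyExists(f"{key[0]} {key[1]!r}: resource already exists")
        cluster.add(key)
    return cluster


if __name__ == "__main__":
    chart = [{"kind": "Namespace", "metadata": {"name": "linkerd"}}]
    try:
        install_release(chart, "linkerd", create_namespace=True)
    except AlreadyExists as err:
        print("install failed:", err)
```

Under this model, a chart can keep its own release namespace manifest only if the on-the-fly creation is turned off, or the manifest has to be dropped before installing.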

invidian commented 4 years ago

The original PR eventually removed the failurePolicy: Ignore override, so it is Fail now.