Hi @ky0shiro, based on the logs you sent, it seems like the request never made it to your injector. You would get a log entry looking like this:
2020-01-06T15:10:18.658Z [INFO] handler: Request received: Method=POST URL=/mutate?timeout=30s
Can you provide the following:
kubectl describe service vault-agent-injector-svc
kubectl describe mutatingwebhookconfigurations vault-agent-injector-cfg
@jasonodonnell : service:
Name:              vault-agent-injector-svc
Namespace:         my-namespace
Labels:            app.kubernetes.io/instance=vault
                   app.kubernetes.io/managed-by=Tiller
                   app.kubernetes.io/name=vault-agent-injector
Annotations:       flux.weave.works/antecedent: my-namespace:helmrelease/vault
Selector:          app.kubernetes.io/instance=vault,app.kubernetes.io/name=vault-agent-injector,component=webhook
Type:              ClusterIP
IP:                10.210.4.175
Port:              <unset>  443/TCP
TargetPort:        8080/TCP
Endpoints:         10.16.0.198:8080
Session Affinity:  None
Events:            <none>
mutatingwebhookconfigurations:
Name:         vault-agent-injector-cfg
Namespace:
Labels:       app.kubernetes.io/instance=vault
              app.kubernetes.io/managed-by=Tiller
              app.kubernetes.io/name=vault-agent-injector
Annotations:  flux.weave.works/antecedent: my-namespace:helmrelease/vault
API Version:  admissionregistration.k8s.io/v1beta1
Kind:         MutatingWebhookConfiguration
Metadata:
  Creation Timestamp:  2020-01-06T13:55:54Z
  Generation:          2
  Resource Version:    56445806
  Self Link:           /apis/admissionregistration.k8s.io/v1beta1/mutatingwebhookconfigurations/vault-agent-injector-cfg
  UID:                 4195285e-308c-11ea-8917-4201ac10000a
Webhooks:
  Client Config:
    Ca Bundle:  << REDACTED >>
    Service:
      Name:       vault-agent-injector-svc
      Namespace:  my-namespace
      Path:       /mutate
  Failure Policy:  Ignore
  Name:            vault.hashicorp.com
  Namespace Selector:
  Rules:
    API Groups:
    API Versions:
      v1
    Operations:
      CREATE
      UPDATE
    Resources:
      pods
  Side Effects:  Unknown
Events:          <none>
What version of Kube are you using?
Are you using a managed Kube service such as GKE/EKS or did you deploy your own?
The version is 1.13.11 and it is GKE:
Client Version: version.Info{Major:"", Minor:"", GitVersion:"v0.0.0-master+70132b0f13", GitCommit:"70132b0f130acc0bed193d9ba59dd186f0e634cf", GitTreeState:"", BuildDate:"1970-01-01T00:00:00Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"13+", GitVersion:"v1.13.11-gke.14", GitCommit:"56d89863d1033f9668ddd6e1c1aea81cd846ef88", GitTreeState:"clean", BuildDate:"2019-11-07T19:12:22Z", GoVersion:"go1.12.11b4", Compiler:"gc", Platform:"linux/amd64"}
@ky0shiro Interesting, our acceptance testing runs on a GKE cluster and is working fine. What you showed me looks correct but the request doesn't seem to make it to the injector. Do you have access to the Kube apiserver logs? I wonder if an error can be found there when Kube tries to contact the webhook.
@ky0shiro Can you also provide me the output of the following command by execing into the Vault Injector container?
cat /etc/resolv.conf
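For example (the namespace and injector pod name here are placeholders for your install):
kubectl exec -n my-namespace <injector-pod-name> -- cat /etc/resolv.conf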
@jasonodonnell Here is /etc/resolv.conf
nameserver 10.210.0.10
search my-namespace.svc.cluster.local svc.cluster.local cluster.local c.my-project.internal google.internal
options ndots:5
@jasonodonnell logs from /logs/kube-apiserver.log (only lines containing the word "injector"):
injector.txt
I've upgraded my vault helm chart from 0.2.1 to 0.3.3 on a GKE cluster - everything was working fine before, since I was using the vault-agent and consul-template sidecar containers to render the secrets on the pod.
Now that I've upgraded, I can't get vault-k8s to work. Is there any chance we're somehow ending up with a development version of hashicorp/vault-k8s:v0.1.2?
I'm in the exact same situation as ky0shiro, and looking at the Dockerfiles and the vault-server-agent-injector container I end up with, it seems it's loading the development version:
$ ps xa
PID USER TIME COMMAND
1 vault 0:14 /vault-k8s agent-inject 2>&1
62 vault 0:00 sh
74 vault 0:00 ps xa
/ $
$ /vault-k8s --version
0.1.2-dev
/ $ curl
sh: curl: not found
/ $
Are these the way they should be? Maybe we're not seeing any connections on the vault-server-agent-injector because we're missing the certs altogether and the api server can't actually connect.
Thanks
Hi @mateipopa, these are the correct builds. Release engineering had not completed the official build pipeline at the time, so the images are being built internally from the dev builds.
One thing you might investigate is the firewall rules on your GKE nodes. We've seen similar issues with injection due to 8080 being blocked: https://github.com/hashicorp/vault-k8s/issues/46
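For reference, a rule along these lines has worked for private GKE clusters. This is a sketch: the rule name, network name, node tag, and master CIDR are placeholders you'd look up for your own cluster:
# allow the GKE control plane to reach the injector's port on the nodes
gcloud compute firewall-rules create allow-master-to-injector \
  --network my-cluster-network \
  --source-ranges 172.16.0.0/28 \
  --target-tags my-cluster-node-tag \
  --allow tcp:8080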
Hello @jasonodonnell, thanks for pointing me in the right direction. Although I didn't get any errors in stackdriver, connections were indeed blocked by the firewall. Adding a rule to allow traffic from the master to the worker nodes solved the problem and requests now reach the injector. Thanks again!
Having this same issue with GKE; opening port 8080 to my apiserver did not do the trick for me.
I have the same issue:
I wonder whether the problem is Kubernetes not contacting the webhook, or the webhook not contacting Vault.
How can I troubleshoot further?
Ensure all the components of the vault-injector are installed in the same namespace where you're looking to retrieve your secrets.
@Kampe, this would mean that in any namespace I'd like to fetch secrets from, I'd have to deploy new vault-injector components. I can test your suggestion, but it can't be the official recommended solution, right?
@ky0shiro @jasonodonnell any luck yet? I am facing exactly the same problem @ky0shiro described. I double-checked everything. After applying the patch annotations to the deployment, again only one container got spawned on the new pod, and I was hoping for two (the second one being the Vault sidecar). Is this problem specific to certain Vault charts?
A lot of these issues sound like what we've seen happen with private GKE clusters, for example: https://github.com/hashicorp/vault-helm/issues/214#issuecomment-592702596
So if that matches your setup, please try adding a firewall rule to allow the master to access 8080 on the nodes: https://cloud.google.com/kubernetes-engine/docs/how-to/private-clusters#add_firewall_rules
If it doesn't, then it would help to know where your k8s cluster is running and how it's configured. If the configurations are too varied we might need to break this up into separate issues for clarity. Cheers!
I found the issue: as I'm running on OpenShift 3.11 (Kubernetes 1.11), the API config had to be changed so it supports admission controllers.
MutatingAdmissionWebhook:
  configuration:
    apiVersion: v1
    disable: false
    kind: DefaultAdmissionConfig
ValidatingAdmissionWebhook:
  configuration:
    apiVersion: v1
    disable: false
    kind: DefaultAdmissionConfig
This block must be present in master-config.yml in the section admissionConfig.pluginConfig. After restarting the apiserver, the webhook started to kick in. But the sidecar was still not injected because of permission issues. Granting the consumer app's service account cluster-admin permissions or access to the privileged SCC (the equivalent of a PSP) helped, but that also introduces other security issues.
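For clarity, the full nesting in master-config.yml then looks like this (a sketch assembled from the description above):
admissionConfig:
  pluginConfig:
    MutatingAdmissionWebhook:
      configuration:
        apiVersion: v1
        disable: false
        kind: DefaultAdmissionConfig
    ValidatingAdmissionWebhook:
      configuration:
        apiVersion: v1
        disable: false
        kind: DefaultAdmissionConfig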
A lot of these issues sound like what we've seen happen with private GKE clusters, for example: hashicorp/vault-helm#214 (comment)
So if that matches your setup, please try adding a firewall rule to allow the master to access 8080 on the nodes: https://cloud.google.com/kubernetes-engine/docs/how-to/private-clusters#add_firewall_rules
If it doesn't, then it would help to know where your k8s cluster is running and how it's configured. If the configurations are too varied we might need to break this up into separate issues for clarity. Cheers!
This worked like a charm!!! Thanks @tvoran
I'm experiencing this as well. Like @Kampe I updated the firewall to no avail. I'm getting logs almost exactly like @ky0shiro.
I'm on GKE as well.....and I'm beginning to see a pattern
My 2 cents: it happens to me when GKE replaces a node (upgrade/maintenance), and in my case the cluster is public.
Same situation right now:
❯ kubectl get nodes
NAME STATUS ROLES AGE VERSION
gke-ero-cluster-ero-node-pool-2cf384d7-79b3 Ready <none> 62m v1.16.8-gke.15
gke-ero-cluster-ero-node-pool-6430b315-un3c Ready <none> 20h v1.16.8-gke.15
gke-ero-cluster-ero-node-pool-b00f5513-o3c7 Ready <none> 21h v1.16.8-gke.15
Google replaced one of the nodes 62 minutes ago, and then:
❯ kubectl get pods
NAME READY STATUS RESTARTS AGE
ero-app-d85b548c4-bfk9s 2/2 Running 0 21h
ero-app-d85b548c4-df864 0/1 CrashLoopBackOff 25 62m
ero-app-d85b548c4-jv9b5 0/1 CrashLoopBackOff 25 62m
ero-app-d85b548c4-nr7b4 0/1 CrashLoopBackOff 25 62m
ero-app-d85b548c4-q5q4j 0/1 CrashLoopBackOff 25 62m
ero-app-d85b548c4-x4zbj 2/2 Running 0 21h
To recover I need to scale the deployment to 2 and then back to 6 (see the sketch below); this happens on each Kubernetes node replacement. This bug happens almost every day, so tell me if you want me to run something the next time it occurs...
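For reference, the workaround looks like this (deployment name taken from the pod output above):
kubectl scale deployment ero-app --replicas=2
kubectl scale deployment ero-app --replicas=6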
The reason for this behaviour is https://github.com/hashicorp/vault-helm/issues/238: the vault-agent-injector was also recreated, and all rescheduled pods come back without the Vault container inside.
I'm using the latest version of the vault-helm chart (0.6.0) and this issue still seems to be happening on Kubernetes v1.15.11-gke.5.
However, unlike @ky0shiro, I am getting the handler logs, since I whitelisted 8080:
2020-06-22T18:47:00.891Z [INFO] handler: Request received: Method=POST URL=/mutate?timeout=30s
2020-06-22T18:47:00.893Z [DEBUG] handler: checking if should inject agent..
Looks like I had to put the annotations in the right spot, as shown below.
Annotation examples: https://www.vaultproject.io/docs/platform/k8s/injector/examples
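In case it helps others: the injector annotations must sit on the pod template, not on the Deployment's own metadata. A minimal sketch, where all the names are placeholders:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                # placeholder
spec:
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
      annotations:            # the right spot: pod template, not Deployment metadata
        vault.hashicorp.com/agent-inject: "true"
        vault.hashicorp.com/agent-inject-secret-foo: "secret/data/foo"
        vault.hashicorp.com/role: "my-role"
    spec:
      serviceAccountName: my-app
      containers:
        - name: app
          image: nginx        # placeholder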
I found the issue: as I'm running on OpenShift 3.11 (Kubernetes 1.11), the API config had to be changed so it supports admission controllers.
MutatingAdmissionWebhook:
  configuration:
    apiVersion: v1
    disable: false
    kind: DefaultAdmissionConfig
ValidatingAdmissionWebhook:
  configuration:
    apiVersion: v1
    disable: false
    kind: DefaultAdmissionConfig
This block must be present in master-config.yml in the section admissionConfig.pluginConfig. After restarting the apiserver, the webhook started to kick in. But the sidecar was still not injected because of permission issues. Granting the consumer app's service account cluster-admin permissions or access to the privileged SCC (the equivalent of a PSP) helped, but that also introduces other security issues.
@mikemowgli Thanks for the info. I added that block of lines to the master-config.yaml, but my OpenShift cluster says it is not enabled in its logs. Can you tell me how you enabled it?
I0719 07:28:30.711521 1 plugins.go:84] Registered admission plugin "MutatingAdmissionWebhook"
I0719 07:28:31.408404 1 plugins.go:84] Registered admission plugin "MutatingAdmissionWebhook"
I0719 07:28:32.187811 1 register.go:151] Admission plugin MutatingAdmissionWebhook is not enabled. It will not be started.
I0719 07:28:32.361736 1 plugins.go:84] Registered admission plugin "MutatingAdmissionWebhook"
I was able to make it work; I had to restart a few services on the master node after making these changes. I followed this link: https://access.redhat.com/solutions/3869391
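On OpenShift 3.11 the control-plane components run as static pods, so (assuming that's what the linked article covers) the restarts look roughly like:
# run on each master node
master-restart api api
master-restart controllers controllers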
Opening ports 8080 and 443 in your VPC's firewall resolves the problem. It worked fine for me.
I've been fighting this same issue for several days as well, where the agent injector does not finish initializing and the container does not start. I'm running on an AWS EKS cluster; if it's a port issue between the control plane and the nodes, does anyone know how to enable 8080 on AWS EKS?
Following these instructions from AWS, I added an inbound rule to the node security group to allow all TCP traffic (ports 0-65535) from the control plane security group, but no luck with the sample deployment initializing. Below is some log data from Vault and the Vault injector, as well as a kubectl describe of the sample deployment. Could definitely use some guidance on how to troubleshoot this further.
Vault Logs
identity: creating a new entity: alias="id:"892e70b6-508d-6b11-0fe7-4e3d273cb868" canonical_id:"fc7332af-4001-bc99-9252-eaebaa41b826" mount_type:"kubernetes" mount_accessor:"auth_kubernetes_edb6b310" mount_path:"auth/kubernetes/" metadata:{key:"service_account_name" value:"vault-auth"} metadata:{key:"service_account_namespace" value:"default"} metadata:{key:"service_account_secret_name" value:"vault-auth-token-jl7h4"} metadata:{key:"service_account_uid" value:"1a40077e-f3ae-4953-bc6d-9f742d0278d2"} name:"1a40077e-f3ae-4953-bc6d-9f742d0278d2" creation_time:{seconds:1602346759 nanos:237072421} last_update_time:{seconds:1602346759 nanos:237072421} namespace_id:"root""
Injector Logs
Registering telemetry path on "/metrics"
2020-10-10T16:18:12.311Z [INFO] handler: Starting handler..
Listening on ":8080"...
Updated certificate bundle received. Updating certs...
2020-10-10T16:19:14.074Z [INFO] handler: Request received: Method=POST URL=/mutate?timeout=30s
2020-10-10T16:19:14.076Z [DEBUG] handler: checking if should inject agent..
2020-10-10T16:19:14.076Z [DEBUG] handler: checking namespaces..
2020-10-10T16:19:14.076Z [DEBUG] handler: setting default annotations..
2020-10-10T16:19:14.077Z [DEBUG] handler: creating new agent..
2020-10-10T16:19:14.077Z [DEBUG] handler: validating agent configuration..
2020-10-10T16:19:14.077Z [DEBUG] handler: creating patches for the pod..
kubectl describe from the sample deployment
Name:           app-d6d9b9755-2l856
Namespace:      default
Priority:       0
Node:           ip-192-168-26-157.ec2.internal/192.168.26.157
Start Time:     Sat, 10 Oct 2020 11:19:14 -0500
Labels:         app=vault-agent-demo
                pod-template-hash=d6d9b9755
Annotations:    kubernetes.io/psp: eks.privileged
                vault.hashicorp.com/agent-inject: true
                vault.hashicorp.com/agent-inject-secret-poc-secret: secrets/dev/poc-secret
                vault.hashicorp.com/agent-inject-status: injected
                vault.hashicorp.com/role: app-user
                vault.hashicorp.com/tls-skip-verify: true
Status:         Pending
IP:             192.168.9.35
Controlled By:  ReplicaSet/app-d6d9b9755
Init Containers:
  vault-agent-init:
    Container ID:  docker://6ab5f0688f5dea6a416fa5ad8fc5395675ebba37ea1f54a1b4f7e1b56d4cb768
    Image:         vault:1.5.2
    Image ID:      docker-pullable://vault@sha256:9aa46d9d9987562013bfadce166570e1705de619c9ae543be7c61953f3229923
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/sh
      -ec
    Args:
      echo ${VAULT_CONFIG?} | base64 -d > /home/vault/config.json && vault agent -config=/home/vault/config.json
    State:          Running
      Started:      Sat, 10 Oct 2020 11:19:19 -0500
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     500m
      memory:  128Mi
    Requests:
      cpu:     250m
      memory:  64Mi
    Environment:
      VAULT_LOG_LEVEL:  info
      VAULT_CONFIG:     eyJhdXRvX2F1dGgiOnsibWV0aG9kIjp7InR5cGUiOiJrdWJlcm5ldGVzIiwibW91bnRfcGF0aCI6ImF1dGgva3ViZXJuZXRlcyIsImNvbmZpZyI6eyJyb2xlIjoiYXBwLXVzZXIifX0sInNpbmsiOlt7InR5cGUiOiJmaWxlIiwiY29uZmlnIjp7InBhdGgiOiIvaG9tZS92YXVsdC8udmF1bHQtdG9rZW4ifX1dfSwiZXhpdF9hZnRlcl9hdXRoIjp0cnVlLCJwaWRfZmlsZSI6Ii9ob21lL3ZhdWx0Ly5waWQiLCJ2YXVsdCI6eyJhZGRyZXNzIjoiaHR0cHM6Ly92YXVsdC52YXVsdC5zdmM6ODIwMCIsInRsc19za2lwX3ZlcmlmeSI6dHJ1ZX0sInRlbXBsYXRlIjpbeyJkZXN0aW5hdGlvbiI6Ii92YXVsdC9zZWNyZXRzL3BvYy1zZWNyZXQiLCJjb250ZW50cyI6Int7IHdpdGggc2VjcmV0IFwic2VjcmV0cy9kZXYvcG9jLXNlY3JldFwiIH19e3sgcmFuZ2UgJGssICR2IDo9IC5EYXRhIH19e3sgJGsgfX06IHt7ICR2IH19XG57eyBlbmQgfX17eyBlbmQgfX0iLCJsZWZ0X2RlbGltaXRlciI6Int7IiwicmlnaHRfZGVsaW1pdGVyIjoifX0ifV19
    Mounts:
      /home/vault from home-init (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from vault-auth-token-jl7h4 (ro)
      /vault/secrets from vault-secrets (rw)
Containers:
  app:
    Container ID:
    Image:          jweissig/app:0.0.1
    Image ID:
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       PodInitializing
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from vault-auth-token-jl7h4 (ro)
      /vault/secrets from vault-secrets (rw)
  vault-agent:
    Container ID:
    Image:          vault:1.5.2
    Image ID:
    Port:           <none>
    Host Port:      <none>
    Command:
      /bin/sh
      -ec
    Args:
      echo ${VAULT_CONFIG?} | base64 -d > /home/vault/config.json && vault agent -config=/home/vault/config.json
    State:          Waiting
      Reason:       PodInitializing
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     500m
      memory:  128Mi
    Requests:
      cpu:     250m
      memory:  64Mi
    Environment:
      VAULT_LOG_LEVEL:  info
      VAULT_CONFIG:     eyJhdXRvX2F1dGgiOnsibWV0aG9kIjp7InR5cGUiOiJrdWJlcm5ldGVzIiwibW91bnRfcGF0aCI6ImF1dGgva3ViZXJuZXRlcyIsImNvbmZpZyI6eyJyb2xlIjoiYXBwLXVzZXIifX0sInNpbmsiOlt7InR5cGUiOiJmaWxlIiwiY29uZmlnIjp7InBhdGgiOiIvaG9tZS92YXVsdC8udmF1bHQtdG9rZW4ifX1dfSwiZXhpdF9hZnRlcl9hdXRoIjpmYWxzZSwicGlkX2ZpbGUiOiIvaG9tZS92YXVsdC8ucGlkIiwidmF1bHQiOnsiYWRkcmVzcyI6Imh0dHBzOi8vdmF1bHQudmF1bHQuc3ZjOjgyMDAiLCJ0bHNfc2tpcF92ZXJpZnkiOnRydWV9LCJ0ZW1wbGF0ZSI6W3siZGVzdGluYXRpb24iOiIvdmF1bHQvc2VjcmV0cy9wb2Mtc2VjcmV0IiwiY29udGVudHMiOiJ7eyB3aXRoIHNlY3JldCBcInNlY3JldHMvZGV2L3BvYy1zZWNyZXRcIiB9fXt7IHJhbmdlICRrLCAkdiA6PSAuRGF0YSB9fXt7ICRrIH19OiB7eyAkdiB9fVxue3sgZW5kIH19e3sgZW5kIH19IiwibGVmdF9kZWxpbWl0ZXIiOiJ7eyIsInJpZ2h0X2RlbGltaXRlciI6In19In1dfQ==
    Mounts:
      /home/vault from home-sidecar (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from vault-auth-token-jl7h4 (ro)
      /vault/secrets from vault-secrets (rw)
Conditions:
  Type              Status
  Initialized       False
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  vault-auth-token-jl7h4:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  vault-auth-token-jl7h4
    Optional:    false
  home-init:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:  <unset>
  home-sidecar:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:  <unset>
  vault-secrets:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:  <unset>
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type    Reason     Age        From                                     Message
  ----    ------     ----       ----                                     -------
  Normal  Scheduled  <unknown>  default-scheduler                        Successfully assigned default/app-d6d9b9755-2l856 to ip-192-168-26-157.ec2.internal
  Normal  Pulling    12m        kubelet, ip-192-168-26-157.ec2.internal  Pulling image "vault:1.5.2"
  Normal  Pulled     12m        kubelet, ip-192-168-26-157.ec2.internal  Successfully pulled image "vault:1.5.2"
  Normal  Created    12m        kubelet, ip-192-168-26-157.ec2.internal  Created container vault-agent-init
  Normal  Started    12m        kubelet, ip-192-168-26-157.ec2.internal  Started container vault-agent-init
@pksurferdad This all looks good, it did indeed inject. You need to check the vault-agent-init logs to see what's wrong (likely permissions with your Vault role):
kubectl logs <your app pod> -c vault-agent-init
I see you're trying to get a KV secret. Which version of KV is this (1 or 2)? If you're not sure provide the output from:
vault secrets list -detailed
Additionally, you should provide the policy that you're attaching to app-user so I can verify you have the correct permissions.
If you're getting login, permission denied errors, there could be something wrong from Vault's end (like the K8s auth method wasn't configured correctly). Please provide the Vault server logs.
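For reference, the policy shape differs between KV versions; a sketch, assuming the mount is secrets/:
# KV v2: reads go through the data/ prefix
path "secrets/data/dev/poc-secret" {
  capabilities = ["read"]
}

# KV v1: no data/ prefix
path "secrets/dev/poc-secret" {
  capabilities = ["read"]
}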
thx for responding @jasonodonnell. well, the init logs were certainly helpful and they led me to my problem: an incorrect secret path. thx so much, it's working now. i am going to undo some of the AWS networking changes i made to see if they were even necessary.
vault-agent-init logs
2020-10-10T18:27:17.282Z [INFO] sink.file: creating file sink
2020-10-10T18:27:17.282Z [INFO] sink.file: file sink configured: path=/home/vault/.vault-token mode=-rw-r-----
2020-10-10T18:27:17.283Z [INFO] auth.handler: starting auth handler
2020-10-10T18:27:17.283Z [INFO] auth.handler: authenticating
2020-10-10T18:27:17.283Z [INFO] template.server: starting template server
2020/10/10 18:27:17.283255 [INFO] (runner) creating new runner (dry: false, once: false)
2020-10-10T18:27:17.283Z [INFO] sink.server: starting sink server
2020/10/10 18:27:17.283831 [WARN] (clients) disabling vault SSL verification
2020/10/10 18:27:17.283843 [INFO] (runner) creating watcher
2020-10-10T18:27:17.297Z [INFO] auth.handler: authentication successful, sending token to sinks
2020-10-10T18:27:17.297Z [INFO] auth.handler: starting renewal process
2020-10-10T18:27:17.297Z [INFO] sink.file: token written: path=/home/vault/.vault-token
2020-10-10T18:27:17.297Z [INFO] sink.server: sink server stopped
2020-10-10T18:27:17.297Z [INFO] sinks finished, exiting
2020-10-10T18:27:17.297Z [INFO] template.server: template server received new token
2020/10/10 18:27:17.297652 [INFO] (runner) stopping
2020/10/10 18:27:17.297677 [INFO] (runner) creating new runner (dry: false, once: false)
2020/10/10 18:27:17.297800 [WARN] (clients) disabling vault SSL verification
2020/10/10 18:27:17.297825 [INFO] (runner) creating watcher
2020/10/10 18:27:17.297863 [INFO] (runner) starting
2020-10-10T18:27:17.306Z [INFO] auth.handler: renewed auth token
2020/10/10 18:27:17.314963 [WARN] (view) vault.read(secrets/dev/poc-secret): no secret exists at secrets/dev/poc-secret (retry attempt 1 after "250ms")
2020/10/10 18:27:17.572730 [WARN] (view) vault.read(secrets/dev/poc-secret): no secret exists at secrets/dev/poc-secret (retry attempt 2 after "500ms")
2020/10/10 18:27:18.080373 [WARN] (view) vault.read(secrets/dev/poc-secret): no secret exists at secrets/dev/poc-secret (retry attempt 3 after "1s")
2020/10/10 18:27:19.088366 [WARN] (view) vault.read(secrets/dev/poc-secret): no secret exists at secrets/dev/poc-secret (retry attempt 4 after "2s")
2020/10/10 18:27:21.096020 [WARN] (view) vault.read(secrets/dev/poc-secret): no secret exists at secrets/dev/poc-secret (retry attempt 5 after "4s")
2020/10/10 18:27:25.104668 [WARN] (view) vault.read(secrets/dev/poc-secret): no secret exists at secrets/dev/poc-secret (retry attempt 6 after "8s")
2020/10/10 18:27:33.112358 [WARN] (view) vault.read(secrets/dev/poc-secret): no secret exists at secrets/dev/poc-secret (retry attempt 7 after "16s")
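A quick way to confirm the path from the Vault CLI (the kv helper inserts the data/ prefix for KV v2 automatically):
vault kv get secrets/dev/poc-secret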
i also confirmed that the AWS security group changes i made here https://github.com/hashicorp/vault-k8s/issues/32#issuecomment-706575682 were not necessary. looks like the default AWS EKS cluster deployment using eksctl doesn't require any additional inbound or outbound security group rules.
I found the issue: as I'm running on OpenShift 3.11 (Kubernetes 1.11), the API config had to be changed so it supports admission controllers.
MutatingAdmissionWebhook:
  configuration:
    apiVersion: v1
    disable: false
    kind: DefaultAdmissionConfig
ValidatingAdmissionWebhook:
  configuration:
    apiVersion: v1
    disable: false
    kind: DefaultAdmissionConfig
This block must be present in master-config.yml in the section admissionConfig.pluginConfig. After restarting the apiserver, the webhook started to kick in. But the sidecar was still not injected because of permission issues. Granting the consumer app's service account cluster-admin permissions or access to the privileged SCC (the equivalent of a PSP) helped, but that also introduces other security issues.
I too am running OpenShift 3.11. The resulting error @mikemowgli hinted at, which comes up if the privileged SCC isn't set, is: Error creating: pods "<podname>" is forbidden: unable to validate against any pod security policy: []
Adding the privileged SCC to the pod's service account worked for me, but that's nothing for production. The cluster-admin permission implies the privileged SCC, which is why adding that role also works.
Upon further investigation I'm convinced this relates to https://github.com/kubernetes/kubernetes/issues/65716 and may have been changed in newer Kubernetes versions. The way I understand it, there are multiple hooks being called before Kubernetes spins up the pod, and the last in that row is the hook that checks against the Pod Security Policy, i.e. it answers the question "is this pod allowed to run in this configuration, which may have been altered by the other hooks?".
Apparently on OpenShift 3.11 / Kubernetes 1.11, while or after executing the MutatingAdmissionWebhook, the securityContext of the resulting pod is lost or not available. This also explains why the list of pod security policies is empty ([]) in the error message.
Knowing from @mikemowgli's answer that it could be fixed with an SCC, I played around, and to avoid the error the requiredDropCapabilities property in the SCC must be empty. It is not a specific entry in the list that makes the check fail; I think that if there is any entry in the list, a check is executed that is then missing the aforementioned context.
I was able to copy the restricted SCC, set requiredDropCapabilities: [], assign the SCC to my pod's service account, and the pod with the injector came up.
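A sketch of that workflow with oc; the new SCC name and the service account are examples:
# start from the restricted SCC
oc get scc restricted -o yaml > restricted-novault.yaml
# edit restricted-novault.yaml: change metadata.name (e.g. to restricted-novault)
# and set requiredDropCapabilities: []
oc create -f restricted-novault.yaml
# grant it to the app's service account
oc adm policy add-scc-to-user restricted-novault -z my-app -n my-namespace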
This is not as bad as assigning the privileged SCC, but it certainly has its security implications and I'm not sure yet if that's okay for production. The capabilities being dropped by default are SET_UID, SET_GID, MKNOD, and KILL.
If anyone could shed more light on this, that would be great. Otherwise there's probably nothing left but upgrading to OpenShift 4.x to use the Vault injector.
hey @jasonodonnell, I see you are a great resource for updating configurations.
I am working on getting all the Vault setup steps fully automated by Terraform (all that is left is to include the injector): https://github.com/sethvargo/vault-on-gke/pull/98
The above PR shows my changes to get the vault-injector up and running via this Terraform project.
I added this firewall rule: https://github.com/sethvargo/vault-on-gke/pull/98/files#diff-833c22bd299aef6aabfe1b427e9ee5f6fe6ca27f9f54ef81f2fb9fb32a5ddb8dR389-R406 which allows mutating requests to come into the sidecar injector:
2021-08-13T15:33:42.096Z [INFO] handler: Request received: Method=POST URL=/mutate?timeout=10s
2021-08-13T15:33:42.105Z [DEBUG] handler: checking if should inject agent..
2021-08-13T15:33:42.106Z [DEBUG] handler: checking namespaces..
2021-08-13T15:33:42.106Z [DEBUG] handler: setting default annotations..
2021-08-13T15:33:42.106Z [DEBUG] handler: creating new agent..
2021-08-13T15:33:42.107Z [DEBUG] handler: validating agent configuration..
2021-08-13T15:33:42.107Z [DEBUG] handler: creating patches for the pod..
however, no patches were made to the pod
Annotations: cni.projectcalico.org/podIP: 10.0.94.28/32
cni.projectcalico.org/podIPs: 10.0.94.28/32
vault.hashicorp.com/agent-inject: true
vault.hashicorp.com/agent-inject-secret-foo: secret/foo
vault.hashicorp.com/role: internal-app
^ the pod still has the same annotations, no additional annotations added, and no secrets injected.
on this pr, https://github.com/sethvargo/vault-on-gke/pull/98, clone locally and run the README instructions.
then, run these CLI commands (after the README's export env variables instructions):
# enable secrets, add a secret, write a new policy
vault secrets enable -path=secret -version=2 kv
vault kv put secret/foo a=b
vault policy write internal-app - <<EOH
path "secret/*" {
capabilities = ["read"]
}
EOH
# get into the vault container
gcloud container clusters get-credentials vault --region us-central1
kubectl exec -n vault -it vault-0 --container vault /bin/sh
-- inside container --
# enable service to service auth via kubernetes
export VAULT_TOKEN="put in master token"
vault auth enable kubernetes
vault write auth/kubernetes/config \
token_reviewer_jwt="$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" \
kubernetes_host="https://$KUBERNETES_PORT_443_TCP_ADDR:443" \
kubernetes_ca_cert=@/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
--
# add a specific role for the app's service account
vault write auth/kubernetes/role/internal-app \
bound_service_account_names=internal-app \
bound_service_account_namespaces=vault \
policies=internal-app \
ttl=24h
and then, I simply deploy this helm app which defines the annotations
https://github.com/agates4/sample-vault-helm-template
inside the repo,
helm install python-service .
then I check all the logs and see the problem I first described above ^
@jasonodonnell, do you think you can help me update this terraform project to be working fully out of the box? can you point me in the right directions?
Thank you!
The problem was that the MutatingWebhookConfiguration I created in Terraform was using
admission_review_versions = ["v1", "v1beta"]
when it should be using
admission_review_versions = ["v1beta1"]
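For anyone wiring this up themselves, the relevant part of the Terraform resource looks roughly like this (a sketch, not the exact PR code; the service name, namespace, and the omitted ca_bundle handling are placeholders):
resource "kubernetes_mutating_webhook_configuration" "vault" {
  metadata {
    name = "vault-agent-injector-cfg"
  }

  webhook {
    name                      = "vault.hashicorp.com"
    admission_review_versions = ["v1beta1"] # not ["v1", "v1beta"]

    client_config {
      service {
        name      = "vault-agent-injector-svc"
        namespace = "vault"
        path      = "/mutate"
      }
    }

    rule {
      api_groups   = [""]
      api_versions = ["v1"]
      operations   = ["CREATE", "UPDATE"]
      resources    = ["pods"]
    }

    failure_policy = "Ignore"
    side_effects   = "None"
  }
}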
now the sidecar injects a vault-init container within the deployed python-starter pod.
however I am getting this error on the init container:
error authenticating: error="context deadline exceeded" backoff=1s
and this error means the init container is stuck forever and the python-starter app is never fully deployed or ready...
diving into this..
Alright folks -
I codified in Terraform the entire process of getting a sidecar injector working, even on external clusters: https://github.com/sethvargo/vault-on-gke/pull/98
^ Fully documented in this PR 👍
I hope this helps someone! It took me quite a bit of diving in to get this fully working out of the box!
I'm having the same symptom; however, in my case the MutatingWebhookConfiguration resource is never created by the Helm chart release.
PS > kubectl get mutatingwebhookconfigurations
NAME WEBHOOKS AGE
linkerd-proxy-injector-webhook-config 1 46d
linkerd-tap-injector-webhook-config 1 46d
webhook.pipeline.tekton.dev 1 6d20h
As you can see from the above output (I ran kubectl from PowerShell), there is no mutating webhook for Vault, even though I installed it with the Helm chart.
EDIT: The issue, at least in my case, was that I had installed the Vault Helm chart in different namespaces and had deleted one of them. That caused the MutatingWebhookConfiguration to be deleted, even though I still had a valid Helm release in a different namespace.
I'm going to close this as it seems the original issue is resolved. Please feel free to post in our discuss forum if anyone is still having issues debugging their deployment: https://discuss.hashicorp.com/c/vault/30
Hello, I'm trying to deploy Vault with the sidecar injector. I'm using this chart: https://github.com/hashicorp/vault-helm and following this manual: https://www.hashicorp.com/blog/injecting-vault-secrets-into-kubernetes-pods-via-a-sidecar/; the only difference is that I don't use dev server mode.
Everything works fine except the injector. When I deploy an app with the injector annotations, the pod starts as usual with one container and the mounted app-token secret, but there is no secondary injector container:
There are no errors in the logs from the vault-agent-injector pod:
Here is my deployment:
Is there any way to debug this issue?
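For anyone landing here, a minimal debugging sequence assembled from the suggestions in this thread (the vault namespace here is a placeholder for wherever the chart is installed):
# 1. confirm the webhook object exists and points at the injector service
kubectl get mutatingwebhookconfigurations
kubectl describe mutatingwebhookconfigurations vault-agent-injector-cfg

# 2. confirm the injector service has endpoints
kubectl describe service -n vault vault-agent-injector-svc

# 3. watch the injector logs for "Request received" while creating a pod
kubectl logs -n vault -l app.kubernetes.io/name=vault-agent-injector -f

# 4. on private GKE, allow the control plane to reach port 8080 on the nodes
#    (see the firewall discussion above)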