@gabrielanavarro You can disable the init container by setting the following annotation:
vault.hashicorp.com/agent-pre-populate: false
At the moment Istio isn't supported, but we're looking into how to make this work.
@jasonodonnell Thank you so much. This worked great!
Are there any security issues with doing it this way?
It seems that the istio-init container patches the pod's iptables before vault-agent-init is executed. Since istio-proxy isn't running yet, vault-agent-init cannot reach the Vault server. Switching the init container order in the pod's YAML (i.e. putting vault-agent-init before istio-init) does the trick, but I don't know if it's possible to do this directly with the MutatingWebhook.
You can also omit the outbound port to Vault from Envoy redirection using the following annotation: traffic.sidecar.istio.io/excludeOutboundPorts: "8200"
Of course this only works if you're happy with connections to Vault from the init container and sidecar not being intercepted - I have only tested this with Istio CNI.
@gabrielanavarro I'm not aware of any security issues caused by disabling the init container. The main issue will be whether your application expects the secrets from vault to be already rendered when your application starts up. The sidecar container will render the secrets, but it's a race between it rendering them and your application consuming them.
But to your initial problem: vault-k8s 0.3.0 now has support for a vault.hashicorp.com/agent-init-first annotation. Setting that to true should allow the vault init container to run before the istio init container, so they both have a better chance of succeeding. We'd love to hear if this works for you!
It's working perfectly. For me it solved the problem of the vault-agent-init container being initialized later than the istio-proxy.
However, as I was playing around with Vault/Istio I came across a bug where the pod's YAML was populated neither by the vault-agent-init container nor by the vault-agent sidecar. I believe there is a race condition between the istio-sidecar-injector and the vault-agent-injector-cfg MutatingWebhookConfigurations. This is probably why I ended up with some pods that have three containers (application, istio-proxy, vault-agent) and some that have only two (application, istio-proxy).
I managed to solve the issue by merging the two configs into a single MutatingWebhookConfiguration. However, I'm quite sure such a hack is not a long-term solution. With the merged config I always (correctly) ended up with 3 containers (app + 2 sidecars):
```yaml
apiVersion: admissionregistration.k8s.io/v1beta1
kind: MutatingWebhookConfiguration
metadata:
  labels:
    app.kubernetes.io/instance: vault
    app.kubernetes.io/managed-by: Tiller
    app.kubernetes.io/name: vault-agent-injector
    app: sidecar-injector
    operator.istio.io/component: Pilot
    operator.istio.io/managed: Reconcile
    operator.istio.io/version: 1.5.0
    release: istio
  name: tamas-injector
webhooks:
- admissionReviewVersions:
  - v1beta1
  clientConfig:
    caBundle: abc
    service:
      name: vault-agent-injector-svc
      namespace: vault
      path: /mutate
      port: 443
  failurePolicy: Ignore
  matchPolicy: Exact
  name: vault.hashicorp.com
  namespaceSelector: {}
  objectSelector: {}
  reinvocationPolicy: IfNeeded
  rules:
  - apiGroups:
    - ""
    apiVersions:
    - v1
    operations:
    - CREATE
    - UPDATE
    resources:
    - pods
    scope: "*"
  sideEffects: Unknown
  timeoutSeconds: 30
- admissionReviewVersions:
  - v1beta1
  clientConfig:
    caBundle: xyz
    service:
      name: istiod
      namespace: istio-system
      path: /inject
      port: 443
  failurePolicy: Fail
  matchPolicy: Exact
  name: sidecar-injector.istio.io
  namespaceSelector:
    matchLabels:
      istio-injection: enabled
  objectSelector: {}
  reinvocationPolicy: Never
  rules:
  - apiGroups:
    - ""
    apiVersions:
    - v1
    operations:
    - CREATE
    resources:
    - pods
    scope: "*"
  sideEffects: Unknown
  timeoutSeconds: 30
```
It could also be a namespaceSelector issue, or a lost connection between the k8s master and its nodes (so it fails to call the webhook pod to mutate the pod). Have you looked into these?
Hi! Thanks for the tips.
Connection: We are running Kubernetes on GKE, and I believe it's highly unlikely that the connection would be lost between the master and the nodes that often. (If that were the case, the connection would be lost approx. 5-7 times out of 10, which I doubt. Also, once I merged the two configs together it works 100% of the time, so if it were a connectivity issue I should still be seeing missing sidecars/init containers.)
namespaceSelector: I'm deploying to the default namespace, which has the label istio-injection: enabled, so theoretically both the istio and vault injectors should pick up the config. And here again: once I merged the two configs there were no issues, thus I believe it's not the namespaceSelector.
Unfortunately GKE only allows running K8s up to 1.15, and Kubernetes supports monitoring of webhooks only from 1.16+ (Link)
Did you make sure your namespaces were labeled both for istio and vault? For instance, my vault webhook configuration has:

```yaml
namespaceSelector:
  matchLabels:
    vault-webhook: enabled
```

So my namespaces are labeled with both istio-injection: enabled and vault-webhook: enabled.
@TamasNeumer you can run later versions of k8s on GKE using the rapid channel:
gcloud beta container clusters create example --release-channel=rapid
Not too sure why you're seeing what you are. If the init container and sidecar aren't appearing at all, it seems the webhook either isn't being hit at all or it is returning an error - you could try changing to failurePolicy: Fail to see if calls to your webhook are in fact failing (I think errors would show as events on the managing resources, e.g. by describing the managing ReplicaSet if you're using Deployments).
Some things that might make a difference, but that seem unlikely to be the cause given the randomness you're seeing:
I think the main fix for Istio support is to properly support reinvocationPolicy: IfNeeded for the vault injector webhook. Looking briefly through the code, it seems that vault-k8s adds an annotation (vault.hashicorp.com/agent-inject-status) to signify injection and then bails if the webhook is called again with that annotation set, to achieve idempotency. This means though that if a later webhook (e.g. one with a corresponding MutatingWebhookConfiguration whose name starts with z) happens to inject an init container first, then vault.hashicorp.com/agent-init-first won't behave as expected.
Instead, the webhook's patching logic should be made idempotent, which would allow the webhook to be called again and to reorder the init containers again if needed.
Hi!
Thanks for the lengthy comment. I came to the conclusion that I probably had an older version of istio on my cluster (1.4.0 or maybe even a 1.4.0 beta).
I have tested the following setups:
I believe it was an issue with the older version of istio. Thank you for the support!
You can also omit the outbound port to Vault from Envoy redirection using the following annotation:
traffic.sidecar.istio.io/excludeOutboundPorts: "8200"
Of course this only works if you're happy with connections to Vault from the init container and sidecar not being intercepted - I have only tested this with Istio CNI.
This solution works for me; however, I was wondering if I could achieve the same with an istio ServiceEntry rather than this annotation (the latter being the "preferred" way of solving this problem).
I was trying to get it to work based on the documentation, however it seems to me that despite the ServiceEntry, the sidecar can't resolve the host (vault.vault.svc):

```
2020-03-27T10:17:14.180Z [INFO]  auth.handler: authenticating
2020-03-27T10:17:14.184Z [ERROR] auth.handler: error authenticating: error="Error making API request.

URL: PUT http://vault.vault.svc:8200/v1/auth/kubernetes/login
Code: 404. Raw Message:
" backoff=1.5269282149999999
```
ServiceEntry:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: vault-service-entry
spec:
  hosts:
  - vault.vault.svc
  ports:
  - number: 8200
    name: http
    protocol: HTTP
  location: MESH_EXTERNAL
  resolution: DNS
```
I have also added a busybox sidecar, and from inside it I wanted to run nslookup vault.vault.svc, which (if I understand correctly) goes to my kube-dns (on 10.0.0.10):
```
/ # nslookup vault
Server:         10.0.0.10
Address:        10.0.0.10:53

** server can't find vault.default.svc.cluster.local: NXDOMAIN
*** Can't find vault.svc.cluster.local: No answer
*** Can't find vault.cluster.local: No answer
*** Can't find vault.c.retail-platform-sandbox-hsord.internal: No answer
*** Can't find vault.google.internal: No answer
*** Can't find vault.default.svc.cluster.local: No answer
*** Can't find vault.svc.cluster.local: No answer
*** Can't find vault.cluster.local: No answer
*** Can't find vault.c.retail-platform-sandbox-hsord.internal: No answer
*** Can't find vault.google.internal: No answer
```
Info: vault is running in the vault namespace, on the same cluster. The deployment I have been playing around with was in the default namespace.
@TamasNeumer you need to specify the FQDN for the vault service in the ServiceEntry: vault.vault.svc.cluster.local. Additionally, for the busybox sidecar the command should be nslookup vault.vault.
The ServiceEntry works fine once the Envoy sidecar is running; the purpose of excluding outbound redirection was so connections could be made to Vault before the Envoy sidecar is running but after redirection happens (i.e. when trying to connect from an init container that runs after istio's init container) - it shouldn't be necessary anymore though with recent changes to vault-k8s.
Amazing! I have managed to get it working!
In conclusion, I used only two annotations on the deployment itself:

```yaml
vault.hashicorp.com/agent-init-first: "true"
vault.hashicorp.com/agent-inject: "true"
```
And as you have said, the ServiceEntry needed a quick fix, but works:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: vault-service-entry
spec:
  hosts:
  - vault.vault.svc.cluster.local
  ports:
  - number: 8200
    name: http
    protocol: HTTP
  location: MESH_EXTERNAL
  resolution: DNS
```
@TamasNeumer awesome - there's no way of defining default annotations on a Namespace level atm: https://github.com/kubernetes/kubernetes/issues/35504
You would probably need to write your own mutating admission webhook or use an existing one to achieve this - something OPA based should be enough: https://github.com/open-policy-agent/gatekeeper
Hi,
I am busy implementing Vault on Kubernetes too, and adding the agent-init-first annotation worked fine to get it all working with istio... except for kafka and rabbitmq! It was working until I enabled istio. I am getting the following error in the vault-agent-init container:

```
2020-04-03T14:43:06.806Z [INFO] auth.handler: authenticating
2020-04-03T14:43:06.811Z [ERROR] auth.handler: error authenticating: error="Put http://vault.polystream-hub.svc:8200/v1/auth/kubernetes/login: dial tcp 10.0.20.113:8200: connect: connection refused" backoff=2.7553159369999998
```
My vault, kafka and rabbitmq are all in the same namespace...
This is only happening in the kafka and rabbitmq pods...
Any ideas?
Has anyone else seen the issue referenced by @TamasNeumer where some pods end up with three containers (application, istio-proxy, vault-agent) and some pods only two (application, istio-proxy)? This happens every time I start my cluster, but if I patch a pod the vault-agent will spin up and write out the secrets just fine.
Running k8s 1.15 on AWS EKS, Istio 1.6, vault-k8s 0.3, Skaffold 1.5.
Annotations:

```yaml
vault.hashicorp.com/agent-inject: "true"
vault.hashicorp.com/agent-init-first: "true"
```

I also tried locally on docker-desktop k8s v1.16.5 and the same thing happens.
After debugging some more and reviewing the API logs I found this:

```
failed calling webhook "vault.hashicorp.com": Post https://vault-agent-injector-svc.default.svc:443/mutate?timeout=30s: no endpoints available for service "vault-agent-injector-svc"
```

However, I see this as a red herring since, as I mentioned above, if I patch a deployment everything works as expected. It would seem the istio/envoy network and the vault agent eventually get to a point where injection can happen.
Thoughts?
@davidcunningham I think it's just that the webhook is configured to fail open, so when you first start up your cluster the webhook takes longer to become available than it takes for some pods to be created.
If you configure the webhook to reject requests when it fails, then controllers should retry.
@dippynark - thanks! Adding failurePolicy: Fail to the webhook seems to have done it! Will continue to test scenarios.
@TamasNeumer In your setup, is Istio also enabled for vault and vault-agent-injector? I'm running into the same issue currently, and I have Istio enabled with mTLS for everything.
@TamasNeumer @a8j8i8t8, we are having a similar issue.
When we enable istio in the vault namespace (where the vault injector is deployed; external vault), we get errors when an application pod is deployed in the application namespace, but if istio is disabled everything works fine. (The application namespace always has istio.)
Hello! I have a similar issue.
I'm trying to get vault-agent-injector-svc working but I'm stuck. All pods are created without the sidecar from vault. If I set failurePolicy: Fail, then the ReplicaSet events show:

```
Internal error occurred: failed calling webhook "vault.hashicorp.com": Post https://vault-agent-injector-svc.vault.svc:443/mutate?timeout=30s: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Internal error occurred: failed calling webhook "vault.hashicorp.com": Post https://vault-agent-injector-svc.vault.svc:443/mutate?timeout=30s: dial tcp 10.8.2.36:8080: i/o timeout
Error creating: Internal error occurred: failed calling webhook "vault.hashicorp.com": Post https://vault-agent-injector-svc.vault.svc:443/mutate?timeout=30s: context deadline exceeded
```
So, I have two service entries:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: vault-service-entry
spec:
  hosts:
  - vault.vault.svc.cluster.local
  ports:
  - number: 8200
    name: http
    protocol: HTTP
  location: MESH_EXTERNAL
  resolution: DNS
```

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: vault-injector-service-entry
spec:
  hosts:
  - vault-agent-injector-svc.vault.svc.cluster.local
  ports:
  - number: 8080
    name: http
    protocol: HTTP
  - number: 443
    name: tcp
    protocol: TCP
  location: MESH_EXTERNAL
  resolution: DNS
```
With the second one, I started receiving requests on vault-agent-injector-svc, but with errors:
```
2020-10-14T06:32:05.981Z [INFO]  handler: Starting handler..
Listening on ":8080"...
Updated certificate bundle received. Updating certs...
2020/10/14 06:32:15 http: TLS handshake error from 10.8.1.29:60748: remote error: tls: unknown certificate authority
2020/10/14 06:32:15 http: TLS handshake error from 10.8.0.19:39344: remote error: tls: unknown certificate authority
2020/10/14 06:32:15 http: TLS handshake error from 10.8.5.109:33832: remote error: tls: unknown certificate authority
2020/10/14 06:32:16 http: TLS handshake error from 10.8.4.92:40274: remote error: tls: unknown certificate authority
2020/10/14 06:32:16 http: TLS handshake error from 10.8.4.92:40278: remote error: tls: unknown certificate authority
2020/10/14 06:32:17 http: TLS handshake error from 10.8.1.29:60840: remote error: tls: unknown certificate authority
```
And the ReplicaSet events give this:

```
replicaset-controller Error creating: Internal error occurred: failed calling webhook "vault.hashicorp.com": Post https://vault-agent-injector-svc.vault.svc:443/mutate?timeout=30s: context deadline exceeded
```
I thought the cert was wrong, but the mutatingwebhookconfiguration has the correct caBundle and seems fine.
I also checked with nslookup whether there is connectivity:
```
$ nslookup vault.vault.svc
Server:         10.0.32.10
Address:        10.0.32.10#53

Non-authoritative answer:
Name:   vault.vault.svc.cluster.local
Address: 10.0.34.33

$ nslookup vault-agent-injector-svc.vault.svc
Server:         10.0.32.10
Address:        10.0.32.10#53

Non-authoritative answer:
Name:   vault-agent-injector-svc.vault.svc.cluster.local
Address: 10.0.32.142
```
Not sure what to do next.
This is my vault config:
```yaml
injector:
  logLevel: "trace"
  certs:
    secretName: "injector-tls"
    caBundle: "xxx"
server:
  extraEnvironmentVars:
    GOOGLE_REGION: xxxx
    GOOGLE_PROJECT: xxxx
    GOOGLE_APPLICATION_CREDENTIALS: /vault/userconfig/kms-creds/credentials.json
  extraVolumes:
    - type: 'secret'
      name: 'kms-creds'
  service:
    enabled: true
  ha:
    enabled: true
    replicas: 2
    config: |
      ui = true

      listener "tcp" {
        tls_disable = 1
        address = "[::]:8200"
        cluster_address = "[::]:8201"
      }

      seal "gcpckms" {
        project    = "xxxx"
        region     = "xxxx"
        key_ring   = "xxxx"
        crypto_key = "xxxx"
      }

      storage "postgresql" {
        connection_url = "xxxx"
        ha_enabled = "true"
        max_parallel = "128"
      }

      log_level = "trace"
```
Hi all, I'm stuck with the same issue. I am using on-prem k8s v1.19 and Istio 1.8.0. I got stuck running them together properly when I inject the istio mesh into the hub-dev namespace where our microservices are running. Vault is running in the dev namespace.
The first issue I had is that the Vault and Istio sidecars were somehow not running properly and the application could not init, as shown below. I tried the agent-init-first annotation to run the vault init container first, but it did not solve the issue.
Here are the outputs of the pod status and describe:

```
$ kubectl get pods -n hub-dev
NAME                                READY   STATUS     RESTARTS   AGE
oneapihub-mp-dev-59f7685455-5kmft   0/3     Init:0/2   0          19
```
```
$ kubectl describe pod oneapihub-mp-dev-59f7685455-5kmft -n hub-dev
Init Containers:
  vault-agent-init:
    Container ID:
    State:          Running
      Started:      Fri, 15 Jan 2021 13:54:30 +0300
    Ready:          False
  istio-validation:
    Container ID:
    Image:          reg-dhc.app.corpintra.net/i3-mirror/docker.io_istio_proxyv2:1.8.0
    State:          Waiting
      Reason:       PodInitializing
    Ready:          False
Containers:
  oneapihub-mp:
    Container ID:
    State:          Waiting
      Reason:       PodInitializing
    Ready:          False
  istio-proxy:
    Container ID:
    State:          Waiting
      Reason:       PodInitializing
    Ready:          False
  istio-proxy:
    Container ID:
    State:          Waiting
      Reason:       PodInitializing
    Ready:          False

  Normal  Pulled   16m  kubelet, xx-kube-node07  Container image "docker.io_vault:1.5.2" already present on machine
  Normal  Created  16m  kubelet, xx-kube-node07  Created container vault-agent-init
  Normal  Started  16m  kubelet, xx-kube-node07  Started container vault-agent-init
```
When I tried the annotation below, it fixed the above issue, but this time when the pod starts to run it cannot find the /vault/secrets path. Somehow it can be read after that, as I saw when I checked the logs of the proxy and the application, and the /vault/secrets folder does exist inside the pod.

```yaml
vault.hashicorp.com/agent-pre-populate: "false"
```
Here are the logs of the app, even though the folder exists:

```
$ kubectl get pods -n hub-dev
oneapihub-mp-dev-78449b8cf6-qbqhn   3/3     Running   0          9m31s

$ kubectl logs -f oneapihub-mp-dev-78449b8cf6-qbqhn -n hub-dev -c oneapihub-mp

> market-place@1.0.0 start:docker /usr/src/app
> node app.js

{"message""devMessage":"SECRET_READ_ERROR","data":"","exception":"ENOENT: no such file or directory, open '/vault/secrets/database'","stack":"Error: ENOENT: no such file or directory, open '/vault/secrets/database'->

/ $ cd /vault/secrets
/vault/secrets $ ls
database  jenkins
```
Here I have a PUT error which might be related to Vault itself, but I am confused how Vault can then inject the secrets.
```
$ kubectl logs -f oneapihub-mp-dev-78449b8cf6-qbqhn -n hub-dev -c vault-agent
2021-01-15T11:21:13.477Z [ERROR] auth.handler: error authenticating: error="Put "http://vault.dev.svc:8200/v1/auth/kubernetes/login": dial tcp 10.254.30.115:8200: connect: connection refused" backoff=2.464775515
==> Vault agent started! Log data will stream in below:

==> Vault agent configuration:
           Cgo: disabled
     Log Level: info
       Version: Vault v1.5.2
   Version Sha: 685fdfa60d607bca069c09d2d52b6958a7a2febd

2021-01-15T11:21:15.942Z [INFO] auth.handler: authenticating
2021-01-15T11:21:15.966Z [INFO] auth.handler: authentication successful, sending token to sinks
2021-01-15T11:21:15.966Z [INFO] sink.file: token written: path=/home/vault/.vault-token
```
And lastly, when I checked the istio-proxy logs I can see that the GET and PUT requests return 200:
```
$ kubectl logs -f oneapihub-mp-dev-78449b8cf6-h8s8j -n hub-dev -c istio-proxy
2021-01-15T11:35:04.352221Z warning envoy filter mTLS PERMISSIVE mode is used, connection can be either plaintext or TLS, and client cert can be omitted. Please consider to upgrade to mTLS STRICT mode for more secure configuration that only allows TLS connection with client cert. See https://istio.io/docs/tasks/security/mtls-migration/
[2021-01-15T11:35:05.557Z] "PUT /v1/auth/kubernetes/login HTTP/1.1" 200 - "-" 1294 717 8 8 "-" "Go-http-client/1.1" "a082698b-d1f7-4aa5-9db5-01d86d5093ef" "vault.dev.svc:8200" "10.6.24.55:8200" outbound|8200||vault.dev.svc.cluster.local 10.6.19.226:55974 10.254.30.115:8200 10.6.19.226:60478 - default
2021-01-15T11:35:05.724833Z info Envoy proxy is ready
[2021-010.6.19.226:41888 - default
[2021-01-15T11:35:05.596Z] "GET /v1/secret/data/oneapihub-marketplace/database HTTP/1.1" 200 - "-" 0 400 0 0 "-" "Go-http-client/1.1" "d7d10c1f-c445-44d1-b0e3-bb9ae7bbc2f0" "vault.dev.svc:8200" "10.6.24.55:8200" outbound|8200||vault.dev.svc.cluster.local 10.6.19.226:55974 10.254.30.115:8200 10.6.19.226:41900 - default
[2021-01-15T11:35:05.591Z] "PUT /v1/auth/token/renew-self HTTP/1.1" 200 - "-" 15 717 8 8 "-" "Go-http-client/1.1" "56705e5c-c966-4bc8-8187-7ca5bb2b4abe" "vault.dev.svc:8200" "10.6.24.55:8200" outbound|8200||vault.dev.svc.cluster.local 10.6.19.226:37388 10.254.30.115:8200 10.6.19.226:41890 - default
[2021-01-15T11:35:05.602Z] "GET /v1/secret/data/oneapihub-marketplace/jenkins HTTP/1.1" 200 - "-" 0 284 0 0 "-" "Go-http-client/1.1" "1b6d8601-18df-4f32-8722-162aa785c476" "vault.dev.svc:8200" "10.6.24.55:8200" outbound|8200||vault.dev.svc.cluster.local 10.6.19.226:55974 10.254.30.115:8200 10.6.19.226:41902 - default
```
@gabrielanavarro I'm not aware of any security issues caused by disabling the init container. The main issue will be whether your application expects the secrets from vault to be already rendered when your application starts up. The sidecar container will render the secrets, but it's a race between it rendering them and your application consuming them.
But to your initial problem: vault-k8s 0.3.0 now has support for a vault.hashicorp.com/agent-init-first annotation. Setting that to true should allow the vault init container to run before the istio init container, so they both have a better chance of succeeding. We'd love to hear if this works for you!
Setting vault.hashicorp.com/agent-init-first: true works for me. Thanks!!
Same issue here. vault-agent-init starts before the istio-validation init container thanks to the vault.hashicorp.com/agent-init-first annotation, however it never completes and is stuck with "connect: connection refused".
Tested with istio 1.9 with the CNI plugin installed, on k8s v1.18. @semihural did you manage to find a solution?
PS: traffic.sidecar.istio.io/excludeOutboundPorts is not a desirable option.
@einret This worked for me:

```yaml
template:
  metadata:
    annotations:
      traffic.sidecar.istio.io/excludeOutboundPorts: "8200"
      vault.hashicorp.com/agent-init-first: "true"
      vault.hashicorp.com/agent-inject: "true"
```
I was just looking into Vault and how it would interface with our istio setup. Curious if anyone has tried the following and whether it would solve the aforementioned issues. Not sure if it's an init or main container issue, but I'm going to give it a try in a bit.

```yaml
metadata:
  annotations:
    proxy.istio.io/config: '{ "holdApplicationUntilProxyStarts": true }'
```
I have a setup where Vault is in another cluster and we have a DNS entry that points to a load balancer. I've added a ServiceEntry and the following annotations:

```yaml
traffic.sidecar.istio.io/excludeOutboundPorts: "8200"
vault.hashicorp.com/agent-inject: "true"
vault.hashicorp.com/agent-pre-populate: "false"
vault.hashicorp.com/log-level: debug
```
The agent will say:

```
2022-01-21T00:54:40.845Z [INFO]  auth.handler: starting auth handler
2022-01-21T00:54:40.845Z [INFO]  auth.handler: authenticating
2022-01-21T00:55:40.845Z [ERROR] auth.handler: error authenticating: error="context deadline exceeded" backoff=1s
2022-01-21T00:55:41.845Z [INFO]  auth.handler: authenticating
2022-01-21T00:56:41.845Z [ERROR] auth.handler: error authenticating: error="context deadline exceeded" backoff=1.53s
2022-01-21T00:56:43.376Z [INFO]  auth.handler: authenticating
2022-01-21T00:57:43.377Z [ERROR] auth.handler: error authenticating: error="context deadline exceeded" backoff=2.95s
2022-01-21T00:57:46.336Z [INFO]  auth.handler: authenticating
```
My hunch is that something is being blocked, because why else would authentication time out?
Anyone else hit this?
@einret This worked for me:

```yaml
template:
  metadata:
    annotations:
      traffic.sidecar.istio.io/excludeOutboundPorts: "8200"
      vault.hashicorp.com/agent-init-first: "true"
      vault.hashicorp.com/agent-inject: "true"
```
Thanks @semihural! That worked for me as well.
I have vault installed outside the service mesh, in another namespace but inside the same cluster. I've added the service entry:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: vault-service-entry
spec:
  hosts:
```

I also added the annotations on the pod that has the vault sidecar:

```yaml
traffic.sidecar.istio.io/excludeOutboundPorts: "8200"
vault.hashicorp.com/agent-init-first: "true"
vault.hashicorp.com/agent-inject: "true"
```
It still does not work. The init container starts and cannot access the vault API:

```
auth.handler: error authenticating: error="Put "https://vault-dev.vault.svc:8200/v1/auth/kubernetes/login": dial tcp: lookup vault-dev.vault.svc on 172.18.0.10:53: read udp 10.208.7.161:59314->172.18.0.10:53: read: connection refused" backoff=1.949307983
```
tl;dr: Set vault.hashicorp.com/agent-run-as-user: "1337" if using istio-cni.
There are two ways to use istio:
1. Without istio-cni: Istio injects an istio-init init container, which sets up the networking. In this case, the ordering of the init containers is relevant, because init containers that run before istio-init can access the network without restrictions.
2. With istio-cni: istio sets up the network even before the first init container starts. Istio will still inject an istio-validation init container, but it just validates that the network setup is correct. With istio-cni the order of the init containers doesn't matter, because all traffic is routed through the istio-proxy sidecar, which isn't yet started while init containers are running.
There are at least three solutions to avoid this problem (the last one might surprise you):
1. Only the init containers are a problem, because the istio-proxy sidecar isn't running at that point. It is possible to disable vault-agent-init and just use the vault-agent sidecar. This has the drawback that the application needs to wait for the vault-agent sidecar to populate the secrets. This can be achieved with a sleep in the application entrypoint (or a more sophisticated approach; a sketch follows below).
```yaml
metadata:
  annotations:
    vault.hashicorp.com/agent-pre-populate: "false"
    proxy.istio.io/config: |
      holdApplicationUntilProxyStarts: true
```
holdApplicationUntilProxyStarts helps because all containers, including vault-agent, will wait until the istio-proxy sidecar is ready to process requests.
2. It is possible to disable the routing through the istio sidecar for specific ports, for example port 8200:
```yaml
metadata:
  annotations:
    traffic.sidecar.istio.io/excludeOutboundPorts: "8200"
    proxy.istio.io/config: |
      proxyMetadata:
        ISTIO_META_DNS_CAPTURE: "false"
```
DNS capture isn't enabled by default, but if it is activated in general, you need to disable it to make DNS resolution in the init container work. This might be the problem for @razvan-miron, as the connection to "172.18.0.10:53" failed because port 53 is DNS. But this disables DNS capture for the whole pod, which isn't always desirable.
3. Istio has another way to avoid routing traffic through the istio-proxy sidecar: all traffic from uid 1337 is ignored by the istio iptables rules. The reason is that the istio-proxy itself runs as that user. This is described here: https://istio.io/latest/docs/setup/additional-setup/cni/#compatibility-with-application-init-containers. But if other containers, like vault-agent-init, run as this user, their traffic will also be ignored.
The user of the vault container can be changed with an annotation on the pod:

```yaml
metadata:
  annotations:
    vault.hashicorp.com/agent-run-as-user: "1337"
```

or with an environment variable on the injector:

```yaml
env:
- name: AGENT_INJECT_RUN_AS_USER
  value: "1337"
```

or in the helm chart:

```yaml
extraEnvironmentVars:
  AGENT_INJECT_RUN_AS_USER: 1337
```
A small drawback might be that the traffic of the vault-agent sidecar also isn't routed through istio. But I don't think this is a problem in general, unless you want istio metrics for the sidecar.
Thanks @thechristschn, the solution

```yaml
metadata:
  annotations:
    vault.hashicorp.com/agent-pre-populate: "false"
    proxy.istio.io/config: |
      holdApplicationUntilProxyStarts: true
```

worked for us with RedHat OpenShift ServiceMesh. We did not have to use excludeOutboundPorts or vault.hashicorp.com/agent-init-first; the vault sidecar container was able to resolve our internal vault DNS and read the secrets. We did not have to add any delays to the app pod (your mileage may vary).
With the updates and workarounds provided in this issue, I'm going to go ahead and close it for now. Please feel free to open a new issue if there are other problems that need addressing. Thanks!
I did the steps described here and it worked great.
The problem is when I add istio to the namespace: the vault-agent-init container can't start correctly because there's no network available yet. Is there a way to use just the vault-agent sidecar and not use the vault-agent-init container? Is there any configuration to execute the command from the vault-agent-init container inside the vault-agent sidecar instead?
I found this comment in the container_init_sidecar.go code and I'm not sure if it's safe to execute everything inside the sidecar container.