hashicorp / vault-k8s

First-class support for Vault and Kubernetes.

vault-k8s and istio service mesh don't work together #41

Closed gabrielanavarro closed 1 year ago

gabrielanavarro commented 4 years ago

I did the steps described here and it worked great.

The problem is when I add istio to the namespace. The vault-agent-init container can't start correctly because there's no network available yet.

Is there a way to use just the vault-agent sidecar and not use the vault-agent-init container? Any configuration that can be done to execute the command from the vault-agent-init inside the vault-agent sidecar?

I found this comment in the container_init_sidecar.go code and I'm not sure if it's safe to execute everything inside the sidecar container.

jasonodonnell commented 4 years ago

@gabrielanavarro You can disable the init container by setting the following annotation:

vault.hashicorp.com/agent-pre-populate: "false"

At the moment Istio isn't supported but we're looking into how to make this work.
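For context, a minimal sketch of where this annotation would go on a Deployment's pod template (the fragment below is illustrative and assumes the agent-inject annotation is also set; annotation values must be quoted strings):

```yaml
spec:
  template:
    metadata:
      annotations:
        vault.hashicorp.com/agent-inject: "true"
        # skip the vault-agent-init container; secrets are rendered only by the sidecar
        vault.hashicorp.com/agent-pre-populate: "false"
```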

gabrielanavarro commented 4 years ago

@jasonodonnell Thank you so much. This worked great!

And are there any security issues with doing it this way?

AlexisMach commented 4 years ago

It seems that the istio init container is patching the pod's iptables before vault-agent-init is executed. Since istio-proxy isn't running, vault-agent-init cannot access the vault server. Switching the init container order in the pod's yaml file (i.e. putting vault-agent-init first instead of istio-init) does the trick, but I don't know if it's possible to do this directly with the mutating webhook.

dippynark commented 4 years ago

You can also omit the outbound port to Vault from Envoy redirection using the following annotation: traffic.sidecar.istio.io/excludeOutboundPorts: "8200"

Of course this only works if you're happy with connections to Vault from the init container and sidecar not being intercepted - have only tested this with Istio CNI.

tvoran commented 4 years ago

@gabrielanavarro I'm not aware of any security issues caused by disabling the init container. The main issue will be whether your application expects the secrets from vault to be already rendered when your application starts up. The sidecar container will render the secrets, but it's a race between it rendering them and your application consuming them.

But to your initial problem, vault-k8s 0.3.0 now has support for a vault.hashicorp.com/agent-init-first annotation. Setting that to true should allow the vault init container to run before the istio init container so they both have a better chance of succeeding. We'd love to hear if this works for you!
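For reference, a minimal sketch of that annotation combined with agent-inject on a Deployment's pod template (assuming vault-k8s 0.3.0 or later; this is the same combination confirmed later in this thread):

```yaml
spec:
  template:
    metadata:
      annotations:
        vault.hashicorp.com/agent-inject: "true"
        # ask the injector to place vault-agent-init before istio's init container
        vault.hashicorp.com/agent-init-first: "true"
```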

AlexisMach commented 4 years ago

It's working perfectly

TamasNeumer commented 4 years ago

For me it solved the problem of the vault-agent-init container being initialized later than the istio-proxy.

However, as I was playing around with Vault/Istio I came across a bug where the pod spec was populated neither with the vault-agent-init container nor with the vault-agent sidecar. I believe there is a race condition between the istio-sidecar-injector and the vault-agent-injector-cfg MutatingWebhookConfigurations. This is probably why I ended up with some pods that have three containers (application, istio-proxy, vault-agent) and some pods that had only two (application, istio-proxy).

I managed to solve the issue by merging the two configs into a single MutatingWebhookConfiguration. However, I'm quite sure such a hack is not a long-term solution. With this setup I always (correctly) ended up with 3 containers (app + 2 sidecars):

```yaml
apiVersion: admissionregistration.k8s.io/v1beta1
kind: MutatingWebhookConfiguration
metadata:
  labels:
    app.kubernetes.io/instance: vault
    app.kubernetes.io/managed-by: Tiller
    app.kubernetes.io/name: vault-agent-injector
    app: sidecar-injector
    operator.istio.io/component: Pilot
    operator.istio.io/managed: Reconcile
    operator.istio.io/version: 1.5.0
    release: istio
  name: tamas-injector
webhooks:
  - admissionReviewVersions:
      - v1beta1
    clientConfig:
      caBundle: abc
      service:
        name: vault-agent-injector-svc
        namespace: vault
        path: /mutate
        port: 443
    failurePolicy: Ignore
    matchPolicy: Exact
    name: vault.hashicorp.com
    namespaceSelector: {}
    objectSelector: {}
    reinvocationPolicy: IfNeeded
    rules:
      - apiGroups:
          - ""
        apiVersions:
          - v1
        operations:
          - CREATE
          - UPDATE
        resources:
          - pods
        scope: "*"
    sideEffects: Unknown
    timeoutSeconds: 30
  - admissionReviewVersions:
      - v1beta1
    clientConfig:
      caBundle: xyz
      service:
        name: istiod
        namespace: istio-system
        path: /inject
        port: 443
    failurePolicy: Fail
    matchPolicy: Exact
    name: sidecar-injector.istio.io
    namespaceSelector:
      matchLabels:
        istio-injection: enabled
    objectSelector: {}
    reinvocationPolicy: Never
    rules:
      - apiGroups:
          - ""
        apiVersions:
          - v1
        operations:
          - CREATE
        resources:
          - pods
        scope: "*"
    sideEffects: Unknown
    timeoutSeconds: 30
```

AlexisMach commented 4 years ago

It could also be a namespaceSelector issue, or a lost connection between the k8s master and its nodes (so the apiserver fails to reach the webhook pod and the pod isn't mutated). Have you looked into these?

TamasNeumer commented 4 years ago

Hi! Thanks for the tips.

Connection: We are running Kubernetes on GKE, and I believe it's highly unlikely that the connection between the master and the nodes would be lost that often. (If that were the case, the connection would be lost approx. 5-7 times out of 10, which I doubt.) Also, once I merged the two configs together it works 100% of the time, so if it were a connectivity issue I should still be seeing missing sidecars/init containers.

namespaceSelector: I'm deploying to the default namespace, which has the label istio-injection: enabled, so theoretically both the istio and vault injectors should pick up the config. And here again: once I merged the two configs there were no issues, so I believe it's not the namespaceSelector.

Unfortunately GKE only allows running K8s up to 1.59, and Kubernetes supports monitoring of Webhooks only from 1.6+ (Link)

AlexisMach commented 4 years ago

Did you make sure your namespaces were labeled for both istio and vault? For instance, my vault webhook configuration has:

```yaml
namespaceSelector:
  matchLabels:
    vault-webhook: enabled
```

So my namespaces are labeled with istio-injection: enabled and vault-webhook: enabled.
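For reference, labeling a namespace for both injectors could look like the sketch below (the namespace name is hypothetical, and vault-webhook: enabled assumes a custom namespaceSelector like the one above):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: my-app                  # hypothetical namespace name
  labels:
    istio-injection: enabled    # picked up by istio's sidecar injector webhook
    vault-webhook: enabled      # picked up by the vault webhook's namespaceSelector shown above
```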

dippynark commented 4 years ago

@TamasNeumer you can run later versions of k8s on GKE using the rapid channel:

I'm not too sure why you're seeing what you are. If the init container and sidecar aren't appearing at all, it seems the webhook either isn't being hit at all or is returning an error - you could try changing to failurePolicy: Fail to see if calls to your webhook are in fact failing (I think errors would show up as events on the managing resources, e.g. by describing the managing ReplicaSet if you're using Deployments).

Some things that might make a difference, but that seem unlikely to be the cause given the randomness you're seeing:

I think the main fix for Istio support is to properly support reinvocationPolicy: IfNeeded for the vault injector webhook. Looking briefly through the code it seems that vault-k8s is adding an annotation (vault.hashicorp.com/agent-inject-status) to signify injection and then bailing if the webhook is called again with that annotation set to achieve idempotency. This means though that if a later webhook (e.g. one with a corresponding mutatingwebhookconfiguration that starts with z) happens to inject an init container first then vault.hashicorp.com/agent-init-first won't behave as expected.

Instead, the webhook's patching logic should be made idempotent which would allow the webhook to be called again and to reorder the init containers again if needed.

TamasNeumer commented 4 years ago

Hi!

Thanks for the lengthy comment. I came to the conclusion that I probably had an older version of istio on my cluster (1.4.0, or maybe even a 1.4.0 beta).

I have tested the following setups:

I believe it was an issue with the older version of istio. Thank you for the support!

TamasNeumer commented 4 years ago

> You can also omit the outbound port to Vault from Envoy redirection using the following annotation: traffic.sidecar.istio.io/excludeOutboundPorts: "8200"
>
> Of course this only works if you're happy with connections to Vault from the init container and sidecar not being intercepted - have only tested this with Istio CNI.

This solution works for me, however I was wondering if I could achieve the same with an istio ServiceEntry rather than this annotation. (The latter being the "preferred" way of solving this problem.)

I was trying to get it to work based on the documentation, however it seems to me that despite the ServiceEntry, the sidecar can't resolve the host (vault.vault.svc).

" backoff=1.5269282149999999 2020-03-27T10:17:14.180Z [INFO] auth.handler: authenticating 2020-03-27T10:17:14.184Z [ERROR] auth.handler: error authenticating: error="Error making API request.


- ServiceEntry:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: vault-service-entry
spec:
  hosts:
    - vault.vault.svc
  ports:
    - number: 8200
      name: http
      protocol: HTTP
  location: MESH_EXTERNAL
  resolution: DNS
```

I have also added a busybox sidecar, and from inside it I wanted to run nslookup vault.vault.svc, which (if I understand correctly) goes to my kube-dns (on 10.0.0.10):

```
/ # nslookup vault
Server:         10.0.0.10
Address:        10.0.0.10:53

** server can't find vault.default.svc.cluster.local: NXDOMAIN

*** Can't find vault.svc.cluster.local: No answer
*** Can't find vault.cluster.local: No answer
*** Can't find vault.c.retail-platform-sandbox-hsord.internal: No answer
*** Can't find vault.google.internal: No answer
*** Can't find vault.default.svc.cluster.local: No answer
*** Can't find vault.svc.cluster.local: No answer
*** Can't find vault.cluster.local: No answer
*** Can't find vault.c.retail-platform-sandbox-hsord.internal: No answer
*** Can't find vault.google.internal: No answer
```

Info: vault is running in the vault namespace, on the same cluster. The deployment I have been playing around with was in the default namespace.

dippynark commented 4 years ago

@TamasNeumer you need to specify the FQDN for the vault service in the ServiceEntry: vault.vault.svc.cluster.local

Additionally, for the busybox sidecar the command should be nslookup vault.vault

The ServiceEntry works fine once the Envoy sidecar is running, the purpose of excluding outbound redirection was so connections could be made to Vault before the Envoy sidecar is running but after redirection happens (i.e. when trying to connect from an init container that runs after istio's init container) - it shouldn't be necessary anymore though with recent changes to vault-k8s

TamasNeumer commented 4 years ago

> @TamasNeumer you need to specify the FQDN for the vault service in the ServiceEntry: vault.vault.svc.cluster.local
>
> Additionally, for the busybox sidecar the command should be nslookup vault.vault
>
> The ServiceEntry works fine once the Envoy sidecar is running, the purpose of excluding outbound redirection was so connections could be made to Vault before the Envoy sidecar is running but after redirection happens (i.e. when trying to connect from an init container that runs after istio's init container) - it shouldn't be necessary anymore though with recent changes to vault-k8s

Amazing! I have managed to get it to work!

In conclusion I have used only two annotations on the deployment itself:

```yaml
vault.hashicorp.com/agent-init-first: "true"
vault.hashicorp.com/agent-inject: "true"
```

And as you said, the ServiceEntry needed a quick fix, but it works:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: vault-service-entry
spec:
  hosts:
    - vault.vault.svc.cluster.local
  ports:
    - number: 8200
      name: http
      protocol: HTTP
  location: MESH_EXTERNAL
  resolution: DNS
```

dippynark commented 4 years ago

@TamasNeumer awesome - there's no way of defining default annotations on a Namespace level atm: https://github.com/kubernetes/kubernetes/issues/35504

You would probably need to write your own mutating admission webhook or use an existing one to achieve this - something OPA based should be enough: https://github.com/open-policy-agent/gatekeeper
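As a rough sketch of that idea, Gatekeeper's mutation feature has an AssignMetadata resource that can add a default annotation to pods in a namespace. The fields below are written from memory and should be checked against the Gatekeeper docs; the namespace name is hypothetical and the quoting of the annotation key in the location path is an assumption:

```yaml
apiVersion: mutations.gatekeeper.sh/v1
kind: AssignMetadata
metadata:
  name: default-vault-agent-inject
spec:
  match:
    scope: Namespaced
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
    namespaces: ["my-app"]      # hypothetical namespace
  # annotation keys containing dots/slashes presumably need to be quoted in the path
  location: 'metadata.annotations."vault.hashicorp.com/agent-inject"'
  parameters:
    assign:
      value: "true"
```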

skitnica commented 4 years ago

Hi

I am busy implementing vault on kubernetes too, and adding the agent-init-first annotation worked fine to get it all working with istio... except for kafka and rabbitmq!! It was working until I enabled istio. I am getting the following error in the vault-agent-init container:

```
2020-04-03T14:43:06.806Z [INFO] auth.handler: authenticating
2020-04-03T14:43:06.811Z [ERROR] auth.handler: error authenticating: error="Put http://vault.polystream-hub.svc:8200/v1/auth/kubernetes/login: dial tcp 10.0.20.113:8200: connect: connection refused" backoff=2.7553159369999998
```

My vault, kafka and rabbitmq are all in the same namespace...

This is only happening in the kafka and rabbitmq pods...

Any ideas?

davidcunningham commented 4 years ago

Has anyone else seen the issue referenced by @TamasNeumer where some pods end up with three containers (application, istio-proxy, vault-agent) and some pods only two (application, istio-proxy)? This happens every time I start my cluster, but if I patch a pod the vault-agent will spin up and write out the secrets just fine.

Running k8s 1.15 on AWS EKS, Istio 1.6, Vault k8s 0.3, Skaffold 1.5.

Annotations:

```yaml
vault.hashicorp.com/agent-inject: "true"
vault.hashicorp.com/agent-init-first: "true"
```

I also tried locally on docker-desktop k8s v1.16.5 and the same thing happens.

davidcunningham commented 4 years ago

After debugging some more and reviewing the API logs I found this:

```
failed calling webhook "vault.hashicorp.com": Post https://vault-agent-injector-svc.default.svc:443/mutate?timeout=30s: no endpoints available for service "vault-agent-injector-svc"
```

However, I see this as a red herring since, as I mentioned above, if I patch a deployment everything works as expected. It would seem that the istio/envoy network and vault agent eventually get to a point where the sidecar can be injected.

Thoughts?

dippynark commented 4 years ago

@davidcunningham I think it's just that the webhook is configured to fail open, so when you first start up your cluster the webhook takes longer to become ready than it takes for some pods to be created.

If you configure the webhook to reject requests if it fails then controllers should retry.
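For reference, that's the failurePolicy field on the vault webhook entry of the MutatingWebhookConfiguration; a sketch of the relevant fragment, reusing the service details from the config shared earlier in this thread:

```yaml
webhooks:
  - name: vault.hashicorp.com
    # Fail closed: if the injector can't be reached, pod creation is rejected
    # and the owning controller (e.g. a ReplicaSet) retries, instead of pods
    # being created silently without the vault-agent containers.
    failurePolicy: Fail
    clientConfig:
      service:
        name: vault-agent-injector-svc
        namespace: vault
        path: /mutate
```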

davidcunningham commented 4 years ago

@dippynark - thanks! Adding failurePolicy: Fail to the webhook seems to have done it! Will continue to test scenarios.

a8j8i8t8 commented 4 years ago

@TamasNeumer In your setup, is Istio also enabled for vault and vault-agent-injector? I'm running into the same issue currently, and I have Istio enabled with mTLS for everything.

codetap-developer commented 4 years ago

@TamasNeumer @a8j8i8t8, we are having a similar issue.

When we enable istio in the vault namespace (where the vault injector is deployed; the Vault server itself is external), we get errors when an application pod is deployed in the application namespace. With istio disabled in the vault namespace everything works fine (the application namespace always has istio enabled).

murbano83 commented 4 years ago

Hello! I have a similar issue.

I'm trying to get vault-agent-injector-svc working but I'm stuck. All pods, when created, are missing the vault sidecar. If I set failurePolicy: Fail then the replicaset events show:

```
Internal error occurred: failed calling webhook "vault.hashicorp.com": Post https://vault-agent-injector-svc.vault.svc:443/mutate?timeout=30s: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Internal error occurred: failed calling webhook "vault.hashicorp.com": Post https://vault-agent-injector-svc.vault.svc:443/mutate?timeout=30s: dial tcp 10.8.2.36:8080: i/o timeout
Error creating: Internal error occurred: failed calling webhook "vault.hashicorp.com": Post https://vault-agent-injector-svc.vault.svc:443/mutate?timeout=30s: context deadline exceeded
```

So, I have two service entries:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: vault-service-entry
spec:
  hosts:
    - vault.vault.svc.cluster.local
  ports:
    - number: 8200
      name: http
      protocol: HTTP
  location: MESH_EXTERNAL
  resolution: DNS
---
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: vault-injector-service-entry
spec:
  hosts:
    - vault-agent-injector-svc.vault.svc.cluster.local
  ports:
    - number: 8080
      name: http
      protocol: HTTP
    - number: 443
      name: tcp
      protocol: TCP
  location: MESH_EXTERNAL
  resolution: DNS
```

With the second one, I started receiving requests on vault-agent-injector-svc but with errors:

```
2020-10-14T06:32:05.981Z [INFO]  handler: Starting handler..
 Listening on ":8080"...
 Updated certificate bundle received. Updating certs...
 2020/10/14 06:32:15 http: TLS handshake error from 10.8.1.29:60748: remote error: tls: unknown certificate authority
 2020/10/14 06:32:15 http: TLS handshake error from 10.8.0.19:39344: remote error: tls: unknown certificate authority
 2020/10/14 06:32:15 http: TLS handshake error from 10.8.5.109:33832: remote error: tls: unknown certificate authority
 2020/10/14 06:32:16 http: TLS handshake error from 10.8.4.92:40274: remote error: tls: unknown certificate authority
 2020/10/14 06:32:16 http: TLS handshake error from 10.8.4.92:40278: remote error: tls: unknown certificate authority
 2020/10/14 06:32:17 http: TLS handshake error from 10.8.1.29:60840: remote error: tls: unknown certificate authority
```

And the replicaset events give this one:

```
replicaset-controller  Error creating: Internal error occurred: failed calling webhook "vault.hashicorp.com": Post https://vault-agent-injector-svc.vault.svc:443/mutate?timeout=30s: context deadline exceeded
```

I thought the cert was wrong, but the mutatingwebhookconfiguration has the correct caBundle and seems fine.

I also checked with nslookup that there is connectivity:

```
$ nslookup vault.vault.svc
Server:     10.0.32.10
Address:    10.0.32.10#53

Non-authoritative answer:
Name:   vault.vault.svc.cluster.local
Address: 10.0.34.33
$ nslookup vault-agent-injector-svc.vault.svc
Server:     10.0.32.10
Address:    10.0.32.10#53

Non-authoritative answer:
Name:   vault-agent-injector-svc.vault.svc.cluster.local
Address: 10.0.32.142
```

Not sure what to do next.

This is my vault config:

```yaml
injector:
  logLevel: "trace"
  certs:
    secretName: "injector-tls"
    caBundle: "xxx"

server:
  extraEnvironmentVars:
    GOOGLE_REGION: xxxx
    GOOGLE_PROJECT: xxxx
    GOOGLE_APPLICATION_CREDENTIALS: /vault/userconfig/kms-creds/credentials.json

  extraVolumes:
    - type: 'secret'
      name: 'kms-creds'

  service:
    enabled: true

  ha:
    enabled: true
    replicas: 2

    config: |
      ui = true

      listener "tcp" {
        tls_disable = 1
        address = "[::]:8200"
        cluster_address = "[::]:8201"
      }

      seal "gcpckms" {
        project     = "xxxx"
        region      = "xxxx"
        key_ring    = "xxxx"
        crypto_key  = "xxxx"
      }

      storage "postgresql" {
        connection_url = "xxxx"
        ha_enabled = "true"
        max_parallel = "128"
      }

      log_level = "trace"
```

semihural commented 3 years ago

Hi All, I'm stuck with the same issue. I am using on-prem k8s v1.19 and Istio 1.8.0. I got stuck trying to run them together properly when I inject the istio mesh into the hub-dev namespace, where our microservices are running. Vault is running in the dev namespace.

The first issue I had is that the Vault and Istio sidecars somehow don't run properly and the application cannot init, as shown below. I tried to use the below annotations to init vault first, but it did not solve the issue.

Here are the outputs of pod status and describe

```
$ kubectl get pods -n hub-dev
    NAME                                     READY   STATUS     RESTARTS   AGE
    oneapihub-mp-dev-59f7685455-5kmft        0/3     Init:0/2   0          19

$ kubectl describe pod oneapihub-mp-dev-59f7685455-5kmft -n hub-dev

Init Containers:
  vault-agent-init:
    Container ID:  
    State:          Running
      Started:      Fri, 15 Jan 2021 13:54:30 +0300
    Ready:          False
  istio-validation:
    Container ID:
    Image:         reg-dhc.app.corpintra.net/i3-mirror/docker.io_istio_proxyv2:1.8.0
    State:          Waiting
     Reason:       PodInitializing
    Ready:          False
Containers:
      oneapihub-mp:
        Container ID:
        State:          Waiting
          Reason:       PodInitializing
        Ready:          False
      istio-proxy:
        Container ID:
        State:          Waiting
          Reason:       PodInitializing
        Ready:          False
  istio-proxy:
    Container ID:
    State:          Waiting
      Reason:       PodInitializing
    Ready:          False

    Normal  Pulled     16m   kubelet, xx-kube-node07  Container image "docker.io_vault:1.5.2" already present on machine
    Normal  Created    16m   kubelet, xx-kube-node07  Created container vault-agent-init
    Normal  Started    16m   kubelet, xx-kube-node07  Started container vault-agent-init
```

When I tried the annotation below, it fixed the above issue, but this time when the pod starts to run it cannot find the /vault/secrets path, even though the folder does exist inside the pod and, judging from the proxy and application logs, the secrets can be read shortly afterwards.

```yaml
vault.hashicorp.com/agent-pre-populate: "false"
```

Here are the logs of the app, even though the folder exists:

```
$ kubectl get pods -n hub-dev
oneapihub-mp-dev-78449b8cf6-qbqhn        3/3     Running   0          9m31s

$ kubectl logs -f oneapihub-mp-dev-78449b8cf6-qbqhn -n hub-dev -c oneapihub-mp

> market-place@1.0.0 start:docker /usr/src/app
> node app.js

{"message""devMessage":"SECRET_READ_ERROR","data":"","exception":"ENOENT: no such file or directory, open '/vault/secrets/database'","stack":"Error: ENOENT: no such file or directory, open '/vault/secrets/database'->

/ $ cd /vault/secrets
/vault/secrets $ ls
database  jenkins
/vault/secrets $
```

Here I have some PUT error, which might be related to Vault itself, but then I am confused how Vault can still inject the secrets.

```
$ kubectl logs -f oneapihub-mp-dev-78449b8cf6-qbqhn -n hub-dev -c vault-agent

2021-01-15T11:21:13.477Z [ERROR] auth.handler: error authenticating: error="Put "http://vault.dev.svc:8200/v1/auth/kubernetes/login": dial tcp 10.254.30.115:8200: connect: connection refused" backoff=2.464775515
==> Vault agent started! Log data will stream in below:

==> Vault agent configuration:

                     Cgo: disabled
               Log Level: info
                 Version: Vault v1.5.2
             Version Sha: 685fdfa60d607bca069c09d2d52b6958a7a2febd

2021-01-15T11:21:15.942Z [INFO]  auth.handler: authenticating
2021-01-15T11:21:15.966Z [INFO]  auth.handler: authentication successful, sending token to sinks
2021-01-15T11:21:15.966Z [INFO]  sink.file: token written: path=/home/vault/.vault-token
```

And lastly, when I checked the istio-proxy logs I can see that the GET and PUT requests return 200.

```
$ kubectl logs -f oneapihub-mp-dev-78449b8cf6-h8s8j -n hub-dev -c istio-proxy

021-01-15T11:35:04.352221Z  warning envoy filter    mTLS PERMISSIVE mode is used, connection can be either plaintext or TLS, and client cert can be omitted. Please consider to upgrade to mTLS STRICT mode for more secure configuration that only allows TLS connection with client cert. See https://istio.io/docs/tasks/security/mtls-migration/
[2021-01-15T11:35:05.557Z] "PUT /v1/auth/kubernetes/login HTTP/1.1" 200 - "-" 1294 717 8 8 "-" "Go-http-client/1.1" "a082698b-d1f7-4aa5-9db5-01d86d5093ef" "vault.dev.svc:8200" "10.6.24.55:8200" outbound|8200||vault.dev.svc.cluster.local 10.6.19.226:55974 10.254.30.115:8200 10.6.19.226:60478 - default
2021-01-15T11:35:05.724833Z info    Envoy proxy is ready
[2021-010.6.19.226:41888 - default
[2021-01-15T11:35:05.596Z] "GET /v1/secret/data/oneapihub-marketplace/database HTTP/1.1" 200 - "-" 0 400 0 0 "-" "Go-http-client/1.1" "d7d10c1f-c445-44d1-b0e3-bb9ae7bbc2f0" "vault.dev.svc:8200" "10.6.24.55:8200" outbound|8200||vault.dev.svc.cluster.local 10.6.19.226:55974 10.254.30.115:8200 10.6.19.226:41900 - default
[2021-01-15T11:35:05.591Z] "PUT /v1/auth/token/renew-self HTTP/1.1" 200 - "-" 15 717 8 8 "-" "Go-http-client/1.1" "56705e5c-c966-4bc8-8187-7ca5bb2b4abe" "vault.dev.svc:8200" "10.6.24.55:8200" outbound|8200||vault.dev.svc.cluster.local 10.6.19.226:37388 10.254.30.115:8200 10.6.19.226:41890 - default
[2021-01-15T11:35:05.602Z] "GET /v1/secret/data/oneapihub-marketplace/jenkins HTTP/1.1" 200 - "-" 0 284 0 0 "-" "Go-http-client/1.1" "1b6d8601-18df-4f32-8722-162aa785c476" "vault.dev.svc:8200" "10.6.24.55:8200" outbound|8200||vault.dev.svc.cluster.local 10.6.19.226:55974 10.254.30.115:8200 10.6.19.226:41902 - default
```
ankur512512 commented 3 years ago

> @gabrielanavarro I'm not aware of any security issues caused by disabling the init container. The main issue will be whether your application expects the secrets from vault to be already rendered when your application starts up. The sidecar container will render the secrets, but it's a race between it rendering them and your application consuming them.
>
> But to your initial problem, vault-k8s 0.3.0 now has support for a vault.hashicorp.com/agent-init-first annotation. Setting that to true should allow the vault init container to run before the istio init container so they both have a better chance of succeeding. We'd love to hear if this works for you!

Setting vault.hashicorp.com/agent-init-first: true works for me. Thanks!!

einret commented 3 years ago

Same issue here. vault-agent-init starts before the istio-validation init container thanks to the vault.hashicorp.com/agent-init-first annotation, however it never completes and is stuck with "connect: connection refused".

Tested with istio 1.9 with the CNI plugin installed, on k8s v1.18. @semihural did you manage to find a solution?

ps: traffic.sidecar.istio.io/excludeOutboundPorts is not a desirable option.

semihural commented 3 years ago

@einret

This worked for me.

```yaml
  template:
    metadata:
      annotations:
        traffic.sidecar.istio.io/excludeOutboundPorts: "8200"
        vault.hashicorp.com/agent-init-first: "true"
        vault.hashicorp.com/agent-inject: "true"
```

wojtek-viirtue commented 3 years ago

I was just looking into Vault and how it would interface with our istio setup. Curious if anyone has tried the following and whether it would solve the aforementioned issues. Not sure if it's an init or main container issue, but I'm going to give it a try in a bit.

```yaml
    metadata:
      annotations:
        proxy.istio.io/config: '{ "holdApplicationUntilProxyStarts": true }'
```

vinayan3 commented 2 years ago

I have a setup where Vault is in another cluster and we have a DNS entry that points to a load balancer. I've added a ServiceEntry and the following annotations:

```yaml
    traffic.sidecar.istio.io/excludeOutboundPorts: "8200"
    vault.hashicorp.com/agent-inject: "true"
    vault.hashicorp.com/agent-pre-populate: "false"
    vault.hashicorp.com/log-level: debug
```

The agent will say:

```
2022-01-21T00:54:40.845Z [INFO]  auth.handler: starting auth handler
2022-01-21T00:54:40.845Z [INFO]  auth.handler: authenticating
2022-01-21T00:55:40.845Z [ERROR] auth.handler: error authenticating: error="context deadline exceeded" backoff=1s
2022-01-21T00:55:41.845Z [INFO]  auth.handler: authenticating
2022-01-21T00:56:41.845Z [ERROR] auth.handler: error authenticating: error="context deadline exceeded" backoff=1.53s
2022-01-21T00:56:43.376Z [INFO]  auth.handler: authenticating
2022-01-21T00:57:43.377Z [ERROR] auth.handler: error authenticating: error="context deadline exceeded" backoff=2.95s
2022-01-21T00:57:46.336Z [INFO]  auth.handler: authenticating
```

My hunch is that something is being blocked, because why else would authentication time out?

Anyone else hit this?
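For reference, a ServiceEntry for an external Vault endpoint of the kind described might look like the sketch below; the hostname and the TLS protocol are assumptions, and this may or may not be related to the timeouts above:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: external-vault          # hypothetical name
spec:
  hosts:
    - vault.example.com         # hypothetical DNS entry pointing at the load balancer
  ports:
    - number: 8200
      name: tls-vault
      protocol: TLS             # assumption: Vault is served over TLS on the LB
  location: MESH_EXTERNAL
  resolution: DNS
```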

OneideLuizSchneider commented 2 years ago

> @einret
>
> This worked for me.
>
> ```yaml
>   template:
>     metadata:
>       annotations:
>         traffic.sidecar.istio.io/excludeOutboundPorts: "8200"
>         vault.hashicorp.com/agent-init-first: "true"
>         vault.hashicorp.com/agent-inject: "true"
> ```

Thanks @semihural That worked for me as well

razvan-miron commented 2 years ago

I have the vault installed outside the service mesh in another namespace but inside the same cluster. I've added the service entry:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: vault-service-entry
spec:
  hosts:
```

I also added the annotations on the pod that has the vault sidecar:

```yaml
traffic.sidecar.istio.io/excludeOutboundPorts: "8200"
vault.hashicorp.com/agent-init-first: "true"
vault.hashicorp.com/agent-inject: "true"
```

It still does not work. The init container starts and cannot access the vault api:

```
auth.handler: error authenticating: error="Put "https://vault-dev.vault.svc:8200/v1/auth/kubernetes/login": dial tcp: lookup vault-dev.vault.svc on 172.18.0.10:53: read udp 10.208.7.161:59314->172.18.0.10:53: read: connection refused" backoff=1.949307983
```

thechristschn commented 2 years ago

tl;dr: Set vault.hashicorp.com/agent-run-as-user: "1337" if using istio-cni.

There are two ways to use istio:

1. Init container

Istio injects an istio-init init container, which sets up the networking. In this case, the ordering of the init container is relevant, because init containers before istio-init can access the network without restrictions.

2. Istio-CNI

With istio-cni, istio will set up the network even before the first init container starts. Istio will still inject an istio-validation init container, but it will just validate that the network setup is correct. With istio-cni the order of the init containers doesn't matter, because all traffic is routed through the istio-proxy sidecar, which isn't started yet while init containers are running.

There are at least three solutions to avoid this problem (the last one might surprise you):

Disable vault-agent-init

Only init containers are a problem, because the istio-proxy sidecar isn't running at that point. It is possible to disable vault-agent-init and just use the vault-agent sidecar. This has the drawback that the application needs to wait for the vault-agent sidecar to populate the secrets. This can be achieved with a sleep in the application entrypoint (or a more sophisticated approach; see the sketch at the end of this section).

```yaml
metadata:
  annotations:
    vault.hashicorp.com/agent-pre-populate: "false"
    proxy.istio.io/config: |
      holdApplicationUntilProxyStarts: true
```

holdApplicationUntilProxyStarts helps because all containers, including vault-agent, will wait until the istio-proxy sidecar is ready to process requests.
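As a rough sketch of the "more sophisticated approach" mentioned above, the application container's entrypoint can poll for the rendered secret file instead of sleeping for a fixed time. The container name, image and original entrypoint below are hypothetical; the secret path matches the /vault/secrets/database example seen earlier in this thread:

```yaml
spec:
  containers:
    - name: app                          # hypothetical application container
      image: my-app:latest               # hypothetical image
      command: ["/bin/sh", "-c"]
      args:
        - |
          # wait until the vault-agent sidecar has rendered the secret file
          until [ -f /vault/secrets/database ]; do
            echo "waiting for vault secrets..."
            sleep 1
          done
          exec node app.js               # hypothetical original entrypoint
```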

Istio-Annotations

It is possible to disable the routing through the istio sidecar for specific ports, for example port 8200.

```yaml
metadata:
  annotations:
    traffic.sidecar.istio.io/excludeOutboundPorts: "8200"
    proxy.istio.io/config: |
      proxyMetadata:
        ISTIO_META_DNS_CAPTURE: "false"
```

DNS capture isn't enabled by default, but if it is activated in general, you need to disable it to make DNS resolution in the init container work. This might be the problem for @razvan-miron, as the connection to "172.18.0.10:53" failed, because port 53 is DNS. But this will disable DNS capture for the whole pod, which isn't always desirable.

Vault-Injector-Settings

Istio has another way to avoid routing traffic through the istio-proxy sidecar: all traffic from uid 1337 is ignored by the istio iptables rules. The reason for this is that the istio-proxy itself runs as that user. This is described here: https://istio.io/latest/docs/setup/additional-setup/cni/#compatibility-with-application-init-containers But if other containers, like vault-agent-init, run as this user, their traffic will also be ignored.

The user of the vault container can be changed with the annotation on the pod:

```yaml
metadata:
  annotations:
    vault.hashicorp.com/agent-run-as-user: "1337"
```

or with an environment variable on the injector:

```yaml
env:
- name: AGENT_INJECT_RUN_AS_USER
  value: "1337"
```

or in the helm chart:

```yaml
extraEnvironmentVars:
  AGENT_INJECT_RUN_AS_USER: 1337
```

See also https://www.vaultproject.io/docs/platform/k8s/injector/annotations#vault-hashicorp-com-agent-run-as-user

A small drawback might be that the traffic of the vault-agent sidecar also isn't routed through istio. But I don't think this is a problem in general, unless you want istio metrics for the sidecar.

dlydiard commented 1 year ago

Thanks @thechristschn, the solution

```yaml
metadata:
  annotations:
    vault.hashicorp.com/agent-pre-populate: "false"
    proxy.istio.io/config: |
      holdApplicationUntilProxyStarts: true
```

This worked for us with Red Hat OpenShift Service Mesh; we did not have to use excludeOutboundPorts or vault.hashicorp.com/agent-init-first, and the Vault sidecar container was able to resolve our internal vault DNS and read the secrets. We did not have to add any delays to the app pod (your mileage may vary).

heatherezell commented 1 year ago

With the updates and workarounds provided in this issue, I'm going to go ahead and close it for now. Please feel free to open a new issue if there are other problems that need addressing. Thanks!