ricoberger / vault-secrets-operator

Create Kubernetes secrets from Vault for a secure GitOps based workflow.
MIT License

imagePullSecrets for GCR - gcp secrets engine #123

Open TJM opened 3 years ago

TJM commented 3 years ago

Hi all,

I was looking at #54 to create imagePullSecrets, and that looks like it might work, but the secret that I am trying to access is not a "kv" type. The credential comes from the gcp secrets engine. So, as my goal is to get a secret (imagePullSecrets) to access GCR, would it be better to try to hack at this code to use the GCP secrets engine in Vault, or to hack at something else like the vault agent to create a kubernetes secret?

TJM commented 3 years ago

So far, I am running vault agent and dumping the gitlab-runner-sa.json (key.json) file out... then using the following command to "create" the secret. The .dockerconfigjson is a PITA... embedding that entire mess of JSON into the "password" field seems really wrong to me.

template:

{{ with secret "path/to/gcp/key/gitlab-runner-sa" }}
{{ base64Decode .Data.private_key_data }}
{{ end }}

relevant part of agent config template:

  command = "bash -c 'EMAIL=$(jq -r .client_email gitlab-runner-sa.json); kubectl create secret docker-registry gcr-secret --docker-server gcr.io --docker-username _json_key --docker-email \"$EMAIL\" --docker-password \"$(cat gitlab-runner-sa.json)\" --dry-run -o yaml | kubectl apply --namespace gitlab-runner -f -'"
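For context, the kubectl command above ultimately produces a .dockerconfigjson payload with the whole key file embedded as the password, which is the "JSON in JSON" part. A minimal Python sketch of roughly what that payload looks like (the server/username values come from the command above; the key file content here is a placeholder):

```python
import base64
import json

def build_dockerconfigjson(server, username, password, email):
    """Roughly the .dockerconfigjson payload that
    `kubectl create secret docker-registry` generates."""
    auth = base64.b64encode(f"{username}:{password}".encode()).decode()
    return json.dumps({
        "auths": {
            server: {
                "username": username,
                "password": password,
                "email": email,
                "auth": auth,
            }
        }
    })

# For GCR the entire service account key file (itself JSON) becomes the
# password; json.dumps handles the quote escaping that is painful by hand.
key_json = json.dumps({"type": "service_account",
                       "client_email": "gitlab-runner@example.iam.gserviceaccount.com"})
payload = build_dockerconfigjson("gcr.io", "_json_key", key_json,
                                 "gitlab-runner@example.iam.gserviceaccount.com")
```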
ricoberger commented 3 years ago

Hi @TJM, I think adding support for the Google Cloud Secrets Engine would be the best way to support your use case. Unfortunately I do not have that much experience with GCP and Vault.

If you or someone else wants to add support for the Google Cloud Secrets Engine, I think it could be done in a similar way like it was done for Azure: https://github.com/ricoberger/vault-secrets-operator/pull/114

TJM commented 3 years ago

That PR looks like an authentication mechanism, rather than a secrets engine. The GCP Secrets Engine would replace KV or KVV2, which hold manual/static secrets, with dynamic secrets (like AWS, it creates the service account and returns the credentials). I have a vault agent based solution (hack?) working, but I would consider it to be very alpha: https://github.com/TJM/vault-gcr-secrets

I think it would be better to tie it into VSO at some point. I just need to wrap my head around the secrets engine interfaces and see how hard it would be to add GCP. :)

~tommy

ricoberger commented 3 years ago

Ah ok, sorry for misunderstanding; hopefully I got it now: the GCP secrets engine is a dynamic secrets engine, so reading a path like gcp/roleset/<name>/key generates a new service account key and returns its credentials, instead of returning a stored value.

Did I get it right now?

If this is correct I think we can support this. We already have a secretEngine field in the CRD, which can be set to gcp.

If this is the case we can check for the secretEngine in the GetSecret function and try to get the secret in a similar way as for the KV engine.

Does this make sense?
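The idea could be sketched like this. This is illustrative Python, not the operator's actual Go code; the field name private_key_data is taken from the GCP secrets engine response shown later in this thread:

```python
import base64

def convert_vault_secret(secret_engine, vault_data):
    """Sketch of engine-dependent handling in a GetSecret-style function:
    GCP returns one base64-encoded key file, KV returns plain key/value pairs."""
    if secret_engine == "gcp":
        # The GCP secrets engine puts the whole service account key file,
        # base64 encoded, under `private_key_data`.
        return {"private_key_data": base64.b64decode(vault_data["private_key_data"])}
    # KV / KVV2: use the stored key/value pairs as they are.
    return {key: str(value).encode() for key, value in vault_data.items()}
```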

TJM commented 3 years ago

Yes, sir! This sounds right.

To make things a little more complicated, we would have to do some trickery with the data returned, because of the way that Kubernetes formats the "docker-registry" type secrets, or rather the fact that Google wants the entire key.json as the docker-password option (JSON in JSON). I think we can probably handle it, but the "escaping" of all the quotes in the JSON string had me stumped, so I just used kubectl create secret .... --dry-run=client -o yaml | kubectl apply -f - to get the secret added/updated.

~tommy

ricoberger commented 3 years ago

Can you maybe share the output of the following command, vault read -format=json gcp/roleset/my-key-roleset/key, without any confidential data?

Edit: Maybe also the output of vault read -output-curl-string gcp/roleset/my-key-roleset/key can be helpful.

TJM commented 3 years ago

@ricoberger The output is pretty boring... I can tell you that the part that we care about is base64 encoded in the JSON key private_key_data.

We use the vault template:

{{ with secret "path/to/gcp/key/gitlab-runner-sa" }}
{{ base64Decode .Data.private_key_data }}
{{ end }}

Here is the censored version:

{
  "request_id": "2b0f5057-411d-aa4f-8df7-8b6b91b849de",
  "lease_id": "path/to/gcp/key/gitlab-runner-sa/(CENSORED-SOME-ID)",
  "lease_duration": 2592000,
  "renewable": true,
  "data": {
    "key_algorithm": "KEY_ALG_RSA_2048",
    "key_type": "TYPE_GOOGLE_CREDENTIALS_FILE",
    "private_key_data": "(CENSORED-BASE64-ENCODED-keyfile)="
  },
  "warnings": null
}

The last one is pretty straightforward too, as it doesn't really connect to vault...

[tmcneely@den3l1cliqa74077 vault-agent]$ vault read -output-curl-string gcp/roleset/my-key-roleset/key
curl -H "X-Vault-Request: true" https://127.0.0.1:8200/v1/gcp/roleset/my-key-roleset/key
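For completeness, the decode step the template's base64Decode performs is a plain base64 decode; a Python equivalent (the key file content here is a placeholder, not real data):

```python
import base64
import json

# Placeholder standing in for the CENSORED private_key_data value; in
# reality it is the full service account key file, base64 encoded.
key_file = {"type": "service_account", "project_id": "example-project"}
private_key_data = base64.b64encode(json.dumps(key_file).encode()).decode()

# What `base64Decode .Data.private_key_data` does in the Vault template:
decoded = json.loads(base64.b64decode(private_key_data))
```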
ricoberger commented 3 years ago

Hi @TJM, I tried to implement support for GCP.

I would need some help to verify that it is working. Can you maybe take a look at #130 and test the Docker image with the dev tag?

This would be very helpful.

TJM commented 3 years ago

@ricoberger THANKS! ... I saw this yesterday, but was in meetings all day ;( ... I need to get some project work completed today, but will jump on testing this next week. Sorry for the delay

TJM commented 3 years ago

Looks like this was released a couple of days ago? I will hopefully get a chance to try it out :) I was running into an issue finding an environment where I could put a development version of VSO... now that it is released, maybe I can talk them into it :-/

ricoberger commented 3 years ago

Hi, so far this is only available via the dev tag. I would like to verify that it is working as expected before an official release.

TJM commented 2 years ago

This is looking OK, so far...

NOTE: We already had the kubernetes auth setup...

Created the prerequisites: vault_gcp_secret_roleset, kubernetes_service_account, vault_kubernetes_auth_backend_role, vault_policy ...

Then the helm chart:

resource "helm_release" "a0000_vault_secrets_operator" {
  name       = "vault-secrets-operator"
  namespace  = kubernetes_namespace.a0000_anthos_namespace.metadata.0.name
  repository = "https://ricoberger.github.io/helm-charts"
  chart      = "vault-secrets-operator"
  values = [
    templatefile("${path.module}/files/vault_secrets_operator.yaml.tpl", {
      KUBERNETES_PATH = "auth/${vault_auth_backend.kubernetes.path}"
      KUBERNETES_ROLE = vault_kubernetes_auth_backend_role.a0000_vault_secrets_operator.role_name
      KUBERNETES_SA   = kubernetes_service_account.a0000_vault_secrets_operator.metadata.0.name
      RBAC_NAMESPACED = false # Limit VSO to its own namespace
      NAMESPACES      = [kubernetes_namespace.a0000_anthos_namespace.metadata.0.name]
    })
  ]

  depends_on = [
    vault_kubernetes_auth_backend_role.a0000_vault_secrets_operator,
    vault_policy.a0000,
    kubernetes_service_account.a0000_vault_secrets_operator,
  ]
  // FIXME: Create VaultSecrets for each of the three service account keys

}

with the following yaml tmpl mentioned above:

### WARNING! Changing this file changes *all* VaultSecretOperators
### Use an instance variable whenever possible
image:
  repository: docker-remote.artifactory.company.com/ricoberger/vault-secrets-operator
  tag: dev
resources:
  limits:
    cpu: 100m
    memory: 128Mi
  requests:
    cpu: 100m
    memory: 128Mi
vault:
  address: https://vault.company.com
  authMethod: kubernetes
  kubernetesPath: ${KUBERNETES_PATH}
  kubernetesRole: ${KUBERNETES_ROLE}
  namespaces: ${join(",", NAMESPACES)}
rbac:
  namespaced: ${RBAC_NAMESPACED}
serviceAccount:
  create: false
  name: ${KUBERNETES_SA}

Then, the secret looks like:

apiVersion: ricoberger.de/v1alpha1
kind: VaultSecret
metadata:
  name: a0000-bq-secret
spec:
  isBinary: true
  keys:
    - private_key_data
  path: ai/np/gcp/key/a0000-bq
  secretEngine: gcp
  type: Opaque

... which I had to apply manually, because apparently raw YAML (CRDs) is not well liked by Terraform... I have created a helm chart in the past to get around this, and am still thinking about how to manage these in Terraform.
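One option (assuming the hashicorp/kubernetes provider v2+, which added the kubernetes_manifest resource; untested sketch mirroring the VaultSecret above) would be something like:

```hcl
# Sketch: managing the VaultSecret CRD instance directly in Terraform.
# Requires the hashicorp/kubernetes provider >= 2.x (kubernetes_manifest).
resource "kubernetes_manifest" "a0000_bq_secret" {
  manifest = {
    apiVersion = "ricoberger.de/v1alpha1"
    kind       = "VaultSecret"
    metadata = {
      name      = "a0000-bq-secret"
      namespace = "a0000-ro-feat-v-8430"
    }
    spec = {
      isBinary     = true
      keys         = ["private_key_data"]
      path         = "ai/np/gcp/key/a0000-bq"
      secretEngine = "gcp"
      type         = "Opaque"
    }
  }
}
```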

Logs:

{"level":"info","ts":1638403415.9145195,"logger":"vault","msg":"Reconciliation is enabled.","ReconciliationTime":0}
I1202 00:03:37.569527       1 request.go:665] Waited for 1.04742487s due to client-side throttling, not priority and fairness, request: GET:https://10.202.208.1:443/apis/node.k8s.io/v1beta1?timeout=32s
{"level":"info","ts":1638403419.1731763,"logger":"controller-runtime.metrics","msg":"metrics server is starting to listen","addr":":8080"}
{"level":"info","ts":1638403419.1738043,"logger":"setup","msg":"starting manager"}
I1202 00:03:39.174152       1 leaderelection.go:248] attempting to acquire leader lease a0000-ro-feat-v-8430/vaultsecretsoperator.ricoberger.de...
{"level":"info","ts":1638403419.1742282,"msg":"starting metrics server","path":"/metrics"}
I1202 00:04:02.307290       1 leaderelection.go:258] successfully acquired lease a0000-ro-feat-v-8430/vaultsecretsoperator.ricoberger.de
{"level":"info","ts":1638403442.3074925,"logger":"controller.vaultsecret","msg":"Starting EventSource","reconciler group":"ricoberger.de","reconciler kind":"VaultSecret","source":"kind source: /, Kind="}
{"level":"info","ts":1638403442.307584,"logger":"controller.vaultsecret","msg":"Starting EventSource","reconciler group":"ricoberger.de","reconciler kind":"VaultSecret","source":"kind source: /, Kind="}
{"level":"info","ts":1638403442.3075905,"logger":"controller.vaultsecret","msg":"Starting Controller","reconciler group":"ricoberger.de","reconciler kind":"VaultSecret"}
{"level":"info","ts":1638403442.4085374,"logger":"controller.vaultsecret","msg":"Starting workers","reconciler group":"ricoberger.de","reconciler kind":"VaultSecret","worker count":1}
{"level":"info","ts":1638403442.4087484,"logger":"controllers.VaultSecret","msg":"Use shared client to get secret from Vault","vaultsecret":"a0000-ro-feat-v-8430/a0000-bq-secret"}
{"level":"info","ts":1638403442.4087753,"logger":"vault","msg":"Read secret ai/np/gcp/key/a0000-bq"}
{"level":"info","ts":1638403442.798935,"logger":"controllers.VaultSecret","msg":"Updating a Secret","vaultsecret":"a0000-ro-feat-v-8430/a0000-bq-secret","Secret.Namespace":"a0000-ro-feat-v-8430","Secret.Name":"a0000-bq-secret"}
{"level":"info","ts":1638403442.8157718,"logger":"controllers.VaultSecret","msg":"Use shared client to get secret from Vault","vaultsecret":"a0000-ro-feat-v-8430/a0000-bq-secret"}
{"level":"info","ts":1638403442.8158085,"logger":"vault","msg":"Read secret ai/np/gcp/key/a0000-bq"}
{"level":"info","ts":1638403443.4009356,"logger":"controllers.VaultSecret","msg":"Updating a Secret","vaultsecret":"a0000-ro-feat-v-8430/a0000-bq-secret","Secret.Namespace":"a0000-ro-feat-v-8430","Secret.Name":"a0000-bq-secret"}

... though I am a little concerned that it appeared to update the secret twice. Every time the secret is "read" a new key is created (GCP allows at most 10 keys per service account)... so it can't "read" the secret to reconcile; it needs to pay attention to the lease time. I have not had a chance to look at the code yet to see if it does that...
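A lease-aware reconcile loop would reuse the existing key until its lease nears expiry, instead of re-reading (and minting a new key) every pass. A minimal sketch, assuming the lease_duration field from the vault read output earlier in the thread; the 0.8 renew fraction is an arbitrary choice:

```python
def next_reconcile_delay(lease_duration_seconds, renew_fraction=0.8):
    """Re-read a dynamic secret only after a fraction of its lease has
    elapsed, rather than on every reconciliation."""
    return lease_duration_seconds * renew_fraction

# With the 2592000s (30 day) lease from the earlier output, the secret
# would be re-read after roughly 24 days.
delay = next_reconcile_delay(2592000)
```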

AND NOW, other than the secret having the "wrong" key name (they were expecting a0000-bq.json, but it's using private_key_data, as expected), the value of the secret is "good"!!! :)
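If the consumers do insist on an a0000-bq.json key, the templated-secrets feature in the project's README might cover it. An untested sketch; the templates field and the {% .Secrets.* %} syntax are assumptions taken from the README's templated-secrets section and should be verified there:

```yaml
apiVersion: ricoberger.de/v1alpha1
kind: VaultSecret
metadata:
  name: a0000-bq-secret
spec:
  isBinary: true
  keys:
    - private_key_data
  path: ai/np/gcp/key/a0000-bq
  secretEngine: gcp
  # Assumed syntax from the README: expose the key under a custom name.
  templates:
    a0000-bq.json: "{% .Secrets.private_key_data %}"
  type: Opaque
```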

TJM commented 2 years ago

One thing we found is that it is a little too aggressive on retries... I messed up the path and it looked like it tried ~45 times in just a few minutes. I can send the logs if you want, but I think it needs to back off when it gets permission denied (403). I don't think this is "gcp" specific?

TJM commented 2 years ago

FYI: We are using https://github.com/TJM/vault-secret-helmchart to create our VaultSecret objects through Terraform ;)

ricoberger commented 2 years ago

Hi @TJM, thanks for testing.

  1. Yes, the secret will be read twice when it doesn't already exist. That's the current behavior of the operator. I will check if we can somehow omit the second reconciliation after the secret was created.
  2. If I get you right, the returned secret for GCP has a lease time? The operator currently only supports a global setting for this, but not on a per-secret basis. Does this already help?
  3. For the expected key name, maybe this can be changed to the expected value by the user via https://github.com/ricoberger/vault-secrets-operator#using-templated-secrets
  4. After an error the operator tries to reconcile the secret immediately. I'm not sure if we can change this behavior. Maybe we can save the number of failed reconciliations in the status field and use this for exponential backoff logic.
TJM commented 2 years ago


  1. Yes, the secret will be read twice when it doesn't already exist. That's the current behavior of the operator. I will check if we can somehow omit the second reconciliation after the secret was created.

It would be best if the secret was only "read" once, since each time it is "read" it creates a new key (lease).

  2. If I get you right, the returned secret for GCP has a lease time? The operator currently only supports a global setting for this, but not on a per-secret basis. Does this already help?

Oh, Vault leases :)

I think with both types of leases, you can "request" a certain length, but vault policies may limit that.

  3. For the expected key name, maybe this can be changed to the expected value by the user via https://github.com/ricoberger/vault-secrets-operator#using-templated-secrets

Yep, I am keeping templating in my back pocket, in case there is pushback on using the name that Vault/Google gives the secret. I think it should be OK, or even "better", to use a "common" name like private_key_data. Time will tell :)

  4. After an error the operator tries to reconcile the secret immediately. I'm not sure if we can change this behavior. Maybe we can save the number of failed reconciliations in the status field and use this for exponential backoff logic.

Some sort of backoff does seem like a good idea, to reduce the load on Vault. Or I could stop making mistakes? Nope. That doesn't sound like it will happen. :)
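The exponential backoff idea could be as simple as doubling a base delay per failed attempt, with a cap so a bad path can't hammer Vault indefinitely. An illustrative sketch; the base and cap values are arbitrary:

```python
def backoff_delay(failed_attempts, base_seconds=10, max_seconds=600):
    """Exponential backoff for failed reconciliations: 10s, 20s, 40s, ...
    capped at 10 minutes, instead of retrying immediately."""
    return min(base_seconds * (2 ** failed_attempts), max_seconds)

# Delays for the first eight consecutive failures:
delays = [backoff_delay(n) for n in range(8)]
# delays == [10, 20, 40, 80, 160, 320, 600, 600]
```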