hashicorp / vault

A tool for secrets management, encryption as a service, and privileged access management
https://www.vaultproject.io/
Other
31.19k stars 4.21k forks source link

Vault agent doesn’t respect the number of retries #15725

Closed njegosrailic closed 2 years ago

njegosrailic commented 2 years ago

Describe the bug

Looks like the Vault agent doesn’t respect the number of retries.

I also tried setting the VAULT_MAX_RETRIES ENV variable.

To Reproduce Steps to reproduce the behavior:

Please see the configuration in the environment section below.

Expected behavior

According to the docs, I expect that the Vault agent won’t exit on an error and that the number of retries will be three. This is what I’m getting:

2022-05-16T14:32:48.342Z [DEBUG] cache.apiproxy.client: performing request: method=GET url=https://vault.service.confiad1.consul.csnzoo.com:8200/v1/dev/kv/secrets/app1/secret-new
2022-05-16T14:32:48.346Z [WARN] (view) vault.read(dbtech/dev/db/spdb/static-creds/svc-nrailic-secrets-test2-app): no secret exists at dbtech/dev/db/spdb/static-creds/svc-nrailic-secrets-test2-app (retry attempt 7 after "16s")
2022-05-16T14:32:48.347Z [WARN] (view) vault.read(dev/kv/secrets/app1/secret-new): no secret exists at dev/kv/secrets/app1/secret-new (retry attempt 7 after "16s")

I tried to set the VAULT_MAX_RETRIES ENV variable but no luck.

There is an option to pass the Consul template retry config but the problem is that I don’t understand how to pass it from the Vault config.

This is the Consul template config for the Vault config shown below:

[DEBUG] (runner) final config: {"Consul":{"Address":"","Namespace":"","Auth":{"Enabled":false,"Username":"","Password":""},"Retry":{"Attempts":12,"Backoff":250000000,"MaxBackoff":60000000000,"Enabled":true},"SSL":{"CaCert":"","CaPath":"","Cert":"","Enabled":false,"Key":"","ServerName":"","Verify":true},"Token":"","TokenFile":"","Transport":{"CustomDialer":null,"DialKeepAlive":30000000000,"DialTimeout":30000000000,"DisableKeepAlives":false,"IdleConnTimeout":90000000000,"MaxIdleConns":100,"MaxIdleConnsPerHost":33,"TLSHandshakeTimeout":10000000000}},"Dedup":{"Enabled":false,"MaxStale":2000000000,"Prefix":"consul-template/dedup/","TTL":15000000000,"BlockQueryWaitTime":60000000000},"DefaultDelims":{"Left":null,"Right":null},"Exec":{"Command":[],"Enabled":false,"Env":{"Denylist":[],"Custom":[],"Pristine":false,"Allowlist":[]},"KillSignal":2,"KillTimeout":30000000000,"ReloadSignal":null,"Splay":0,"Timeout":0},"KillSignal":2,"LogLevel":"DEBUG","FileLog":{"LogFilePath":"","LogRotateBytes":0,"LogRotateDuration":86400000000000,"LogRotateMaxFiles":0},"MaxStale":2000000000,"PidFile":"","ReloadSignal":1,"Syslog":{"Enabled":false,"Facility":"LOCAL0","Name":"consul-template"},"Templates":[{"Backup":false,"Command":[],"CommandTimeout":30000000000,"Contents":"{{ with secret \"/dev/kv/secrets/app1/secret-new\" }}{{ .Data.data.value }}{{ end }}","CreateDestDirs":true,"Destination":"/secrets/secret-new","ErrMissingKey":false,"ErrFatal":true,"Exec":{"Command":[],"Enabled":false,"Env":{"Denylist":[],"Custom":[],"Pristine":false,"Allowlist":[]},"KillSignal":2,"KillTimeout":30000000000,"ReloadSignal":null,"Splay":0,"Timeout":30000000000},"Perms":0,"User":null,"Uid":null,"Group":null,"Gid":null,"Source":"","Wait":{"Enabled":false,"Min":0,"Max":0},"LeftDelim":"{{","RightDelim":"}}","FunctionDenylist":[],"SandboxPath":""}],"TemplateErrFatal":null,"Vault":{"Address":"http://127.0.0.1:8200","Enabled":true,"Namespace":"","RenewToken":false,"Retry":{"Attempts":12,"Backoff":250000000,"MaxBackoff":60000000000,"Enabled":true},"SSL":{"CaCert":"","CaPath":"","Cert":"","Enabled":false,"Key":"","ServerName":"","Verify":false},"Transport":{"CustomDialer":{},"DialKeepAlive":30000000000,"DialTimeout":30000000000,"DisableKeepAlives":false,"IdleConnTimeout":90000000000,"MaxIdleConns":100,"MaxIdleConnsPerHost":33,"TLSHandshakeTimeout":10000000000},"UnwrapToken":false,"DefaultLeaseDuration":300000000000,"LeaseRenewalThreshold":0.9},"Wait":{"Enabled":false,"Min":0,"Max":0},"Once":false,"ParseOnly":false,"BlockQueryWaitTime":60000000000}

I can tell that the config is being parsed properly, because if I pass two values in the retry stanza I got an error that only one is allowed.

Environment:

Vault agent configuration file(s):

{
  "auto_auth": {
    "method": {
      "type": "kubernetes",
      "mount_path": "auth/kube_cluster_c5",
      "config": {
        "role": "secrets-test-dev"
      }
    },
    "sink": [
      {
        "type": "file",
        "config": {
          "path": "/vault/vault-token"
        }
      }
    ]
  },
  "cache": {
    "use_auto_auth_token": "true"
  },
  "exit_after_auth": false,
  "pid_file": "/vault/vault.pid",
  "vault": {
    "address": "https://vault.service:8200",
    "retry": [
      {
        "num_retries": "3"
      }
    ]
  },
  "template_config": [
    {
      "error_on_missing_key": false,
      "exit_on_retry_failure": false
    }
  ],
  "template": [
    {
      "destination": "/secrets/secret-new",
      "contents": "{{ with secret \"/dev/kv/secrets/app1/secret-new\" }}{{ .Data.data.value }}{{ end }}",
      "left_delimiter": "{{",
      "right_delimiter": "}}"
    }
  ],
  "listener": [
    {
      "type": "tcp",
      "address": "127.0.0.1:8200",
      "tls_disable": true
    }
  ]
}

Additional context Add any other context about the problem here.

jasonodonnell commented 2 years ago

This may have been fixed by https://github.com/hashicorp/vault/pull/15204, available in Vault 1.11.0.

njegosrailic commented 2 years ago

Thank you. I'll test it and let you know.

njegosrailic commented 2 years ago

Hi @jasonodonnell,

Unless I'm missing something the issue still persist:

/ $ vault version
Vault v1.11.1 (0f634755745f4adf62ec0723a0b93d6dce5bc33e), built 2022-07-19T20:16:47Z

/ $ cat /vault/config.json | grep -A 6 vault\":
  "vault": {
    "address": "https://vault.service.intra:8200",
    "retry": [
      {
        "num_retries": "3"
      }
    ]
/ $ cat /vault/config.json | grep -A 5 template_config
  "template_config": [
    {
      "error_on_missing_key": false,
      "exit_on_retry_failure": false
    }
  ],
/ $

Logs:

2022-08-02T08:11:43.370Z [WARN] (view) vault.read(secrets/dev/test-secret2): no secret exists at secrets/dev/test-secret2 (retry attempt 5 after "4s")
2022-08-02T08:11:47.371Z [INFO]  cache: received request: method=GET path=/v1/secrets/dev/test-secret2
2022-08-02T08:11:47.371Z [INFO]  cache: received request: method=GET path=/v1/secrets/dev/test-secret
2022-08-02T08:11:47.371Z [INFO]  cache.apiproxy: forwarding request: method=GET path=/v1/secrets/dev/test-secret2
2022-08-02T08:11:47.371Z [INFO]  cache.apiproxy: forwarding request: method=GET path=/v1/secrets/dev/test-secret
2022-08-02T08:11:47.376Z [WARN] (view) vault.read(secrets/dev/test-secret2): no secret exists at secrets/dev/test-secret2 (retry attempt 6 after "8s")
2022-08-02T08:11:47.376Z [WARN] (view) vault.read(secrets/dev/test-secret): no secret exists at secrets/dev/test-secret (retry attempt 6 after "8s")
2022-08-02T08:11:55.377Z [INFO]  cache: received request: method=GET path=/v1/secrets/dev/test-secret2
2022-08-02T08:11:55.377Z [INFO]  cache: received request: method=GET path=/v1/secrets/dev/test-secret
2022-08-02T08:11:55.377Z [INFO]  cache.apiproxy: forwarding request: method=GET path=/v1/secrets/dev/test-secret2
2022-08-02T08:11:55.377Z [INFO]  cache.apiproxy: forwarding request: method=GET path=/v1/secrets/dev/test-secret
2022-08-02T08:11:55.383Z [WARN] (view) vault.read(secrets/dev/test-secret2): no secret exists at secrets/dev/test-secret2 (retry attempt 7 after "16s")
2022-08-02T08:11:55.383Z [WARN] (view) vault.read(secrets/dev/test-secret): no secret exists at secrets/dev/test-secret (retry attempt 7 after "16s")
2022-08-02T08:12:11.384Z [INFO]  cache: received request: method=GET path=/v1/secrets/dev/test-secret
2022-08-02T08:12:11.384Z [INFO]  cache: received request: method=GET path=/v1/secrets/dev/test-secret2
2022-08-02T08:12:11.384Z [INFO]  cache.apiproxy: forwarding request: method=GET path=/v1/secrets/dev/test-secret
2022-08-02T08:12:11.384Z [INFO]  cache.apiproxy: forwarding request: method=GET path=/v1/secrets/dev/test-secret2
2022-08-02T08:12:11.390Z [WARN] (view) vault.read(secrets/dev/test-secret): no secret exists at secrets/dev/test-secret (retry attempt 8 after "32s")
2022-08-02T08:12:11.390Z [WARN] (view) vault.read(secrets/dev/test-secret2): no secret exists at secrets/dev/test-secret2 (retry attempt 8 after "32s")
2022-08-02T08:12:43.391Z [INFO]  cache: received request: method=GET path=/v1/secrets/dev/test-secret2
2022-08-02T08:12:43.391Z [INFO]  cache: received request: method=GET path=/v1/secrets/dev/test-secret
2022-08-02T08:12:43.391Z [INFO]  cache.apiproxy: forwarding request: method=GET path=/v1/secrets/dev/test-secret2
2022-08-02T08:12:43.391Z [INFO]  cache.apiproxy: forwarding request: method=GET path=/v1/secrets/dev/test-secret
2022-08-02T08:12:43.396Z [WARN] (view) vault.read(secrets/dev/test-secret2): no secret exists at secrets/dev/test-secret2 (retry attempt 9 after "1m0s")
2022-08-02T08:12:43.396Z [WARN] (view) vault.read(secrets/dev/test-secret): no secret exists at secrets/dev/test-secret (retry attempt 9 after "1m0s")
2022-08-02T08:13:43.397Z [INFO]  cache: received request: method=GET path=/v1/secrets/dev/test-secret
2022-08-02T08:13:43.397Z [INFO]  cache: received request: method=GET path=/v1/secrets/dev/test-secret2
2022-08-02T08:13:43.397Z [INFO]  cache.apiproxy: forwarding request: method=GET path=/v1/secrets/dev/test-secret2
2022-08-02T08:13:43.397Z [INFO]  cache.apiproxy: forwarding request: method=GET path=/v1/secrets/dev/test-secret
2022-08-02T08:13:43.405Z [WARN] (view) vault.read(secrets/dev/test-secret2): no secret exists at secrets/dev/test-secret2 (retry attempt 10 after "1m0s")
2022-08-02T08:13:43.405Z [WARN] (view) vault.read(secrets/dev/test-secret): no secret exists at secrets/dev/test-secret (retry attempt 10 after "1m0s")
jasonodonnell commented 2 years ago

I will take a closer look, but unfortunately the retry logic is complicated due to the interaction of sub systems in agent as per the documentation: https://www.vaultproject.io/docs/agent#retry-stanza

jasonodonnell commented 2 years ago

I did find this note in the code for template: https://github.com/hashicorp/vault/blob/637d4bdc43204cd970a6a549483991183ceff5da/command/agent/template/template.go#L267.

VioletHynes commented 2 years ago

Hi there! I'm pretty sure my PR https://github.com/hashicorp/vault/pull/16970 will fix this issue.

To describe the current behaviour: the issue was that when caching was enabled, configured retries were not respected. If template_config was present (like here), it would ignore the set max_retries value and set retries to the default, otherwise. If template_config is not present, it would ignore the set max_retries value and set retries to 0 (essentially creating an infinite loop of instant retries).

This issue should be resolved soon and fixed in 1.12.0. Thanks for the report!

VioletHynes commented 2 years ago

Hi there! I'm going to close this issue as I just merged https://github.com/hashicorp/vault/pull/16970 which should fix the issue identified here. It should now respect the number of retries in all cases, regardless of if caching is enabled. This should release in 1.12.

Thanks for the bug report!