hashicorp / vault

A tool for secrets management, encryption as a service, and privileged access management
https://www.vaultproject.io/

Increased Latency in Auth Calls after Upgrading to Vault 1.13.1 #21441

Closed Rahul-Manglani closed 1 year ago

Rahul-Manglani commented 1 year ago

Describe the bug

After upgrading HashiCorp Vault to version 1.13.1, we encountered increased latency in authentication (auth) calls. We use AWS authentication, and the summary below is based on it, although the same would apply to other auth methods. After investigation, we traced the latency to a recently introduced feature: "Enable role based quotas for lease-count quotas" (commit: 614cee3178f3a0d99a92f194ddbb888eb15b539f) in the Vault GitHub repository.

The root cause of the latency stems from a new method called DetermineRoleFromLoginRequest, which is utilized specifically for determining the role. In our case, this method invokes the Security Token Service (STS) every time it is called. Prior to this feature, the STS was only invoked once for AWS authentication. However, with the implementation of DetermineRoleFromLoginRequest, every auth call now triggers three additional invocations of STS for AWS authentication thus increasing the latency by ~3x.

These three invocations occur from the following points within the code:

  1. Rate limiting handler: util.go#L70
  2. HandleLoginRequest: request_handling.go#L1482
  3. LoginCreateToken: request_handling.go#L1788

This increased frequency of STS calls for AWS authentication has led to noticeable latency in our authentication workflow. We believe it would be beneficial to optimize this implementation to reduce the number of STS invocations and alleviate the resulting latency impact.
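One way to picture the kind of optimization suggested above is a per-request cache around role resolution. The following is only a minimal sketch under stated assumptions: `roleResolver`, `memoizedRole`, and `countResolverCalls` are hypothetical names invented for illustration, the resolver stands in for a lookup that would call STS, and none of this is Vault's actual code or API.

```go
package main

import (
	"fmt"
	"sync"
)

// roleResolver is a hypothetical stand-in for an expensive role lookup,
// for example one that makes an STS round trip. It is not a Vault type.
type roleResolver func() (string, error)

// memoizedRole wraps a resolver so that the underlying lookup runs at most
// once per login request, however many code paths ask for the role.
type memoizedRole struct {
	once    sync.Once
	resolve roleResolver
	role    string
	err     error
}

func newMemoizedRole(resolve roleResolver) *memoizedRole {
	return &memoizedRole{resolve: resolve}
}

// Get returns the cached result, invoking the resolver only on the first call.
func (m *memoizedRole) Get() (string, error) {
	m.once.Do(func() {
		m.role, m.err = m.resolve()
	})
	return m.role, m.err
}

// countResolverCalls simulates callSites separate places asking for the role
// during one login and reports how many times the resolver actually ran.
func countResolverCalls(callSites int) (calls int, role string) {
	m := newMemoizedRole(func() (string, error) {
		calls++ // stands in for one STS invocation
		return "test-role", nil
	})
	for i := 0; i < callSites; i++ {
		role, _ = m.Get()
	}
	return calls, role
}

func main() {
	calls, role := countResolverCalls(3)
	fmt.Printf("role=%s resolver calls=%d\n", role, calls)
}
```

With three call sites sharing one `memoizedRole`, the resolver runs once instead of three times. Whether a cache like this fits Vault's request lifecycle is a separate design question that this sketch does not answer.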

To Reproduce Steps to reproduce the behavior:

  1. Enable AWS auth - vault auth enable aws
  2. Create a test policy and a test role, and associate an AWS IAM principal with the role
  3. vault login -method=aws role=test-role
  4. Observe latency difference

Expected behavior

There should be little or no added latency in auth calls.

Additional context Stack traces that might help:

  1. DetermineRoleFromLoginRequest call from RateLimitQuotaWrapping:

    awsauth.submitCallerIdentityRequest (path_login.go:1718) github.com/hashicorp/vault/builtin/credential/aws
    awsauth.(*backend).pathLoginIamGetRoleNameCallerIdAndEntity (path_login.go:320) github.com/hashicorp/vault/builtin/credential/aws
    awsauth.(*backend).pathLoginResolveRoleIam (path_login.go:338) github.com/hashicorp/vault/builtin/credential/aws
    awsauth.(*backend).pathLoginResolveRole (path_login.go:161) github.com/hashicorp/vault/builtin/credential/aws
    :2
    framework.(*Backend).HandleRequest (backend.go:300) github.com/hashicorp/vault/sdk/framework
    :2
    plugin.(*backend).HandleRequest (backend.go:95) github.com/hashicorp/vault/builtin/plugin/v5
    vault.(*Core).DetermineRoleFromLoginRequest (core.go:3795) github.com/hashicorp/vault/vault
    vault.(*Core).DetermineRoleFromLoginRequestFromBytes (core.go:3780) github.com/hashicorp/vault/vault
    http.rateLimitQuotaWrapping.func1 (util.go:70) github.com/hashicorp/vault/http
    http.HandlerFunc.ServeHTTP (server.go:2122) net/http
    http.wrapGenericHandler.func1 (handler.go:442) github.com/hashicorp/vault/http
    http.HandlerFunc.ServeHTTP (server.go:2122) net/http
    cleanhttp.PrintablePathCheckHandler.func1 (handlers.go:42) github.com/hashicorp/go-cleanhttp
    http.HandlerFunc.ServeHTTP (server.go:2122) net/http
    http.serverHandler.ServeHTTP (server.go:2936) net/http
    http.(*conn).serve (server.go:1995) net/http
    http.(*Server).Serve.func3 (server.go:3089) net/http
    runtime.goexit (asm_arm64.s:1172) runtime
    - Async Stack Trace
    http.(*Server).Serve (server.go:3089) net/http

  2. STS call from HandleRequest:

    awsauth.submitCallerIdentityRequest (path_login.go:1718) github.com/hashicorp/vault/builtin/credential/aws
    awsauth.(*backend).pathLoginIamGetRoleNameCallerIdAndEntity (path_login.go:320) github.com/hashicorp/vault/builtin/credential/aws
    awsauth.(*backend).pathLoginUpdateIam (path_login.go:1334) github.com/hashicorp/vault/builtin/credential/aws
    awsauth.(*backend).pathLoginUpdate (path_login.go:588) github.com/hashicorp/vault/builtin/credential/aws
    :2
    framework.(*Backend).HandleRequest (backend.go:300) github.com/hashicorp/vault/sdk/framework
    :2
    plugin.(*backend).HandleRequest (backend.go:95) github.com/hashicorp/vault/builtin/plugin/v5
    vault.(*Router).routeCommon (router.go:782) github.com/hashicorp/vault/vault
    vault.(*Router).Route (router.go:552) github.com/hashicorp/vault/vault
    vault.(*Core).doRouting (request_handling.go:851) github.com/hashicorp/vault/vault
    vault.(*Core).handleLoginRequest (request_handling.go:1413) github.com/hashicorp/vault/vault
    vault.(*Core).handleCancelableRequest (request_handling.go:693) github.com/hashicorp/vault/vault
    vault.(*Core).switchedLockHandleRequest (request_handling.go:501) github.com/hashicorp/vault/vault
    vault.(*Core).HandleRequest (request_handling.go:461) github.com/hashicorp/vault/vault
    http.request (handler.go:923) github.com/hashicorp/vault/http
    http.handleLogicalInternal.func1 (logical.go:378) github.com/hashicorp/vault/http
    http.HandlerFunc.ServeHTTP (server.go:2122) net/http
    http.handleRequestForwarding.func1 (handler.go:857) github.com/hashicorp/vault/http
    http.HandlerFunc.ServeHTTP (server.go:2122) net/http
    http.(*ServeMux).ServeHTTP (server.go:2500) net/http
    http.wrapHelpHandler.func1 (help.go:28) github.com/hashicorp/vault/http
    http.HandlerFunc.ServeHTTP (server.go:2122) net/http
    http.wrapCORSHandler.func1 (cors.go:32) github.com/hashicorp/vault/http
    http.HandlerFunc.ServeHTTP (server.go:2122) net/http
    http.rateLimitQuotaWrapping.func1 (util.go:113) github.com/hashicorp/vault/http
    http.HandlerFunc.ServeHTTP (server.go:2122) net/http
    http.wrapGenericHandler.func1 (handler.go:442) github.com/hashicorp/vault/http
    http.HandlerFunc.ServeHTTP (server.go:2122) net/http
    cleanhttp.PrintablePathCheckHandler.func1 (handlers.go:42) github.com/hashicorp/go-cleanhttp
    http.HandlerFunc.ServeHTTP (server.go:2122) net/http
    http.serverHandler.ServeHTTP (server.go:2936) net/http
    http.(*conn).serve (server.go:1995) net/http
    http.(*Server).Serve.func3 (server.go:3089) net/http
    runtime.goexit (asm_arm64.s:1172) runtime
    - Async Stack Trace
    http.(*Server).Serve (server.go:3089) net/http

  3. DetermineRoleFromLoginRequest call from HandleLoginRequest:

    awsauth.submitCallerIdentityRequest (path_login.go:1718) github.com/hashicorp/vault/builtin/credential/aws
    awsauth.(*backend).pathLoginIamGetRoleNameCallerIdAndEntity (path_login.go:320) github.com/hashicorp/vault/builtin/credential/aws
    awsauth.(*backend).pathLoginResolveRoleIam (path_login.go:338) github.com/hashicorp/vault/builtin/credential/aws
    awsauth.(*backend).pathLoginResolveRole (path_login.go:161) github.com/hashicorp/vault/builtin/credential/aws
    :2
    framework.(*Backend).HandleRequest (backend.go:300) github.com/hashicorp/vault/sdk/framework
    :2
    plugin.(*backend).HandleRequest (backend.go:95) github.com/hashicorp/vault/builtin/plugin/v5
    vault.(*Core).DetermineRoleFromLoginRequest (core.go:3795) github.com/hashicorp/vault/vault
    vault.(*Core).handleLoginRequest (request_handling.go:1495) github.com/hashicorp/vault/vault
    vault.(*Core).handleCancelableRequest (request_handling.go:693) github.com/hashicorp/vault/vault
    vault.(*Core).switchedLockHandleRequest (request_handling.go:501) github.com/hashicorp/vault/vault
    vault.(*Core).HandleRequest (request_handling.go:461) github.com/hashicorp/vault/vault
    http.request (handler.go:923) github.com/hashicorp/vault/http
    http.handleLogicalInternal.func1 (logical.go:378) github.com/hashicorp/vault/http
    http.HandlerFunc.ServeHTTP (server.go:2122) net/http
    http.handleRequestForwarding.func1 (handler.go:857) github.com/hashicorp/vault/http
    http.HandlerFunc.ServeHTTP (server.go:2122) net/http
    http.(*ServeMux).ServeHTTP (server.go:2500) net/http
    http.wrapHelpHandler.func1 (help.go:28) github.com/hashicorp/vault/http
    http.HandlerFunc.ServeHTTP (server.go:2122) net/http
    http.wrapCORSHandler.func1 (cors.go:32) github.com/hashicorp/vault/http
    http.HandlerFunc.ServeHTTP (server.go:2122) net/http
    http.rateLimitQuotaWrapping.func1 (util.go:113) github.com/hashicorp/vault/http
    http.HandlerFunc.ServeHTTP (server.go:2122) net/http
    http.wrapGenericHandler.func1 (handler.go:442) github.com/hashicorp/vault/http
    http.HandlerFunc.ServeHTTP (server.go:2122) net/http
    cleanhttp.PrintablePathCheckHandler.func1 (handlers.go:42) github.com/hashicorp/go-cleanhttp
    http.HandlerFunc.ServeHTTP (server.go:2122) net/http
    http.serverHandler.ServeHTTP (server.go:2936) net/http
    http.(*conn).serve (server.go:1995) net/http
    http.(*Server).Serve.func3 (server.go:3089) net/http
    runtime.goexit (asm_arm64.s:1172) runtime
    - Async Stack Trace
    http.(*Server).Serve (server.go:3089) net/http

  4. DetermineRoleFromLoginRequest call from LoginCreateToken:

    awsauth.submitCallerIdentityRequest (path_login.go:1718) github.com/hashicorp/vault/builtin/credential/aws
    awsauth.(*backend).pathLoginIamGetRoleNameCallerIdAndEntity (path_login.go:320) github.com/hashicorp/vault/builtin/credential/aws
    awsauth.(*backend).pathLoginResolveRoleIam (path_login.go:338) github.com/hashicorp/vault/builtin/credential/aws
    awsauth.(*backend).pathLoginResolveRole (path_login.go:161) github.com/hashicorp/vault/builtin/credential/aws
    :2
    framework.(*Backend).HandleRequest (backend.go:300) github.com/hashicorp/vault/sdk/framework
    :2
    plugin.(*backend).HandleRequest (backend.go:95) github.com/hashicorp/vault/builtin/plugin/v5
    vault.(*Core).DetermineRoleFromLoginRequest (core.go:3795) github.com/hashicorp/vault/vault
    vault.(*Core).LoginCreateToken (request_handling.go:1802) github.com/hashicorp/vault/vault
    vault.(*Core).handleLoginRequest (request_handling.go:1687) github.com/hashicorp/vault/vault
    vault.(*Core).handleCancelableRequest (request_handling.go:693) github.com/hashicorp/vault/vault
    vault.(*Core).switchedLockHandleRequest (request_handling.go:501) github.com/hashicorp/vault/vault
    vault.(*Core).HandleRequest (request_handling.go:461) github.com/hashicorp/vault/vault
    http.request (handler.go:923) github.com/hashicorp/vault/http
    http.handleLogicalInternal.func1 (logical.go:378) github.com/hashicorp/vault/http
    http.HandlerFunc.ServeHTTP (server.go:2122) net/http
    http.handleRequestForwarding.func1 (handler.go:857) github.com/hashicorp/vault/http
    http.HandlerFunc.ServeHTTP (server.go:2122) net/http
    http.(*ServeMux).ServeHTTP (server.go:2500) net/http
    http.wrapHelpHandler.func1 (help.go:28) github.com/hashicorp/vault/http
    http.HandlerFunc.ServeHTTP (server.go:2122) net/http
    http.wrapCORSHandler.func1 (cors.go:32) github.com/hashicorp/vault/http
    http.HandlerFunc.ServeHTTP (server.go:2122) net/http
    http.rateLimitQuotaWrapping.func1 (util.go:113) github.com/hashicorp/vault/http
    http.HandlerFunc.ServeHTTP (server.go:2122) net/http
    http.wrapGenericHandler.func1 (handler.go:442) github.com/hashicorp/vault/http
    http.HandlerFunc.ServeHTTP (server.go:2122) net/http
    cleanhttp.PrintablePathCheckHandler.func1 (handlers.go:42) github.com/hashicorp/go-cleanhttp
    http.HandlerFunc.ServeHTTP (server.go:2122) net/http
    http.serverHandler.ServeHTTP (server.go:2936) net/http
    http.(*conn).serve (server.go:1995) net/http
    http.(*Server).Serve.func3 (server.go:3089) net/http
    runtime.goexit (asm_arm64.s:1172) runtime
    - Async Stack Trace
    http.(*Server).Serve (server.go:3089) net/http

heatherezell commented 1 year ago

Are you able to share before/after latency times? Thanks! :)

Rahul-Manglani commented 1 year ago

Latency numbers with the Vault main branch:

rmanglani@rmangla-ltmdamt vault % git branch

Key                      Value
token                    {{redacted}}
token_accessor           {{redacted}}
token_duration           768h
token_renewable          true
token_policies           {{redacted}}
identity_policies        []
policies                 {{redacted}}
token_meta_role_id       {{redacted}}
token_meta_account_id    {{redacted}}
token_meta_auth_type     iam

./bin/vault login -method=aws role=test-role 0.05s user 0.02s system 2% cpu 2.225 total

rmanglani@rmangla-ltmdamt vault % time ./bin/vault login -method=aws role=test-role
Success! You are now authenticated. The token information displayed below is already stored in the token helper. You do NOT need to run "vault login" again. Future Vault requests will automatically use this token.

Key                      Value
token                    {{redacted}}
token_accessor           {{redacted}}
token_duration           768h
token_renewable          true
token_policies           {{redacted}}
identity_policies        []
policies                 {{redacted}}
token_meta_auth_type     iam
token_meta_role_id       {{redacted}}
token_meta_account_id    {{redacted}}

./bin/vault login -method=aws role=test-role 0.06s user 0.03s system 3% cpu 2.288 total

Rahul-Manglani commented 1 year ago

Latency numbers with the Vault 1.11.3 branch:

rmanglani@rmangla-ltmdamt vault % git branch
main

rmanglani@rmangla-ltmdamt vault % time ./bin/vault login -method=aws role=test-role
Success! You are now authenticated. The token information displayed below is already stored in the token helper. You do NOT need to run "vault login" again. Future Vault requests will automatically use this token.

Key                      Value
token                    {{redacted}}
token_accessor           {{redacted}}
token_duration           768h
token_renewable          true
token_policies           {{redacted}}
identity_policies        []
policies                 {{redacted}}
token_meta_auth_type     iam
token_meta_role_id       {{redacted}}
token_meta_account_id    {{redacted}}

./bin/vault login -method=aws role=test-role 0.05s user 0.03s system 8% cpu 0.906 total

rmanglani@rmangla-ltmdamt vault % time ./bin/vault login -method=aws role=test-role
Success! You are now authenticated. The token information displayed below is already stored in the token helper. You do NOT need to run "vault login" again. Future Vault requests will automatically use this token.

Key                      Value
token                    {{redacted}}
token_accessor           {{redacted}}
token_duration           768h
token_renewable          true
token_policies           {{redacted}}
identity_policies        []
policies                 {{redacted}}
token_meta_account_id    {{redacted}}
token_meta_auth_type     iam
token_meta_role_id       {{redacted}}

./bin/vault login -method=aws role=test-role 0.06s user 0.03s system 9% cpu 0.894 total
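
For a rough comparison of the two sets of timings, the small Go sketch below averages the wall-clock totals reported by `time` in this thread and computes their ratio. The helper names `mean` and `slowdown` are made up for illustration. With only two runs per build, and with wall-clock time that also includes client-side work, the result is an approximation at best.

```go
package main

import "fmt"

// mean returns the arithmetic mean of xs, or 0 for an empty slice.
func mean(xs []float64) float64 {
	if len(xs) == 0 {
		return 0
	}
	sum := 0.0
	for _, x := range xs {
		sum += x
	}
	return sum / float64(len(xs))
}

// slowdown returns how many times larger the mean of after is
// compared with the mean of before.
func slowdown(before, after []float64) float64 {
	return mean(after) / mean(before)
}

func main() {
	// Wall-clock totals in seconds, copied from the runs in this thread.
	v1113 := []float64{0.906, 0.894}
	mainBranch := []float64{2.225, 2.288}

	fmt.Printf("1.11.3 mean: %.3fs\n", mean(v1113))
	fmt.Printf("main mean:   %.3fs\n", mean(mainBranch))
	fmt.Printf("slowdown:    %.2fx\n", slowdown(v1113, mainBranch))
}
```

On these four samples the ratio comes out at about 2.5x, which is in the same range as the roughly 3x estimate in the issue description.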

heatherezell commented 1 year ago

Thank you! I appreciate the quick response. :)

kschoche commented 1 year ago

Hi! Just following up here :) This issue was resolved in #22583 and back-ported+released in v1.13.7. Cheers