Closed: fairclothjm closed this 7 months ago
Once https://github.com/hashicorp/vault/pull/25499 is merged and the API is tagged we can pull that in
Just encountered an issue after upgrading the Vault provider: my data resource for `vault_auth_backend` now errors with `permission denied`.
I had to extend my policy with `sudo` for this to work:
```
path "sys/auth*" {
  capabilities = ["read", "list", "sudo"]
}
```
This seems like overkill, especially as it's not documented in the Vault API docs. The lack of breaking-change documentation isn't nice either ...
@pree Hello, I am sorry you are having trouble! Thanks for pointing this out. We did mention policy changes in the Changelog and we have published a v4.0.0 Upgrade Guide but it looks like we missed this nuance.
I think we may be able to get a patch in to remove the `sudo` requirement; however, you will still need to update your policy path to use `sys/mounts/auth/*`.
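For context, the updated policy could then be reduced to something like the following sketch (path per the suggestion above; this assumes the `sudo`-removal patch lands and `read` alone suffices, so verify the required capabilities against your Vault version):

```
path "sys/mounts/auth/*" {
  capabilities = ["read"]
}
```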
Thanks for the reply @fairclothjm. I would love a `# Breaking Change` header in the Changelog to make this clearer.
Removing the `sudo` requirement would be the right approach imho.
I have started to play around with this version and so far, it is a night and day difference in our environment: significantly less CPU usage (100% -> 40%) and roughly 3x faster to run through all of our resources.
Thank you so much and kudos @fairclothjm
Thanks and glad to hear that @kkronenb !
Description
This changes the READ operations in `vault_auth_backend` and `vault_mount` to use the `GET /sys/auth/:path` and `GET /sys/mounts/:path` APIs respectively. Previously, these resources were calling LIST, which could result in a substantially higher CPU and memory footprint for the provider in cases where a given Vault server has a large number of secret/auth mounts.

At this time, there are no helpers for these API paths in the Vault `api` package. See https://github.com/hashicorp/vault/pull/25499 which proposes to add them.

TF Config used in the performance tests

```
terraform {
  required_providers {
    vault = {
      source  = "hashicorp/vault"
      version = "~> 3.23.0"
    }
  }
}

provider "vault" {
  address = "http://localhost:8200"
}

variable "mount_name_count" {
  type    = number
  default = 1000
}

resource "vault_mount" "kvv2-example" {
  count = var.mount_name_count
  path  = "kv-mount${count.index}"
  type  = "kv-v2"

  options = {
    version = "2"
    type    = "kv-v2"
  }

  description = "This is an example KV Version 2 secret engine mount"
}

resource "vault_auth_backend" "userpass-example" {
  count = var.mount_name_count
  type  = "userpass"
  path  = "userpass${count.index}"

  tune {
    max_lease_ttl      = "90000s"
    listing_visibility = "unauth"
  }

  description = "This is an example userpass auth mount"
}
```

Details on the CPU performance improvements
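To illustrate the READ change: under the new version, a `vault_auth_backend` data source read maps to a single `GET /sys/auth/:path` call instead of a LIST over all mounts. A minimal sketch (the `userpass` path is an example placeholder):

```
data "vault_auth_backend" "example" {
  path = "userpass"
}

output "auth_accessor" {
  value = data.vault_auth_backend.example.accessor
}
```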
As expected, the CPU profile shows the biggest improvements in the Provider's READ operations which were spending most of the CPU time listing and decoding mount data.
pprof CPU profile flame graph
Before:
After:
Additionally, from the CPU profile we can see a big difference in the time spent in the call to `mallocgc`:

Before:

After:
So let's take a look at the pprof memory profile...
Details on the Memory performance improvements
Interestingly, the pprof memory profile does not reflect the improvements we would predict from the CPU profile analysis above. However, both the before and after profiles point to the AWS SDK init functions as being very memory hungry.
pprof Memory profile
Before:
After:
Relates OR Closes #0000
Checklist