hashicorp / vault-plugin-secrets-gcp

Mozilla Public License 2.0

Deleting service account keys not retried #75

Open david-behnke opened 4 years ago

david-behnke commented 4 years ago

Describe the bug: We discovered that short-lived service account keys are slowly accumulating in some of our environments. After investigating, we found that Vault (or the plugin) does not retry deleting a key when the initial deletion request fails, even though the corresponding lease no longer exists in Vault.

Updating Vault from 1.3.0 to 1.3.2 did not resolve the issue: the expired service account keys were still not deleted.

To Reproduce: We use short TTLs for these keys (30 minutes) and request a new key every 5 minutes. Such a high request frequency, or deliberately forcing GCP API errors, may help reproduce this.

Expected behavior: Service accounts managed by Vault should be checked regularly for expired or unused keys. From what I understand, this should already be done by the Rollback functionality.

Environment:
Vault Server Version: v1.3.2
Vault Client Version: v1.3.2

sethvargo commented 4 years ago

@emilymye might be able to speak to it more, but I don't believe Vault gives us a good mechanism to retry these deletion requests.

david-behnke commented 4 years ago

I was mistaken about the existing implementation. WALRollbacks are implemented, but the missing functionality should be part of the PeriodicFunc.

The way I see it, there are two options (option 2 sounds better to me):

  1. Implement a periodic function that iterates through the keys of the service account tied to the role set, compares each key's creation date to the MaxTTL of the config, and deletes the key if it should already have expired.
  2. Capture failed deletion attempts, write the necessary data to storage, and retry these deletions within the periodic function.
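To make option 2 more concrete, here is a rough sketch in Go of how failed deletions could be recorded and retried. Everything in it is hypothetical: `retryQueue`, `pendingDeletion`, and `deleteFunc` are illustrative names, the in-memory map only stands in for Vault's storage, and `deleteFunc` only stands in for the call to the GCP IAM API. It is not how the plugin is implemented.

```go
package main

import (
	"errors"
	"sync"
)

// pendingDeletion is a hypothetical record of a key whose deletion failed.
type pendingDeletion struct {
	KeyName  string // resource name of the service account key
	Attempts int    // number of failed deletion attempts so far
}

// deleteFunc stands in for the GCP IAM delete call (assumed signature).
type deleteFunc func(keyName string) error

// retryQueue is an in-memory stand-in for persisted plugin storage.
type retryQueue struct {
	mu      sync.Mutex
	pending map[string]*pendingDeletion
}

func newRetryQueue() *retryQueue {
	return &retryQueue{pending: make(map[string]*pendingDeletion)}
}

// revoke tries to delete the key once. On failure it records the key so
// that a later periodic run can retry, then returns the original error.
func (q *retryQueue) revoke(keyName string, del deleteFunc) error {
	err := del(keyName)
	if err == nil {
		return nil
	}
	q.mu.Lock()
	defer q.mu.Unlock()
	if p, ok := q.pending[keyName]; ok {
		p.Attempts++
	} else {
		q.pending[keyName] = &pendingDeletion{KeyName: keyName, Attempts: 1}
	}
	return err
}

// retryPending is what a periodic function could call. It retries every
// recorded deletion and returns how many keys are still pending.
// For simplicity the lock is held while calling del.
func (q *retryQueue) retryPending(del deleteFunc) int {
	q.mu.Lock()
	defer q.mu.Unlock()
	for name, p := range q.pending {
		if err := del(name); err != nil {
			p.Attempts++
			continue
		}
		delete(q.pending, name) // deleting during range is safe in Go
	}
	return len(q.pending)
}

// errUnavailable simulates a transient GCP API failure.
var errUnavailable = errors.New("simulated GCP API error")
```

In a real backend the pending records would have to be written to persistent storage so they survive restarts, and a delete that fails because the key no longer exists would probably need to count as success.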

Alternatively, we could probably solve the issue on our side by regularly rotating the service account via the Vault API.

What's your take on this?

frodera commented 4 years ago

We're having the same issue described by @david-behnke.

One of our monitoring processes requests a new key every 2 minutes (with a low TTL) via a Vault roleset, and after a few weeks of running we hit an Error 429: Maximum number of keys on account reached. After investigating, we noticed that old service account keys had piled up and reached the maximum of 10.

While we can easily automate the cleanup of "orphan" keys ourselves, the ideal and more robust solution would be for Vault to handle the deletion retries itself.

Vault Server Version: v1.3.1