hashicorp / vault

A tool for secrets management, encryption as a service, and privileged access management
https://www.vaultproject.io/
Other
30.94k stars 4.18k forks source link

Feature: Transit key auto-rotate and key-limit #2705

Closed stampycode closed 3 years ago

stampycode commented 7 years ago

I would like the facility for Vault to auto-rotate a given Transit key at a defined interval. This should be accompanied with the facility to set a fixed number of keys to be kept on a given Transit endpoint.

This would suit JWT signing processes, which should use rotating pre-shared keys for signing short-lifetime tokens, shifting the responsibility for key-security away from the shared secret token itself, and onto the services that provide that secret token.

Suggest:

curl -X POST -H "X-Vault-Token: ${VAULT_TOKEN}" \
    --data '{"max_keep_versions": 3, "auto_rotate_interval": "10m"}'
    https://vault.rocks/v1/transit/keys/mykey/config

💙 Vault. Keep up the great work 😁

jefferai commented 7 years ago

There are no plans to implement this currently. I suggest using cron or some similar tasking mechanism to do this on your chosen interval.

stampycode commented 6 years ago

I implemented this as a cron-task, and after a period of time the script hit this error:

500: /transit/keys/my_key/rotate: 1 error occurred:
* Unexpected response code: 413 (Value exceeds 524288 byte limit)

Currently looking into how to discard old keys.

vishalnayak commented 6 years ago

@stampycode If you are using Consul as your data store, it has a limit on the size of the item that can be persisted which is 512KB. Discarding the old keys seems like the right thing to do (and also probably tuning the rotation interval?).

stampycode commented 6 years ago

@vishalnayak yeah I was looking for how to discard old keys, is that in the API doc somewhere?

vishalnayak commented 6 years ago

@stampycode I see that the Transit keys doesn't allow the deletion of older keys (reasonably so). This might mean that you've hit the rotation limit of the current key.

stampycode commented 6 years ago

@vishalnayak urrm... suggestions? I think this should be escalated... If the min_decryption_version and min_encryption_version aren't allowed to be decremented, then why do we need to keep the old keys at all? They should be deleted for security reasons, no?

Otherwise this means rotate can only be used a finite amount of times before it either becomes a dead endpoint (because it's not deletable) or you need to delete it and start a fresh one, which is hard work.

vishalnayak commented 6 years ago

@stampycode Setting the min_encryption_version and min_decryption_version will allow administrative control on the encryption and decryption operations by clients. It however, does not mean that the keys should be deleted as they might still be required operationally to decrypt old ciphers.

Others might pitch in if they have different opinions/interpretations of the behavior, but IMO, yes rotate can only be used a finite amount of times. Note though, that the limit is imposed not by Vault, but by the underlying storage. This incident also underscores a reason why not to implement the auto-rotation of keys.

stampycode commented 6 years ago

@vishalnayak I'll submit a formal request through the customer portal for this feature/bugfix to be added/fixed. IMHO it's unreasonable to have such an undocumented limitation, especially with no workaround. Auto-rotation is a fundamental part of short-term tokens - Google uses 5 or 10-minute key pairs for implementing Firebase OAuth, just as an example. (you can see them here ) If I tell Vault that I only want to keep the last 3 keys, then I have the expectation that older keys won't be kept. Perhaps there could be a 'delete keys older than' endpoint.

So I'm not really sure whats going on here:

GET ${VAULT_ADDR}/v1/transit/keys/my_key

returns:

{
"allow_plaintext_backup": false,
"backup_info": null,
"deletion_allowed": false,
"derived": false,
"exportable": true,
"keys": {
  "2256": 1517185164,
  "2257": 1517185423,
  "2258": 1517185764,
  ...
  "3709": 1518440636,
  "3710": 1518441026
},
"latest_version": 3710,
"min_decryption_version": 3683,
"min_encryption_version": 3683,
"name": "my_key",
"restore_info": null,
"supports_decryption": true,
"supports_derivation": true,
"supports_encryption": true,
"supports_signing": false,
"type": "aes256-gcm96"
}

Then

POST ${VAULT_ADDR}/v1/transit/keys/my_key/rotate

Returns

[
  "1 error occurred:\n\n* Unexpected response code: 413 (Value exceeds 524288 byte limit)"
]

But, this:

GET ${VAULT_ADDR}/v1/transit/keys/my_key

Now looks like this:

  {
"allow_plaintext_backup": false,
"backup_info": null,
"deletion_allowed": false,
"derived": false,
"exportable": true,
"keys": {
  "2256": 1517185164,
  "2257": 1517185423,
  "2258": 1517185764,
  ...
  "3709": 1518440636,
  "3710": 1518441026,
  "3711": 1518441180        <--- new key
},
"latest_version": 3711,        <--- updated
"min_decryption_version": 3683,
"min_encryption_version": 3683,
"name": "my_key",
"restore_info": null,
"supports_decryption": true,
"supports_derivation": true,
"supports_encryption": true,
"supports_signing": false,
"type": "aes256-gcm96"
}

So even though the rotate is failing with an error, it's persisting the change. The encryption/decryption endpoints still work fine with all keys from version 1 to 3711, but is there now key data missing in the database? I'm a bit concerned some keys might only be accessible in local memory and will be lost if the system is rebooted...

stampycode commented 6 years ago

And I've no idea why the list of keys starts at version 2256. Key 1 still works fine if I change the min decrypt/encrypt version settings.

jefferai commented 6 years ago

@stampycode You aren't telling Vault you want it to only keep the last X keys, you are telling it the minimum version you want users to be able to use for decryption -- hence min_decryption_version. Older keys are kept in an archive, which is what's hitting the limit (which is a Consul limit, not a Vault limit). In most situations for e.g. regulatory reasons users don't actually want old keys to be removed, they just want to prevent old ciphertext that may have been found from being decrypted unless via authorized means. The reason the archive is separate is to keep the working set of keys more performant.

Can you tell us your use-case/workflow? If you want very short term keys, which it seems you do, you may be better off randomly generating context values and using those in conjuction with derived keys, which would allow a single key version to transform into nearly unlimited derived keys, accessed by having the right context.

We could potentially implement an operation to trim key versions at some point in the future.

jefferai commented 6 years ago

Another option would be to create named keys every X minutes with a floored timestamp encoded in the key name. Easy to create, easy to know which one to use, easy to delete. That's more in line with the kind of rotation you're talking about with Firebase.

stampycode commented 6 years ago

@jefferai

You aren't telling Vault you want it to only keep the last X keys

Yeah I know - this is the missing option that I need 😬

Older keys are kept in an archive

So some keys are presumably being lost during a rotate, because of this error? Is this why the keys list is only showing values 2256 onwards? Any chance you could add this small feature while you're fixing this bug? 😁

My use case is basically the same as the Firebase use case - I want a new key every 10 minutes, and only the last 3 keys to be available for decryption. Will never need to decrypt older than 3 keys ago.

I would use derived keys, but it's not possible to prevent decryption of old derived keys. The best alternative I can think of would be to create 3 endpoints, every 10 minutes the oldest one gets deleted and re-created - the downside with this is that clients may occasionally see an "endpoint does not exist" error, and they would also have to hit 3 separate endpoints to get all 3 keys. I could raise the complexity by adding a fourth endpoint and listing the 3 valid endpoints in a shared secret..... but we still increase the complexity and number of calls to Vault.

From the Vault API docs it feels like this should be do-able right now, it's not documented that the number of rotations is finite, and is also limited to quite a low number. It's also not explained that all previous keys are stored indefinitely, or that the min_[de|en]cryption_key setting can actually be reduced to a lower number at any point...

jefferai commented 6 years ago

The best alternative I can think of would be to create 3 endpoints, every 10 minutes the oldest one gets deleted and re-created - the downside with this is that clients may occasionally see an "endpoint does not exist" error

Why would this happen?

I could raise the complexity by adding a fourth endpoint and listing the 3 valid endpoints in a shared secret..... but we still increase the complexity and number of calls to Vault.

I think that's unnecessary if you instead follow my suggestion above to have the key names be time-based (whatever is creating the keys could do it ahead of time to ensure it's always available).

it's not documented that the number of rotations is finite, and is also limited to quite a low number.

The number of rotations isn't finite in Vault. This is purely limited by storage backend choice. It's the same as if you tried to put a 16MB file into a K/V endpoint in Vault. Vault will allow it, the underlying storage won't.

stampycode commented 6 years ago

Why would this happen?

The lack of atomicity means there will be an unavoidable gap between deleting an endpoint and re-creating it again, because there's no single endpoint that does delete and create together.

...have the key names be time-based...

Having time-based named keys is a possible solution, but it raises the difficulty of debugging endpoints that have constantly changing names.

The number of rotations isn't finite in Vault. This is purely limited by storage backend choice.

Yeah, I accept that.

I've just seen the restore key endpoint - would it be possible to use this to recreate the endpoint without old keys? ( I'm assuming I'd have to re-create the endpoint with allow_plaintext_backup:true and deletion_allowed:true )

stampycode commented 6 years ago

(mis-clicked)

stampycode commented 6 years ago

Just tested it; Backing the key up, removing the old keys from the backup, and restoring, works very nicely.

But the endpoint has to be deleted before it can be restored, which is an atomicity problem.

If we could have a 'replace' flag in the 'restore' endpoint, so it replaces the existing endpoint as an atomic operation, that would provide a solid workaround. 👌

@jefferai what are your thoughts on adding this?

jefferai commented 6 years ago

@stampycode That seems more hacky than simply having a trim operation especially since it requires exporting the key in plaintext. Let us discuss internally whether we want to add this. In the meantime I'm addressing the more critical problem in #3959.

amruthar commented 6 years ago

Hi, I understand that now the number of keys in a keyring is limited by the underlying storage, but can the key data be stored in parts in multiple KV pairs so that irrespective of the storage backend used, vault can enable rotating a key any number of times?

jefferai commented 6 years ago

Not currently, although we are doing some thinking on that in a more general sense.

shikhar-a commented 6 years ago

If the vault is storing all the older keys in its storage, then what is even the use of key rotation? Key rotation should ideally mean that only X number of keys should be maintained!

joeshaw commented 3 years ago

Manual key trimming was added in #5388.

vishalnayak commented 3 years ago

Issues that are not reproducible and/or not had any interaction for a long time are stale issues. Sometimes even the valid issues remain stale lacking traction either by the maintainers or the community. In order to provide faster responses and better engagement with the community, we strive to keep the issue tracker clean and the issue count low. In this regard, our current policy is to close stale issues after 30 days. Closed issues will still be indexed and available for future viewers. If users feel that the issue is still relevant but is wrongly closed, we encourage reopening them.

Please refer to our contributing guidelines for details on issue lifecycle.