vmware-tanzu / velero

Backup and migrate Kubernetes applications and their persistent volumes
https://velero.io
Apache License 2.0

Encrypt backups at rest #434

Open marpaia opened 6 years ago

marpaia commented 6 years ago

Right now, when backups are created via ark backup create, sensitive objects are stored unencrypted at rest. In Google Cloud, there is excellent Go support for encrypting Google Cloud Storage objects with Google Cloud KMS in a way that works rather transparently to the caller.

I would love a way to configure the KMS key to use when storing backups. The best practice is to have a separate project for the key and grant IAM permissions from there, which I could easily do for the heptio-ark SA that already is required when setting up access to the bucket.

At @kolide, we have an internal tool that works like that which I'd like to potentially replace with Ark, so if there is a reasonable integration point within Ark for this kind of thing, perhaps some of our existing code can be upstreamed for this use-case.

ncdc commented 6 years ago

@marpaia thanks for this request. Could you please explain in a bit more detail what the flow looks like between Ark and the separate project that contains the key? Would you specify the project and the key in the backupStorageProvider config, and would Ark retrieve the key from the specified project?

marpaia commented 6 years ago

Encrypting backups at rest is for sure not something that is unique to GCP, but in GCP you need three bits of information to encrypt/decrypt a blob:

I think Ark is already 1:1 tied with a GCP Project, but it's worth noting that GCP has a Separation of Duties document which outlines the best practice of storing your KMS key-ring in an isolated project.

The code that I have now is an implementation of the following interface for storing a *corev1.Secret in GCS encrypted at rest via KMS:

package secret

import (
    "context"

    corev1 "k8s.io/api/core/v1"
)

// Store is the interface which defines the controller's interactions with an
// arbitrary exo-cluster secret storage mechanism.
type Store interface {
    Get(ctx context.Context, namespace, name string) (*corev1.Secret, error)
    List(ctx context.Context, namespace string) ([]*corev1.Secret, error)
    Put(ctx context.Context, s *corev1.Secret) error
    Delete(ctx context.Context, namespace, name string) error
}

I don't think my implementation would be super useful to you, because you can currently only use the KMS API to encrypt/decrypt chunks of data up to 64 KiB. From the docs:

Cloud KMS can handle secrets up to 64 KiB in size. If you need to encrypt larger secrets, it is recommended that you use a key hierarchy, with a locally-generated data encryption key (DEK) to encrypt the secret, and a key encryption key (KEK) in Cloud KMS to encrypt the DEK. To learn more about DEKs, see Envelope Encryption.

This envelope encryption song and dance is kind of annoying. KMS has one job IMO: decrypt/encrypt my damn stuff. Anyway, we avoid it entirely by just implementing an interface like the above. This allows us to deal only with individual secrets, which are all smaller than 64 KiB in our environment.

Let me know if some of our KMS snippets would be helpful and I can share them privately @ncdc.

marpaia commented 6 years ago

It's also worth noting that the link I posted above also allows you to encrypt an object with a single 32-byte AES-256 key. Rather than using envelope encryption on the tar, it would probably be easiest to use KMS to encrypt that key (as a DEK) and store it in GCS as well. Decrypt just the key via KMS, then use this API to encrypt the entire tar with the one key. The key rotation and access control story is not as good with this solution, but it's a much simpler solution in general.

rosskukulinski commented 6 years ago

It might be good to get @mattmoyer's thoughts on this, as I don't fully understand the chain of trust using KMS and the impact on Ark & security. How does something like Vault or sealed-secrets play into this?

We should also look at the other major cloud providers (and any bare-metal equivalents) to make sure this feature will work across a variety of platforms.

ncdc commented 6 years ago

Agreed, @mattmoyer let us know if you have some time to discuss

erasmus74 commented 5 years ago

How's this: allow using kubeseal for secrets by adding optional params, a '--kubeseal' flag which then requires '--secret', '--controller', etc. The tool encrypts the backup using a generated key stored as a secret; the secret is sealed and exists as 'secret-name'. For restore, it would only need that same secret for decryption.

Might need finessing, but I currently have to install components on the pod and do an etcdctl snapshot, so it'd be awesome to have it running as a k8s batch job.

I'll help if I can, not a go programmer YET.

ncdc commented 5 years ago

@erasmus74 thanks for your idea! @mattmoyer WDYT?

rosskukulinski commented 5 years ago

There was also a suggestion from #ark-dr to use https://github.com/mozilla/sops/.

ncdc commented 5 years ago

@erasmus74 are you describing using kubeseal to seal the entire backup tarball, or just the secrets contained in the tarball?

kzap commented 4 years ago

Do we want to encrypt everything or just the secrets? If just secrets, then wouldn't running encryption at rest suffice?

prydonius commented 4 years ago

@kzap Velero uses the Kubernetes API server to back up resources, instead of backing up from etcd directly. This means that even if encryption at rest is enabled in etcd, Velero will back up the plaintext Secret because it is decrypted by the Kubernetes API server.

From the Velero perspective, I imagine it would be easiest to encrypt the full backup.

In the meantime, you could use something like sealed-secrets and only backup the encrypted SealedSecret resources instead of Secrets.

kzap commented 4 years ago

Thank you for clarifying. Is there a way to encrypt the full backup before storing it in object storage? Can restic take care of this part for us?

prydonius commented 4 years ago

Unfortunately not really; the backups with restic are encrypted with a static key (see https://github.com/vmware-tanzu/velero/issues/1053). Even if that work is done, though, restic is only used to store volume snapshots, so the resources that were backed up also need to be encrypted and stored.

turkenh commented 4 years ago

AFAIU, it is already possible to configure (some of?) the providers for encryption at rest using the config field in BackupStorageLocation:

GCP: https://github.com/vmware-tanzu/velero-plugin-for-gcp/blob/master/backupstoragelocation.md
AWS: https://github.com/vmware-tanzu/velero-plugin-for-aws/blob/master/backupstoragelocation.md

Also a related PR: https://github.com/vmware-tanzu/velero/pull/1879

I think this is not related to snapshotting volumes but already makes it possible to encrypt kubernetes resources (including secrets) at rest.
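For example, with the AWS plugin, a BackupStorageLocation along these lines requests SSE-KMS on the bucket writes (field names per the linked doc; the bucket name and key id below are placeholders):

```yaml
apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  name: default
  namespace: velero
spec:
  provider: aws
  objectStorage:
    bucket: my-backup-bucket        # placeholder bucket name
  config:
    region: us-east-1
    serverSideEncryption: aws:kms   # ask S3 to encrypt with a KMS key
    kmsKeyId: alias/velero          # placeholder KMS key alias or ARN
```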

What is the difference from the issue here? Can't we say "Velero already supports encrypting backups at rest"?

skriss commented 4 years ago

@turkenh you're correct, Velero already supports server-side encryption at rest in the AWS, Azure and GCP plugins.

As part of this issue, we also discussed client-side encryption, i.e. Velero encrypts the backup data before sending it up to object storage, rather than letting the object storage system itself encrypt the data.

We've decided not to go down that path for now, so I do think we could probably close this issue out, and reopen/open new ones if we decide to look at client-side encryption down the road.

We're planning to do a big backlog review in the near future, so when we look at this issue as a team we can decide if we're ready to close it.

fgimenezm commented 3 years ago

About client-side encryption: why not implement the aws-sdk-go client-side encryption using "Option 2: Using a master key stored within your application", explained here: https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingClientSideEncryption.html#client-side-encryption-client-side-master-key-intro ?

Edit: updated link

briantopping commented 3 years ago

I found this issue while searching to see whether Vault could be used as a KMS. https://github.com/libopenstorage/secrets could be used as an abstraction layer; the KMS backends it supports are shown in the top-level directory of that project. It's an abstraction over those projects, each of which provides different features and benefits. Most importantly, Velero would become more adaptable for enterprise deployments that already depend on one of these KMSes.

carlisia commented 3 years ago

@fgimenezm and @briantopping thanks for these inputs, definitely helpful when we get around to designing for this. Keep suggestions and ideas coming, thank you.

eleanor-millman commented 2 years ago

Additional explanation from a Velero dev that helped me understand this better:

There are two places we might want to think about encryption at rest: the block storage where the volume snapshotters store snapshots, and the object storage where restic backups are sent and where the metadata tarball is stored.

Block storage largely already has encryption. For example, if you are using EBS, you can enable encryption there. Velero doesn't need to know about encryption or how to decrypt, since EBS handles all of that under the covers; Velero just calls the EBS APIs with a snapshot ID to read the data during a restore.

Object storage, however, is much trickier. While encryption may already be available through, say, S3, Velero would have to actually decrypt the data before it can do a restore. This means that Velero would have to handle user keys, and deal with what happens if a user loses their key. Because this could result in users being locked out of their backups, among other security issues, we want to tread carefully here.

(Note from PM: Encryption at rest is on the Velero roadmap, but we first want to investigate possibilities so we land on the safest solution for Velero users.)