vmware-tanzu / velero

Backup and migrate Kubernetes applications and their persistent volumes
https://velero.io
Apache License 2.0

GCP Create Snapshot Across Projects #3146

Open brondum opened 4 years ago

brondum commented 4 years ago

Describe the problem/challenge you have
Trying to create a snapshot across projects in GCP (GKE).

Describe the solution you'd like
I would like to be able to specify which project the snapshot should be stored in, e.g. creating a centralized backup project that holds both backups and snapshots.

Environment:

skriss commented 4 years ago

@brondum you should be able to do this by specifying the project config key in your VolumeSnapshotLocation, under spec.config. See https://github.com/vmware-tanzu/velero-plugin-for-gcp/blob/master/volumesnapshotlocation.md for some documentation. You'll also need to ensure you have the IAM set up appropriately.

A similar use case is documented here: https://github.com/vmware-tanzu/velero-plugin-for-gcp/blob/master/examples/gcp-projects.md.
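
For reference, a minimal VolumeSnapshotLocation along those lines might look roughly like the following (the project name is a placeholder, not taken from this thread):

apiVersion: velero.io/v1
kind: VolumeSnapshotLocation
metadata:
  name: default
  namespace: velero
spec:
  provider: gcp
  config:
    # placeholder: the GCP project in which snapshots should be created
    project: my-snapshot-project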

brondum commented 4 years ago

@skriss Thank you for the quick response. As far as I can see, the examples only cover restoring between projects. What I am trying to achieve is a single "backup project" where other projects store their backups and snapshots.

When I set the backup project ID in the project field of the VolumeSnapshotLocation and create a backup, Velero looks for the disk in the backup project rather than in the project where the disk is actually located.

skriss commented 4 years ago

Looking at the code, it looks like the "volume project" is based on the IAM credentials - so your IAM account should be in the same project as the volumes. The project key optionally defines the "snapshot project" -- i.e., where snapshots should be created and restored from. Your IAM account will need the appropriate permissions to create snapshots there.

If you look at https://github.com/vmware-tanzu/velero-plugin-for-gcp/blob/master/velero-plugin-for-gcp/volume_snapshotter.go#L227 you can see that the disk info is fetched from the "volume project", and the snapshot is subsequently created in the "snapshot project".

It looks like this should work on both backup and restore, though of course it's possible there's a bug.

brondum commented 4 years ago

@skriss Yeah, that is quite clear when looking at that line of code. I will go ahead and test it next week. But would it be possible to maybe add a config option for the "volume project" as well, so that accounts can be centralized too?

brondum commented 4 years ago

@skriss Just to provide some more info on this one. I actually forgot that the service account which handles the GCP snapshots is in the same project as the cluster, meaning that the "project" spec overrides the location. Does that make sense?

skriss commented 4 years ago

@brondum if the service account is in the same project as the cluster/disks, and you specify project in your VolumeSnapshotLocation config, then this should work -- the feature was implemented to address exactly that scenario. Is that not working?

brondum commented 4 years ago

@skriss Exactly the scenario here. Unfortunately that does not seem to work.

Snippet from the log, some values have been substituted.



time="2019-12-11T23:00:37Z" level=info msg="Snapshotting persistent volume" backup=velero/sched-20191211230034 group=v1 logSource="pkg/backup/item_backupper.go:472" name=pvc-f1477372-11e2-11ea-bb4b-42010aa4009b namespace= persistentVolume=pvc-f1477372-11e2-11ea-bb4b-42010aa4009b resource=persistentvolumes volumeID=<disk id>
time="2019-12-11T23:00:38Z" level=info msg="1 errors encountered backup up item" backup=velero/sched-20191211230034 group=v1 logSource="pkg/backup/resource_backupper.go:284" name=server-1 namespace= resource=pods
time="2019-12-11T23:00:38Z" level=error msg="Error backing up item" backup=velero/sched-20191211230034 error="error taking snapshot of volume: rpc error: code = Unknown desc = googleapi: Error 404: The resource 'projects/<ID of the backup dest>/zones/europe-west4-c/disks/<disk id>' was not found, notFound" group=v1 logSource="pkg/backup/resource_backupper.go:288" name=server-1 namespace= resource=pods

skriss commented 4 years ago

@brondum OK, I'll have to dig into this some more.

skriss commented 4 years ago

transferring to GCP plugin repo

skriss commented 4 years ago

@brondum sorry for the delay here. I attempted to reproduce this, and as far as I can tell things worked as expected -- with the caveat that I don't actually have a second project set up to create snapshots in. I did, however, get an error telling me that the "snapshot project" didn't exist when it tried to create a snapshot.

My cloud-credentials secret contains the GCP service account JSON, including the following line:

...
"project_id": "<project where my disk is located>",
...

And my VolumeSnapshotLocation looks like the following:

apiVersion: velero.io/v1
kind: VolumeSnapshotLocation
metadata:
  labels:
    component: velero
  name: default
  namespace: velero
spec:
  config:
    project: snap-proj
  provider: gcp

The only error I end up getting with this setup is:

time="2020-01-17T16:15:24Z" level=error msg="Error backing up item" backup=velero/nginx-2 error="error taking snapshot of volume: rpc error: code = Unknown desc = googleapi: Error 404: Failed to find project snap-proj, notFound" group=v1 logSource="pkg/backup/resource_backupper.go:288" name=nginx-deployment-589cdf7bc4-7qtmc namespace=nginx-example resource=pods

Which indicates to me that it's correctly trying to create a snapshot in the snap-proj project.

Can you take a look and confirm that your config matches this? It sounds from your description like the project_id in your credentials file (in your cloud-credentials secret) contains the "snapshot project", rather than matching the project where the disk is.

brondum commented 4 years ago

@skriss I will try to set some time aside for this on Monday, see if I can recreate the issue, and post the details here :)

brondum commented 4 years ago

@skriss Finally got around to testing this, sorry for the delay here.

Okay, so I double-checked my credentials file; it contains the project_id of the project where both my cluster and my disks reside.

Then I added:

apiVersion: velero.io/v1
kind: VolumeSnapshotLocation
metadata:
  name: default
  namespace: velero
spec:
  config:
    project: <THE PROJECT WHERE IT SHOULD END UP>
    snapshotLocation: europe-west4
  provider: velero.io/gcp

Outcome:

time="2020-01-22T13:42:21Z" level=info msg="1 errors encountered backup up item" backup=velero/testsnap group=v1 logSource="pkg/backup/resource_backupper.go:284" name=<SOMENAME> namespace= resource=pods
time="2020-01-22T13:42:21Z" level=error msg="Error backing up item" backup=velero/testsnap error="error taking snapshot of volume: rpc error: code = Unknown desc = googleapi: Error 404: The resource 'projects/<THE PROJECT WHERE IT SHOULD END UP>/zones/europe-west4-a/disks/<SOMEDISK>--pvc-c20bc2ef-1765-11ea-a35b-4201ac1f800a' was not found, notFound" group=v1 logSource="pkg/backup/resource_backupper.go:288" name=<POD_NAME> namespace= resource=pods

So from what I can see, it looks for the disk in the destination project instead of the source project?

kunickiaj commented 4 years ago

I'm running into the same issue, where I'm using one SA to try to back up multiple projects. During the backup it's looking for the correct disk ID, but in the wrong project (the project the SA belongs to), and then fails to create the snapshot. The config.project setting seems to be irrelevant here.

skriss commented 4 years ago

@brondum @kunickiaj now that I'm looking again, I'm pretty sure that this use case (create the snapshot in a different project than where the disk is) is not currently supported, as you've found.

It doesn't seem like GCP actually supports this, irrespective of Velero -- can you confirm?

If true, then you would need a separate "copy" operation to copy the snapshot off to the project you want to store it in.

kunickiaj commented 4 years ago

To clarify, I'm not trying to create the snapshot in a different project than the disk. Velero always expects the project the ServiceAccount is in to be the same as the one where the disks are located. However, a ServiceAccount can belong to any project and have access to multiple projects.

my-service-account@project-abc.iam.gserviceaccount.com is provided as cloud credentials for a Velero server running in project project-def. Disks are located in project-def and so snapshots should be in project-def but Velero looks for project-abc/region/disk-xyz rather than project-def/region/disk-xyz.

In this case we have one SA that can access other projects. Velero assumes that the project ID in my-service-account@project-id.iam.gserviceaccount.com is the same as where it's running and looks for the disks there, rather than in the project Velero is actually running in.

skriss commented 4 years ago

@kunickiaj gotcha - so unfortunately still not supported, but at least this one should be implementable in Velero. This is a different issue, or at least a different variation, from what @brondum originally reported.

We'd probably want to add yet another config param to the VolumeSnapshotLocation definition that, if present, overrides the "volume project" (which by default is taken from the service account's project). I think this needs to be separate from the existing project config param, since that defines where to look for the snapshot during a restore, which isn't necessarily the same as where the disks are.
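
As a purely illustrative sketch of that idea (the volumeProject key is hypothetical here, not an existing plugin option; project names follow the example above), the VolumeSnapshotLocation could look something like:

apiVersion: velero.io/v1
kind: VolumeSnapshotLocation
metadata:
  name: default
  namespace: velero
spec:
  provider: velero.io/gcp
  config:
    # existing param: project where snapshots are created/looked up
    project: project-def
    # hypothetical new param: project where the disks live, overriding
    # the default taken from the service account's credentials
    volumeProject: project-def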

dirsigler commented 4 years ago

That's exactly what happened to me today, after looking further into my current setup. Therefore I have to create a ServiceAccount per project and install Velero with the corresponding credential file for each ServiceAccount. I would really welcome a feature that allows one "master" ServiceAccount to do the backups for multiple projects.

skriss commented 4 years ago

Is anyone interested in working on a PR for this? I'm happy to provide input if so.

brondum commented 4 years ago

@skriss Sorry for not getting back to you before now. I finally did some checking, and I think you are right: in order for my scenario to work, a separate copy job needs to be initiated.

And actually it does not seem to be possible within GCP to transfer snapshots; only images can be transferred, as far as I can see?

sbernier-corp commented 1 year ago

Hi, I'm facing this issue on the current version. Is this planned to be integrated?