Closed: pavanfhw closed this issue 3 years ago.
@pavanfhw did you manage to make it work?
@WaterKnight1998 no, I got it working with restic though. The problem above was using the csi-plugin.
Is the Ceph cluster shared or are these different storage systems? The snapshot is going to be a Ceph snapshot and is only accessible within the Ceph cluster.
@dsu-igeek yes, they are different storage systems and Ceph clusters. I was simulating the situation where, if one cluster blew up, Velero backups would be able to restore it on a completely new cluster.
So if you're using the CSI plugin, it currently takes a snapshot using Ceph's snapshotting facility. Ceph snapshots are stored within the cluster, so when the cluster is lost or removed, all snapshot data is lost as well. Unfortunately, CSI snapshots do not specify whether a snapshot is "durable" (survives loss of the primary storage) or not. You should use a Restic backup of your data; the snapshots on the Ceph cluster are not really backups.
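For reference, a minimal sketch of opting a pod's volumes into Velero's restic integration; the namespace, pod, volume, and backup names below are placeholders, not from this issue:

```shell
# Tell Velero's restic integration to back up these pod volumes
# (namespace, pod, and volume names are placeholders)
kubectl -n my-app annotate pod/my-pod \
  backup.velero.io/backup-volumes=data,config

# Create the backup as usual; restic uploads the volume contents
# to the object storage bucket, independent of any Ceph snapshot
velero backup create my-app-backup --include-namespaces my-app
```

Because the restic data lands in the backup object storage rather than inside Ceph, it survives the loss of the Ceph cluster.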
Understood. To clarify: the CSI plugin is not viable for disaster-recovery backups? Is there any intention to make it viable?
Same questions as @pavanfhw. If CSI snapshots are not meant for DR scenarios across independently deployed rook-ceph clusters, there is not much documentation on that limitation.
I added a warning note in the README - https://github.com/vmware-tanzu/velero-plugin-for-csi/blob/main/README.md
We will be addressing this in a future release, but for the moment I recommend you use a Restic backup.
@pavanfhw Could you share your instructions, scripts and anything else to demonstrate how to do a full backup of a Kubernetes cluster which uses rook-ceph?
I am also looking to use Velero as my backup-and-restore solution to a new cluster, in case something catastrophic happens. It is also useful for testing regions and simulating issues on a test region before taking it into production...
Are you able to share your steps/scripts/instructions how you achieved it? Maybe a blog writeup somewhere?
I saw the same problem with OCP 4.9 and ODF 4.9 on the ppc64le platform. The problem went away when the deletion policy was changed from Delete to Retain on the VolumeSnapshotClass objects (there are two of them). The restore then completed without any errors.
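For anyone hitting the same thing, the change described above can be applied with a patch. The class names below are the defaults a typical rook-ceph install creates; they are an assumption here, so check yours first:

```shell
# List the snapshot classes to confirm their names
kubectl get volumesnapshotclass

# Switch both VolumeSnapshotClass objects from Delete to Retain
# (class names below are typical rook-ceph defaults, adjust as needed)
kubectl patch volumesnapshotclass csi-rbdplugin-snapclass \
  --type merge -p '{"deletionPolicy":"Retain"}'
kubectl patch volumesnapshotclass csi-cephfsplugin-snapclass \
  --type merge -p '{"deletionPolicy":"Retain"}'
```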
@pavanfhw also interested in the code used to back up from one cluster to another for Rook. Just looking to bump this in case you didn't see the previous message from Psavva.
What steps did you take and what happened: I installed Velero in two clusters that use rook-ceph storage. I can take backups and restore them to the same cluster after deleting all resources in a namespace. But when restoring to another cluster, the PVC cannot be provisioned by rook-ceph. Both rook-ceph and Velero were deployed the same way in both clusters. I tried both directions, backup in cluster 1 and restore in cluster 2 and vice versa; in both cases the error was the same PVC provisioning failure:
rook-ceph.rbd.csi.ceph.com_csi-rbdplugin-provisioner-dbc67ffdc-vj2r8_0550728b-5527-41d3-ad5a-c0c6307056b7 failed to provision volume with StorageClass "rook-ceph-block-storage": rpc error: code = Internal desc = key not found: no snap source in omap for "csi.snap.84f0f012-fe89-11ea-bbf7-2ee9609329c4"
What did you expect to happen: To be able to take a backup of rook-ceph volumes from one cluster and restore it on another (is this known to be possible?).
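The intended flow, assuming both clusters' Velero installs point at the same object storage bucket, would be something like the following; the namespace and backup names are placeholders:

```shell
# On cluster 1: back up the application namespace
velero backup create app-backup --include-namespaces my-app

# On cluster 2 (configured with the same backup storage location):
# wait for the backup to sync into the cluster, then restore it
velero backup get
velero restore create --from-backup app-backup
```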
The output of the following commands will help us better understand what's going on: (Pasting long output into a GitHub gist or other pastebin is fine.)
kubectl logs deployment/velero -n velero
https://gist.github.com/pavanfhw/91148b5ba1126fcca771cc447de7c957
velero backup describe <backupname> or kubectl get backup/<backupname> -n velero -o yaml
https://gist.github.com/pavanfhw/50b9e74abf6ca944cb36d2e970fa2d12
velero backup logs <backupname>
https://gist.github.com/pavanfhw/a3244002f96dd40bcbc683c3cfa87bf1
velero restore describe <restorename> or kubectl get restore/<restorename> -n velero -o yaml
https://gist.github.com/pavanfhw/33a8771de9d64b4ddda075e3f061d6cc
velero restore logs <restorename>
https://gist.github.com/pavanfhw/67a4ff961ded1884bf6a7a1e8289cb70
Anything else you would like to add: Using Rook 1.4.3
Velero was installed with the command
Environment:
velero version:
Client: Version: v1.4.2, Git commit: 56a08a4d695d893f0863f697c2f926e27d70c0c5
Server: Version: v1.4.2
velero client config get features:
features: EnableCSI
kubectl version:
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.6+k3s1", GitCommit:"6f56fa1d68a5a48b8b6fdefa8eb7ead2015a4b3a", GitTreeState:"clean", BuildDate:"2020-07-16T20:46:15Z", GoVersion:"go1.13.11", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.6+k3s1", GitCommit:"6f56fa1d68a5a48b8b6fdefa8eb7ead2015a4b3a", GitTreeState:"clean", BuildDate:"2020-07-16T20:46:15Z", GoVersion:"go1.13.11", Compiler:"gc", Platform:"linux/amd64"}
/etc/os-release:
K3OS