I believe that this is a known issue with EFS volumes and restic. The problem is that the uid/gid are different in the restored volume. See the below comment on a similar issue: https://github.com/vmware-tanzu/velero/issues/2958
" This is bugging me as well. I may have figured out, what the issue here is.
If you use efs-csi dynamic provisioning of volumes, every pvc gets a unique access-point (with a unique uid/gid) for the efs volume. If you then restore a volume, restic tries to change the ownership back to the old uid/gid, which is not possible.
To solve this, velero-restic-helper would need an option to ignore the old uid/gid while restoring.
If you use a "static" efs pv/pvc the uid/gid won't change. Therefore the Restore works as expected. One big Issue for this Solution is, that you have to manually delete/recreate the PV before starting the restore."
It looks like you may be able to work around this with static provisioning. We have not yet tried to find a long-term fix for the problem in velero itself, but it may be that the suggestion for velero-restic-restore-helper would work.
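For anyone trying that workaround, here is a minimal sketch of static EFS provisioning, assuming the AWS EFS CSI driver; the filesystem ID, names, namespace, and size are placeholders, not values from this issue. The idea is to pre-create a PV/PVC pair bound directly to the EFS filesystem (rather than a per-PVC access point) so the uid/gid recorded at backup time still matches what restic finds at restore time.
# Hypothetical static EFS PV/PVC, created before starting the restore.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolume
metadata:
  name: my-app-efs-pv
spec:
  capacity:
    storage: 5Gi            # required field; EFS does not enforce the size
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: ""
  csi:
    driver: efs.csi.aws.com
    volumeHandle: fs-0123456789abcdef0   # placeholder EFS filesystem ID
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-app-data
  namespace: my-app
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""
  volumeName: my-app-efs-pv
  resources:
    requests:
      storage: 5Gi
EOF
As the quoted comment notes, the PV/PVC may need to be deleted and recreated manually before starting the restore.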
@Lyndon-Li I'm guessing that kopia would fail in a similar way here, but it's worth investigating whether there's an easier fix for the problem with kopia than with restic.
@sseago - we have a hard requirement for dynamic provisioning of PVCs. Do we have any workaround to restore dynamically provisioned EFS-based PVCs? Is there an option to ignore the old uid/gid using velero-restic-helper/velero-restic-restore-helper?
@amareshgreat At this point we don't have a fix, so if the workaround isn't possible in your environment, the only other option would be to wait until a fix can be developed and put into a release. I've seen at least one other person hit this bug recently, so it may be time to prioritize getting a fix in place here.
@sseago Kopia could solve this problem by specifying either of the two options below:
--skip-owners: if specified, Kopia restore will skip restoring the uid/gid
--ignore-permission-errors: if specified, Kopia restore will ignore the error if it is a permission error
Therefore, what Velero needs to do is expose similar flags for PVR and then pass the same options to Kopia.
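For illustration only (this is not how Velero invokes Kopia internally; the snapshot ID and target path are placeholders), the two flags look like this on a plain kopia restore:
# Skip restoring uid/gid entirely:
kopia restore <snapshot-id> /restore/target --skip-owners
# Or restore uid/gid but tolerate chown failures such as the EFS access-point case:
kopia restore <snapshot-id> /restore/target --ignore-permission-errors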
However, in Velero v1.10, we don't plan to add any user experience changes for PVB/PVR; therefore, we will add this in the next release together with some other new flags.
Thanks @Lyndon-Li, so this issue will remain a problem in velero v1.10, but we may fix it in the Kopia path in a future release.
@Lyndon-Li Are we moving away from restic in future velero releases, or will restic and kopia exist in parallel as options for users?
@navilg I believe the plan is that we will eventually drop restic support, but they will exist in parallel for some time before then. I don't know that we've made a firm decision as to which release will drop restic.
In v1.10, Kopia's IgnorePermissionErrors flag has been set to true. This means that when the Kopia uploader encounters the same problem, it will ignore it, so this problem has been fixed under the Kopia path in v1.10.
And it seems that exposing IgnorePermissionErrors to Velero's CLI is not a prioritized task: ignoring permission errors by default is not a bad thing, and we don't see a situation where permission errors must not be ignored.
Let's verify this in v1.11 for the Kopia path. For the Restic path, since there is no way to fix it, we will leave it as is.
@navilg The v1.10 Kopia path is confirmed to support IgnorePermissionErrors, so the 1.10 Kopia path should work with the current scenario. Please try it.
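If it helps, here is a rough sketch of trying the v1.10 Kopia path end to end; the provider, plugin version, bucket, namespace, and backup name are placeholders for your environment, not values confirmed in this issue.
# Install with the node agent and the Kopia uploader (add credentials/BSL options as needed):
velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:<version> \
  --bucket <your-bucket> \
  --use-node-agent \
  --uploader-type kopia
# Back up the namespace using file-system backup (the Kopia path):
velero backup create efs-test --include-namespaces <your-namespace> --default-volumes-to-fs-backup
# Restore and check whether the uid/gid errors are gone:
velero restore create --from-backup efs-test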
Closing this issue, as it has been fixed in the Kopia path and we have no plan or solution to fix it in the Restic path.
What steps did you take and what happened: [A clear and concise description of what the bug is, and what commands you ran.]
I have installed velero 1.9.1 with restic integrated into it. I have EFS-backed volumes on an AWS cluster. Backup of volumes and manifests works fine, but when I restore from the backup, the restore fails with the error message below.
What did you expect to happen:
Restore should have completed successfully.
The following information will help us better understand what's going on:
If you are using velero v1.7.0+:
Please use velero debug --backup <backupname> --restore <restorename> to generate the support bundle and attach it to this issue. For more options, please refer to velero debug --help.
bundle-2022-09-27-10-01-29.tar.gz
If you are using earlier versions:
Please provide the output of the following commands (Pasting long output into a GitHub gist or other pastebin is fine.)
kubectl logs deployment/velero -n velero
velero backup describe <backupname>
or kubectl get backup/<backupname> -n velero -o yaml
velero backup logs <backupname>
velero restore describe <restorename>
or kubectl get restore/<restorename> -n velero -o yaml
velero restore logs <restorename>
Anything else you would like to add: [Miscellaneous information that will assist in solving the issue.]
Environment:
Velero version (use velero version): 1.9.1
Velero features (use velero client config get features): NOT SET
Kubernetes version (use kubectl version): 1.21
OS (e.g. from /etc/os-release):