Just adding the output of velero restic repo get wordpress-primary-8gzr6 -o yaml on the secondary cluster. Again it shows the failed restic init on a repo that already exists. Why is it doing an init?
apiVersion: velero.io/v1
kind: ResticRepository
metadata:
  creationTimestamp: "2022-10-05T18:23:05Z"
  generateName: wordpress-primary-
  generation: 3
  labels:
    velero.io/storage-location: primary
    velero.io/volume-namespace: wordpress
  managedFields:
  - apiVersion: velero.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:generateName: {}
        f:labels:
          .: {}
          f:velero.io/storage-location: {}
          f:velero.io/volume-namespace: {}
      f:spec:
        .: {}
        f:backupStorageLocation: {}
        f:maintenanceFrequency: {}
        f:resticIdentifier: {}
        f:volumeNamespace: {}
      f:status:
        .: {}
        f:message: {}
        f:phase: {}
    manager: velero-server
    operation: Update
    time: "2022-10-05T18:23:26Z"
  name: wordpress-primary-8gzr6
  namespace: velero
  resourceVersion: "39841"
  uid: ddca26ef-88e9-4055-bb99-b778038b8cb7
spec:
  backupStorageLocation: primary
  maintenanceFrequency: 168h0m0s
  resticIdentifier: s3:s3-us-east-2.amazonaws.com/043124067543-velero-primary/restic/wordpress
  volumeNamespace: wordpress
status:
  message: |-
    error running command=restic init --repo=s3:s3-us-east-2.amazonaws.com/043124067543-velero-primary/restic/wordpress --password-file=/tmp/credentials/velero/velero-restic-credentials-repository-password --cache-dir=/scratch/.cache/restic, stdout=, stderr=Fatal: create repository at s3:s3-us-east-2.amazonaws.com/043124067543-velero-primary/restic/wordpress failed: client.BucketExists: Head "https://043124067543-velero-primary.s3.dualstack.us-west-1.amazonaws.com/": 301 response missing Location header
    : exit status 1
  phase: NotReady
I'm not sure what's going on, but I notice in that error message that something is trying to access the bucket using a us-west-1 URL rather than us-east-2. It could be that some code in the restic/velero codebase is pulling region from the wrong location.
Thanks @sseago. Yes, no matter what I try, Restic ignores the region I have set and tries to connect to the S3 bucket using the region that the cluster is running in.
I have tried all of the following with no success.
So the question boils down to: what is the correct way to tell Restic to use a bucket in a different region from the one it is running in?
Looking at the restic docs, I think I need to figure out a way to get velero to add the option -o s3.region="us-east-2" when calling restic init. Is there any way to configure velero to add option parameters to restic commands?
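For reference, here is a rough sketch of what the desired invocation would look like if the region could be passed through as a restic extended option (the repo URL and password-file path are copied from the error above; whether Velero offers a hook to add this option is exactly the open question):
restic init \
  --repo=s3:s3-us-east-2.amazonaws.com/043124067543-velero-primary/restic/wordpress \
  --password-file=/tmp/credentials/velero/velero-restic-credentials-repository-password \
  -o s3.region=us-east-2    # pin the S3 region explicitly instead of letting the client guess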
There is no easy way to add a new parameter to the Restic command. From the Restic document you posted, I think adding the environment variable AWS_DEFAULT_REGION to the Velero server deployment may make it work.
I'm not sure what's going on here. Restic shouldn't be using the region the cluster is running in -- it should be using the BSL region. If restic is using cluster region instead of BSL region, that sounds like a bug. We shouldn't need to pass this in separately to restic. Restic should use the value from the BSL somehow.
@sseago agreed, but it is Velero that is invoking Restic, and the BSL is a Velero object. So Velero somehow needs to communicate that BSL region id through to the Restic CLI - which is currently not happening. Agreed it is a bug.
The Restic docs only seem to offer 2 ways to do this: an environment variable, or a command line option.
I tried the AWS_DEFAULT_REGION suggestion by changing the Restic DaemonSet container spec to include:
env:
- name: AWS_DEFAULT_REGION
  value: us-east-2
Then I restarted the Restic pods, but unfortunately it did not work. Got the same error as reported previously.
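In case it helps others reading along, a minimal sketch of setting the variable on both workloads, on the assumption that the repository init is issued by the velero server Deployment (per the suggestion above) while backups and restores run in the restic DaemonSet:
kubectl -n velero set env deployment/velero AWS_DEFAULT_REGION=us-east-2
kubectl -n velero set env daemonset/restic AWS_DEFAULT_REGION=us-east-2
# both commands update the pod template, so the pods restart via the resulting rollout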
One other thing to try. Looking at restic github issues, at least one user who had this error resolved it by updating the IAM policy to add "s3:GetBucketLocation". Since the failure happens when the initial request (to the default region) attempts to redirect to a different region, it's possible that this permission is missing. I'm not sure this will help (since it may be that in this case we're dealing with the opposite problem -- restic attempting to redirect to the wrong region), but it's worth trying. If you add this to your user bucket policy, does it help?
{
  "Version":"2012-10-17",
  "Statement":[
    {
      "Sid":"statement1",
      "Effect":"Allow",
      "Action":[
        "s3:ListAllMyBuckets",
        "s3:GetBucketLocation"
      ],
      "Resource":[
        "arn:aws:s3:::*"
      ]
    }
  ]
}
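A quick way to sanity-check both the permission and the bucket's actual region from the same credentials (assuming the AWS CLI is configured with the velero credentials; the bucket name is taken from the error above):
aws s3api get-bucket-location --bucket 043124067543-velero-primary
# expected for a us-east-2 bucket:
# {
#     "LocationConstraint": "us-east-2"
# }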
Thanks @sseago, but I already had that permission in the IAM policy.
Velero always sets the region name in the AWS URL like this: https://bucket-name.s3.region-code.amazonaws.com, where region-code is replaced by the value specified in backupStorageLocation.config.
For Restic, if AWS_DEFAULT_REGION is not set, it (actually the minio client) gets the region name from the URL; otherwise, it respects the value in AWS_DEFAULT_REGION all the time.
Therefore, generally, this behavior works for the case described in this issue, which means the problem is not a generic one.
We need to check where the region name us-west-1 is being specified, because if that value were not set anywhere, it would not end up in the connection URL.
It cannot be set in the BSL of Velero, because if we check the Restic command Velero runs, the region name is correct: --repo=s3:s3-us-east-2.amazonaws.com.
Therefore, is there any possibility that AWS_DEFAULT_REGION is set somewhere else and overwritten with us-west-1?
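One way to check whether the variable is being injected from somewhere else (a guess, not verified here: on EKS, when a service account is annotated for IAM Roles for Service Accounts, the pod identity webhook typically injects AWS_REGION and AWS_DEFAULT_REGION set to the cluster's own region, which would line up with the us-west-1 seen in the error):
# inspect region-related environment variables inside the running workloads
kubectl -n velero exec deploy/velero -- printenv | grep -i region
kubectl -n velero exec daemonset/restic -- printenv | grep -i region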
@rizblie I tried to reproduce the problem using velero v1.10.0, installed via the CLI and a credentials file, but things seemed to work.
I set up 2 EKS clusters in us-east-2 and us-west-1, using the same command for installation so that the velero instances on both clusters point to the same bucket:
./velero install \
--provider aws \
--plugins gcr.io/velero-gcp/velero-plugin-for-aws:v1.6.0 \
--bucket jt-restic-ue2 \
--secret-file xxxxxxxx/aws-credentials \
--backup-location-config region=us-east-2 \
--use-node-agent \
--uploader-type restic \
--wait
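After installation, the region Velero will hand to restic can be confirmed on each cluster (a quick check; either command should do):
velero backup-location get default -o yaml
kubectl -n velero get backupstoragelocation default -o yaml
# in both outputs, spec.config.region should read us-east-2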
I ran a backup on the cluster in us-east-2 and restored it on the cluster in us-west-1; the restore was successful, and in the spec of the backuprepository it points to us-east-2:
k get backuprepositories -n velero -oyaml
.....
spec:
  backupStorageLocation: default
  maintenanceFrequency: 168h0m0s
  repositoryType: restic
  resticIdentifier: s3:s3-us-east-2.amazonaws.com/jt-restic-ue2/restic/nginx-example
  volumeNamespace: nginx-example
....
Could you try using velero v1.10 and credentials rather than an AWS role?
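For completeness, a sketch of the static-credentials route (the file is the standard AWS shared-credentials format that velero-plugin-for-aws expects; the key values below are placeholders):
cat > ./aws-credentials <<'EOF'
[default]
aws_access_key_id = <YOUR_ACCESS_KEY_ID>
aws_secret_access_key = <YOUR_SECRET_ACCESS_KEY>
EOF
# then pass it to velero install via --secret-file ./aws-credentials, as in the command above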
I don't quite understand why restic tries to HEAD a us-west-1 URL when the repo ID in the command points to us-east-2. My guess is that some setting on the EKS cluster confused restic, which may be a bug in restic.
Closing this issue as not reproducible.
What steps did you take and what happened:
Setup
Steps
Errors from restore describe are as follows:
I am confused by the fact that the restore action is executing a restic init. The repository already exists, so doesn't it just need an integrity check? See attached debug bundle.
My helm values file for velero on the secondary is as follows (the primary is similar but ReadWrite, and uses a different role with the same permissions):
What did you expect to happen:
I expected the Restic volume restore to work in the secondary region, just as it did in the primary region.
The following information will help us better understand what's going on:
If you are using velero v1.7.0+:
Please use velero debug --backup <backupname> --restore <restorename> to generate the support bundle and attach it to this issue; for more options please refer to velero debug --help
If you are using earlier versions:
Please provide the output of the following commands (Pasting long output into a GitHub gist or other pastebin is fine.)
kubectl logs deployment/velero -n velero
velero backup describe <backupname> or kubectl get backup/<backupname> -n velero -o yaml
velero backup logs <backupname>
velero restore describe <restorename> or kubectl get restore/<restorename> -n velero -o yaml
velero restore logs <restorename>
Anything else you would like to add: I am not sure if this is a bug, or if I am doing something wrong in my config. It works fine on the same cluster in the same region, so what is different about a different cluster/region that might require a different config parameter somewhere?
Environment:
Velero version (use velero version): 1.9.2
Velero features (use velero client config get features):
Kubernetes version (use kubectl version): v1.23.10-eks-15b7512
OS (e.g. from /etc/os-release): Amazon Linux 2
Vote on this issue!
This is an invitation to the Velero community to vote on issues; you can see the project's top voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.