Open boluisa opened 3 years ago
Apologies for the delay on additional information.
This is happening in an AWS environment in an isolated region. Retrieving logs from this environment is quite difficult but I'm happy to answer any questions to the best of my abilities, even if getting answers can take a little bit longer. The various service endpoints for this environment are slightly different than the standard Commercial or even GovCloud endpoints and they typically require a specific set of CA Certificates to be added to requests for them to be able to go through.
We initially did what Tunde pasted above: we generated a velero.yaml with the above command and then manually kicked it off with kubectl create -f velero.yaml. The velero CLI was able to successfully connect to AWS and we were able to see the backup bucket we had chosen. We were also able to kick off backup jobs that reported successful completion, and we could see files in the S3 bucket.
We ran into issues when we actually tried to run a restore job. The restore would end up partially failed and report back, as stated above:
error executing PVAction for persistent volumes
Caused by POST https://ec2.isolated-region.gov x509 certificate signed by unknown authority
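One way to confirm that this is a trust problem with the endpoint, rather than something specific to Velero, is to attempt a TLS handshake against the same endpoint while trusting only the custom CA bundle. The following is a minimal sketch, not something from the original report: check_endpoint_tls is a hypothetical helper name, and it assumes curl is available on a host that can reach the endpoint (through the same proxy settings, if any).

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of velero or the AWS tooling): attempt a TLS
# handshake against a URL, trusting only the certificates in the given bundle.
# curl is run without --fail on purpose: an HTTP error response from the
# service still means the certificate chain was accepted.
check_endpoint_tls() {
  local url="$1" bundle="$2"
  if curl --silent --show-error --output /dev/null --cacert "$bundle" "$url"; then
    echo "TLS OK: $url"
  else
    echo "TLS FAILED: $url" >&2
    return 1
  fi
}
```

For example, check_endpoint_tls https://ec2.isolated-region.gov /home/maintuser/ca-bundle.crt should succeed if the bundle contains the CA that signed the EC2 endpoint certificate. If it fails here too, the bundle itself is incomplete and mounting it into Velero will not help.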
We had mistakenly believed that providing our custom CA certs with the --cacert flag would include those certificates throughout velero, not just on the BackupStorageLocation object. After some digging I found a potential solution: manually insert the certificates I wanted included by mounting them into the container and setting the AWS_CA_BUNDLE environment variable to the path of the certs inside the container. So I tried that.
$ kubectl create namespace velero
$ kubectl create configmap cert-bundle --from-file=/home/maintuser/ca-bundle.crt -n velero
# The key of the configmap, unless you specify one, should be the name of the file you generate the configMap from, ca-bundle.crt in our case.
# But let's verify
$ kubectl describe configmap cert-bundle -n velero
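Before creating the ConfigMap it may also be worth sanity-checking the bundle file itself, since an empty or non-PEM file would be mounted without any complaint. A small sketch, assuming the bundle is a PEM file; count_pem_certs and check_bundle are hypothetical helper names, not velero or kubectl commands.

```shell
#!/usr/bin/env bash

# Hypothetical helper: count the PEM certificates in a bundle file.
count_pem_certs() {
  # grep -c prints the number of matching lines; '|| true' keeps a count of
  # zero (grep exit status 1) from being treated as a failure.
  grep -c -e '-----BEGIN CERTIFICATE-----' "$1" || true
}

# Hypothetical helper: fail early if the bundle is unreadable or contains no
# certificates.
check_bundle() {
  local bundle="$1" n
  [ -r "$bundle" ] || { echo "cannot read $bundle" >&2; return 1; }
  n="$(count_pem_certs "$bundle")"
  [ "$n" -gt 0 ] || { echo "no PEM certificates in $bundle" >&2; return 1; }
  echo "$bundle contains $n certificate(s)"
}
```

Running check_bundle /home/maintuser/ca-bundle.crt before the kubectl create configmap step gives a quick indication that the file holds the expected number of CA certificates.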
I then edited my velero.yaml to add the configmap as a volume, mount that volume into the containers, and finally set the environment variable. Here's what that ended up looking like.
spec:
  template:
    spec:
      containers:
      - args:
        - server
        - --features=
        command:
        - /velero
        env:
        - name: VELERO_SCRATCH_DIR
          value: /scratch
        - name: AWS_CA_BUNDLE
          value: /etc/ssl/certs/ca-bundle.crt
        - name: VELERO_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: LD_LIBRARY_PATH
          value: /plugins
        image: localhost:5000/opensource/velero/velero:v1.5.3
        imagePullPolicy: Always
        name: velero
        ports:
        - containerPort: 8085
          name: metrics
        resources:
          limits:
            cpu: "1"
            memory: 512Mi
          requests:
            cpu: 500m
            memory: 128Mi
        volumeMounts:
        - mountPath: /plugins
          name: plugins
        - mountPath: /scratch
          name: scratch
        - mountPath: /etc/ssl/certs/ca-bundle.crt
          subPath: ca-bundle.crt
          name: certs
      initContainers:
      - image: localhost:5000/opensource/velero/velero-plugin-for-aws:v1.2.0
        imagePullPolicy: Always
        name: velero-plugin-for-aws
        resources: {}
        volumeMounts:
        - mountPath: /target
          name: plugins
        - mountPath: /etc/ssl/certs/ca-bundle.crt
          subPath: ca-bundle.crt
          name: certs
      restartPolicy: Always
      serviceAccountName: velero
      volumes:
      - emptyDir: {}
        name: plugins
      - emptyDir: {}
        name: scratch
      - configMap:
          name: cert-bundle
        name: certs
After using this new manifest to kick velero off, re-taking backups and trying restores, it finally worked.
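For anyone repeating this, a quick check that the edited manifest actually took effect is to read the AWS_CA_BUNDLE value back from the Deployment spec. This is a sketch under assumptions: it assumes the Deployment is named velero, that the velero container is the first container in the pod spec (as in the manifest above), and the function names are hypothetical.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the AWS_CA_BUNDLE value configured on the first
# container of the velero Deployment (namespace defaults to "velero").
deployed_ca_bundle_path() {
  local ns="${1:-velero}"
  kubectl -n "$ns" get deployment velero \
    -o jsonpath='{.spec.template.spec.containers[0].env[?(@.name=="AWS_CA_BUNDLE")].value}'
}

# Hypothetical helper: compare the deployed value with the expected mount path.
verify_ca_bundle_env() {
  local expected="$1" actual
  actual="$(deployed_ca_bundle_path "${2:-velero}")"
  if [ "$actual" = "$expected" ]; then
    echo "AWS_CA_BUNDLE=$actual"
  else
    echo "expected $expected, got '$actual'" >&2
    return 1
  fi
}
```

With the manifest above, verify_ca_bundle_env /etc/ssl/certs/ca-bundle.crt should print the configured path. If a restore still ends up partially failed, velero restore describe and velero restore logs for that restore should show whether the x509 error is still present.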
Please let me know if you have any additional questions that I might be able to provide answers to for this issue.
This may be worth documenting in the AWS plugin README.
What steps did you take and what happened:
We run into the following issue when we perform velero restore operations in an AWS isolated environment. We are also passing a CA cert to communicate with the cloud service provider.
We are able to perform velero backup operations successfully, but restores don't work.
The restore operations fail with approximately the following error:
We're using the following install command, using the velero CLI, to create a YAML manifest which we edit slightly to add in additional proxy environment variables to velero.
What did you expect to happen: We expected velero restore to complete successfully with workloads restored.
Environment:
Velero client version: 1.5.4
Kubernetes version: 1.18
OS (/etc/os-release): CentOS