vmware-tanzu / velero

Backup and migrate Kubernetes applications and their persistent volumes
https://velero.io
Apache License 2.0

cacert integration #2675

Closed reddiablo85 closed 4 years ago

reddiablo85 commented 4 years ago

We have installed the new version of Velero (v1.4.0) and tested our backups and restores to our internal S3 provider over HTTP. No issues to report. We then decided to reinstall in order to start using HTTPS. The first issue we encountered was that the Helm chart doesn't seem to include a configurable parameter to point the Velero workload to an internal CA cert. To get around this we installed the CA cert manually, with access limited to the velero namespace. It still failed with the following in the Velero logs:

time="2020-06-30T07:28:39Z" level=info msg="Checking that all backup storage locations are valid" logSource="pkg/cmd/server/server.go:437"
An error occurred: some backup storage locations are invalid: backup store for location "default" is invalid: rpc error: code = Unknown desc = RequestError: send request failed caused by: Get https://s3.er.abc.com:10443/velero?delimiter=%2F&list-type=2&prefix=abc-lab09%2F: x509: certificate signed by unknown authority

We removed the installation and reinstalled manually using the CLI, with the certs placed locally on the machine we were launching from. See the install command below:

velero install \
    --plugins dtr01.er.abc.com:4002/velero/velero-plugin-for-aws:v1.1.0 \
    --provider aws \
    --use-restic \
    --image dtr01.er.abc.com:4002/velero/velero:v1.4.0 \
    --use-volume-snapshots=false \
    --backup-location-config region="us-east-1",s3ForcePathStyle="true",s3Url="https://s3.er.abc.com:10443" \
    --cacert /root/rancherlab/certs/abc_root_bundle.crt \
    --secret-file /root/rancherlab/Velero/crds \
    --bucket velero \
    --prefix abc-lab09

This failed with the same logs as above. We have tried a variety of different cert types (pem, crt, with/without intermediate, etc.) but they all return the same error. The Velero pod stays in CrashLoopBackOff and never becomes ready.
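One way to narrow this down is to check the bundle outside of Velero. The helpers below are a minimal sketch, not part of Velero: the function names are hypothetical, and the example host and path are simply the ones from the install command above. The first counts the PEM certificates in the file (a count of 0 suggests a DER file or the wrong file); the second asks the endpoint for its chain and reports whether `openssl` can verify it against the bundle.

```shell
#!/bin/sh
# Hypothetical helper names; not part of Velero or its CLI.

# Count the PEM-encoded certificates in a bundle file.
# $1 = path to the CA bundle
count_pem_certs() {
    grep -c -e '-----BEGIN CERTIFICATE-----' "$1"
}

# Connect to the endpoint and print openssl's "Verify return code" line(s),
# using the given bundle as the only trust anchor file.
# $1 = host:port, $2 = path to the CA bundle
verify_endpoint() {
    openssl s_client -connect "$1" -CAfile "$2" </dev/null 2>/dev/null \
        | grep 'Verify return code'
}

# Example (host and path taken from the install command above):
#   count_pem_certs /root/rancherlab/certs/abc_root_bundle.crt
#   verify_endpoint s3.er.abc.com:10443 /root/rancherlab/certs/abc_root_bundle.crt
```

If `verify_endpoint` reports anything other than return code 0, the bundle itself is probably missing the root or an intermediate, independent of how Velero is configured.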

What did you expect to happen: Backups and restores functioning over https

reddiablo85 commented 4 years ago

After further testing we were able to mount a PVC into the Velero deployment and store the certificate there. Following that, we added the AWS_CA_BUNDLE environment variable to our deployment (as per this comment on another issue: https://github.com/vmware-tanzu/velero/issues/1027#issuecomment-571917218).
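For anyone following along, the environment variable part of that workaround might look roughly like the sketch below. It assumes the deployment is named `velero` in the `velero` namespace and that the bundle is already mounted in the pod; the mount path `/certs/abc_root_bundle.crt` and the function name are hypothetical. The script only prints the `kubectl set env` command so it can be reviewed before it is run.

```shell
#!/bin/sh
# Assumption: the CA bundle is already mounted inside the pod at this
# (hypothetical) path, for example from a PVC.
CA_BUNDLE_PATH="${CA_BUNDLE_PATH:-/certs/abc_root_bundle.crt}"

# Build the kubectl command that sets AWS_CA_BUNDLE on the velero deployment.
# $1 = path of the CA bundle as seen from inside the pod
ca_bundle_env_cmd() {
    printf 'kubectl -n velero set env deployment/velero AWS_CA_BUNDLE=%s\n' "$1"
}

# Print the command for review; pipe the output to sh to apply it.
ca_bundle_env_cmd "$CA_BUNDLE_PATH"
```

Note that this only affects the Velero server's own S3 client; it does not by itself change how the restic binary validates certificates.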

We are now working on building all of this into the Helm chart to ease deployment.

reddiablo85 commented 4 years ago

We have to reopen this issue. As mentioned previously, we have solved the Velero side of things by doing the following:

This gives us visibility from the Velero client to our S3 bucket over HTTPS. However, the cert does not seem to be passed to the restic pods, and "velero restic repo get" consistently shows the repositories in a NotReady status. Digging deeper shows an x509 invalid-certificate error.

Steps we've taken to try and rectify this:

How can we get restic to recognise the certificate?

See below the result of velero restic repo get {REPO} -o yaml:

[root@labcpadm01t 1.1.0]# velero restic repo get lab-test01-default-dml68 -o yaml
apiVersion: velero.io/v1
kind: ResticRepository
metadata:
  creationTimestamp: "2020-07-01T13:53:02Z"
  generateName: lab-test01-default-
  generation: 3
  labels:
    velero.io/storage-location: default
    velero.io/volume-namespace: lab-test01
  name: lab-test01-default-dml68
  namespace: velero
  resourceVersion: "20183"
  selfLink: /apis/velero.io/v1/namespaces/velero/resticrepositories/lab-test01-default-dml68
  uid: dc427d7e-0fab-457e-9334-c1a5f228ca2b
spec:
  backupStorageLocation: default
  maintenanceFrequency: 168h0m0s
  resticIdentifier: s3:https://s3.er.abc.com:10443/velero/gdc-lab299/restic/lab-test01
  volumeNamespace: lab-test01
status:
  message: |-
    error running command=restic init --repo=s3:https://s3.er.abc.com:10443/velero/gdc-lab299/restic/lab-test01 --password-file=/tmp/velero-restic-credentials-lab-test01011975119 --cache-dir=/scratch/.cache/restic, stdout=, stderr=Fatal: create repository at s3:https://s3.er.abc.com:10443/velero/gdc-lab299/restic/lab-test01 failed: client.BucketExists: Get https://s3.er.abc.com:10443/velero/?location=: x509: certificate signed by unknown authority

    : exit status 1
  phase: NotReady

airmonitor commented 4 years ago

Hi.

Please try the recent version, 1.4.2:

velero install \
    --image velero/velero-arm64:v1.4.2 \
    --provider aws \
    --plugins velero/velero-plugin-for-aws:master \
    --bucket $BUCKET \
    --backup-location-config region=$REGION \
    --snapshot-location-config region=$REGION \
    --secret-file ~/.aws/credentials-velero \
    --cacert s3-eu-central-1-amazonaws-com.pem

reddiablo85 commented 4 years ago

Hi @airmonitor, that seems to have solved the issue. Thanks very much. I have discovered another problem when trying to add the "--default-volumes-to-restic" flag to the velero install command, but I'll open a separate issue for it if necessary.