loktionovam opened this issue 4 years ago (status: Open)
The "New" phase means that either the upload controllers for the Upload CRD are not brought up properly or the upload controllers do not work as expected. Based on the information provided above, I cannot confirm the root cause.
Once velero-plugin-for-vsphere is installed, a DaemonSet of pods, "data-manager-XXX", will be brought up in the same namespace as the velero pod.
Hi @loktionovam, would you please verify this and share the logs of the "data-manager-XXX" pods?
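For reference, a minimal sketch of that verification, assuming the DaemonSet is named datamgr-for-vsphere-plugin and runs in the velero namespace (the pod name below is a placeholder):

# list the DaemonSet and its pods in the namespace where velero runs
kubectl -n velero get daemonset datamgr-for-vsphere-plugin
kubectl -n velero get pods | grep datamgr
# collect logs from one of the DaemonSet pods (replace the placeholder with a real pod name)
kubectl -n velero logs <datamgr-for-vsphere-plugin-pod>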
Hi @lintongj, I can't find any data-manager-XXX pods. When I enable the plugin:
velero plugin add vsphereveleroplugin/velero-plugin-for-vsphere:1.0.0
the main velero pod restarts, vsphereveleroplugin/velero-plugin-for-vsphere:1.0.0 appears in its initContainers, and that is all.
kubectl get pods -n velero
NAME READY STATUS RESTARTS AGE
velero-596dd56ff9-ckwlw 1/1 Running 0 114m
initContainers:
- image: velero/velero-plugin-for-aws:v1.0.1
imagePullPolicy: IfNotPresent
name: velero-plugin-for-aws
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /target
name: plugins
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: velero-server-token-8g297
readOnly: true
- image: vsphereveleroplugin/velero-plugin-for-vsphere:1.0.0
imagePullPolicy: IfNotPresent
name: velero-plugin-for-vsphere
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /target
name: plugins
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: velero-server-token-8g297
readOnly: true
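For what it's worth, a quick way to check whether anything went wrong when the plugin tried to create its DaemonSet is to look at the events in the namespace; grepping for datamgr is an assumption about the DaemonSet name:

# surface recent events related to the data manager DaemonSet, if any
kubectl -n velero get events --sort-by=.lastTimestamp | grep -i datamgr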
Why would you run export VELERO_NAMESPACE=backup-system while the velero pod in your case is running in the velero namespace?
What is the backup-system namespace used for? Would you please share kubectl -n backup-system get all?
Also, would you please share your velero deployment: kubectl -n velero get deploy/velero -o yaml
Sorry, there was some configuration drift because I had tried to install velero in its default namespace, velero (with no luck). I have now reverted the configuration as described in the issue:
kubectl get pods -n backup-system
NAME READY STATUS RESTARTS AGE
minio-6c685bd979-4c2bv 1/1 Running 0 29h
velero-794555cbb-nzghc 1/1 Running 0 5m44s
kubectl -n backup-system get deploy/velero -o yaml
apiVersion: apps/v1
kind: Deployment
metadata:
annotations:
deployment.kubernetes.io/revision: "2"
meta.helm.sh/release-name: velero
meta.helm.sh/release-namespace: backup-system
creationTimestamp: "2020-06-11T18:02:36Z"
generation: 2
labels:
app.kubernetes.io/instance: velero
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/name: velero
helm.sh/chart: velero-2.12.0
name: velero
namespace: backup-system
resourceVersion: "6716531"
selfLink: /apis/apps/v1/namespaces/backup-system/deployments/velero
uid: c9f76c19-3033-4ecf-bb01-81c5237bf32e
spec:
progressDeadlineSeconds: 600
replicas: 1
revisionHistoryLimit: 10
selector:
matchLabels:
app.kubernetes.io/instance: velero
app.kubernetes.io/name: velero
strategy:
rollingUpdate:
maxSurge: 25%
maxUnavailable: 25%
type: RollingUpdate
template:
metadata:
annotations:
prometheus.io/path: /metrics
prometheus.io/port: "8085"
prometheus.io/scrape: "true"
creationTimestamp: null
labels:
app.kubernetes.io/instance: velero
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/name: velero
helm.sh/chart: velero-2.12.0
name: velero
spec:
containers:
- args:
- server
command:
- /velero
env:
- name: VELERO_SCRATCH_DIR
value: /scratch
- name: VELERO_NAMESPACE
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
- name: LD_LIBRARY_PATH
value: /plugins
- name: AWS_SHARED_CREDENTIALS_FILE
value: /credentials/cloud
image: velero/velero:v1.4.0
imagePullPolicy: IfNotPresent
name: velero
ports:
- containerPort: 8085
name: monitoring
protocol: TCP
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /plugins
name: plugins
- mountPath: /credentials
name: cloud-credentials
- mountPath: /scratch
name: scratch
dnsPolicy: ClusterFirst
initContainers:
- image: velero/velero-plugin-for-aws:v1.0.1
imagePullPolicy: IfNotPresent
name: velero-plugin-for-aws
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /target
name: plugins
- image: vsphereveleroplugin/velero-plugin-for-vsphere:1.0.0
imagePullPolicy: IfNotPresent
name: velero-plugin-for-vsphere
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /target
name: plugins
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: velero-server
serviceAccountName: velero-server
terminationGracePeriodSeconds: 30
volumes:
- name: cloud-credentials
secret:
defaultMode: 420
secretName: velero
- emptyDir: {}
name: plugins
- emptyDir: {}
name: scratch
status:
availableReplicas: 1
conditions:
- lastTransitionTime: "2020-06-11T18:02:39Z"
lastUpdateTime: "2020-06-11T18:02:39Z"
message: Deployment has minimum availability.
reason: MinimumReplicasAvailable
status: "True"
type: Available
- lastTransitionTime: "2020-06-11T18:02:36Z"
lastUpdateTime: "2020-06-11T18:04:43Z"
message: ReplicaSet "velero-794555cbb" has successfully progressed.
reason: NewReplicaSetAvailable
status: "True"
type: Progressing
observedGeneration: 2
readyReplicas: 1
replicas: 1
updatedReplicas: 1
I guess I found the root cause:
kubectl get events
LAST SEEN TYPE REASON OBJECT MESSAGE
12m Warning FailedCreate daemonset/datamgr-for-vsphere-plugin Error creating: pods "datamgr-for-vsphere-plugin-" is forbidden: error looking up service account backup-system/velero: serviceaccount "velero" not found
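As a hedged aside, the same failure shows up when describing the DaemonSet directly and when looking for the expected ServiceAccount; the resource names below are taken from the event above:

kubectl -n backup-system describe daemonset datamgr-for-vsphere-plugin
kubectl -n backup-system get serviceaccount velero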
I deployed velero via the official helm chart:
kubectl get sa
NAME SECRETS AGE
default 1 29h
minio 1 29h
minio-update-prometheus-secret 1 29h
velero-server 1 18m
I wonder whether you explicitly set the ServiceAccountName to "velero-server" while installing velero via the helm chart. By default, it is "velero", according to https://github.com/vmware-tanzu/velero/blob/a5346c1a87c91788aeb3e2e03be7f42ebc23d95c/pkg/install/deployment.go#L150. Would you please share what you did to install velero via the helm chart, so that we can reproduce the issue and incorporate it into our test cases?
Meanwhile, it actually exposes a bug in velero-plugin-for-vsphere, where we hardcoded the ServiceAccountName to the default one. In the default case, it works as expected. However, if users explicitly switch the velero pod to a customized ServiceAccount/ServiceAccountName instead of the default one, the pods in daemonset/datamgr-for-vsphere-plugin cannot be brought up as expected.
This is an issue we need to resolve after the 1.0.1 release (release 1.0.1 is coming soon). Until the fix is merged and released, users are strongly recommended to use the default ServiceAccount/ServiceAccountName in the velero pod, or to explicitly create a ServiceAccount named "velero" if there is none.
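A minimal sketch of that stop-gap workaround, assuming the velero pod runs in backup-system as in this thread (any extra RBAC the data manager pods might need is not covered here):

# create the ServiceAccount name the datamgr DaemonSet expects
kubectl -n backup-system create serviceaccount velero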
I didn't set the service account. This is my helm chart configuration:
credentials:
useSecret: true
secretContents:
cloud: |
[default]
aws_access_key_id = access_key_here
aws_secret_access_key = secret_access_key_here
configuration:
provider: aws
backupStorageLocation:
name: default
bucket: velero
provider: aws
config:
region: minio
s3ForcePathStyle: true
s3Url: http://minio.backup-system.svc.devel.pro:9000
snapshotsEnabled: true
deployRestic: false
initContainers:
- name: velero-plugin-for-aws
image: velero/velero-plugin-for-aws:v1.0.1
volumeMounts:
- mountPath: /target
name: plugins
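For context, values like these are typically applied with something along the following lines; the chart repository and release name here are assumptions, not taken from this issue:

helm repo add vmware-tanzu https://vmware-tanzu.github.io/helm-charts
helm install velero vmware-tanzu/velero --namespace backup-system -f values.yaml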
The service account name is templated via the velero.serverServiceAccount helper function here:
{{- if .Values.serviceAccount.server.create }}
apiVersion: v1
kind: ServiceAccount
metadata:
name: {{ include "velero.serverServiceAccount" . }}
The velero.serverServiceAccount helper function code:
{{- define "velero.serverServiceAccount" -}}
{{- if .Values.serviceAccount.server.create -}}
{{ default (printf "%s-%s" (include "velero.fullname" .) "server") .Values.serviceAccount.server.name }}
{{- else -}}
{{ default "default" .Values.serviceAccount.server.name }}
{{- end -}}
{{- end -}}
Helm chart default values related to the serviceAccount:
serviceAccount:
server:
create: true
name:
annotations:
So with the default serviceAccount values from the helm chart, the service account name is rendered via {{ default (printf "%s-%s" (include "velero.fullname" .) "server") .Values.serviceAccount.server.name }} and becomes velero-server.
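One hedged way to line the names up until the plugin fix lands (not stated in this thread) is to override the chart's service account name so it matches what the plugin expects; the flag path follows the values shown above, and the release/chart arguments are placeholders:

# force the chart to name its ServiceAccount "velero" instead of "velero-server"
helm upgrade velero vmware-tanzu/velero --namespace backup-system --reuse-values \
  --set serviceAccount.server.name=velero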
My suggestion is to add a "How it works" section (something like https://blogs.vmware.com/opensource/2020/04/17/velero-plug-in-for-vsphere/) to the README, where daemonset/datamgr-for-vsphere-plugin would be described.
@loktionovam Thanks for the exploration of the helm chart configuration. On one hand, we will add this suggestion to our documentation (FAQ.md) for released versions. On the other hand, we will fix this issue in our next release.
I tried using this plugin and the data upload progress hangs in the New phase without any errors.
My current installation:
How to reproduce:
snapshot-location enabled as described in the README:
velero.log
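For completeness, a hedged sketch of the snapshot-location and backup commands that typically exercise this path; the provider string follows the plugin's README and the names are placeholders:

# register a volume snapshot location backed by the vSphere plugin
velero snapshot-location create vsl-vsphere --provider velero.io/vsphere
# run a backup that snapshots PVs in the chosen namespace
velero backup create test-backup --include-namespaces=<namespace> --snapshot-volumes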