jadsy2107 opened this issue 2 years ago
I've just completely blasted the cluster and I'm bringing up a fresh cluster now with these values:
# Default values for cstor-operators.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.

release:
  version: "3.4.0"

# If false, openebs NDM sub-chart will not be installed
openebsNDM:
  enabled: true

rbac:
  # rbac.create: `true` if rbac resources should be created
  create: true
  # rbac.pspEnabled: `true` if PodSecurityPolicy resources should be created
  pspEnabled: false

imagePullSecrets:
# - name: "image-pull-secret"

cspcOperator:
  componentName: cspc-operator
  poolManager:
    image:
      registry:
      repository: openebs/cstor-pool-manager
      tag: 3.4.0
  cstorPool:
    image:
      registry:
      repository: openebs/cstor-pool
      tag: 3.4.0
  cstorPoolExporter:
    image:
      registry:
      repository: openebs/m-exporter
      tag: 3.4.0
  image:
    # Make sure that registry name end with a '/'.
    # For example : quay.io/ is a correct value here and quay.io is incorrect
    registry:
    repository: openebs/cspc-operator
    pullPolicy: IfNotPresent
    # Overrides the image tag whose default is the chart appVersion.
    tag: 3.4.0
  annotations: {}
  resyncInterval: "30"
  podAnnotations: {}
  podLabels: {}
  nodeSelector: {}
  tolerations: []
  resources: {}
  securityContext: {}
  baseDir: "/var/openebs"
  sparseDir: "/var/openebs/sparse"

cvcOperator:
  componentName: cvc-operator
  target:
    image:
      registry:
      repository: openebs/cstor-istgt
      tag: 3.4.0
  volumeMgmt:
    image:
      registry:
      repository: openebs/cstor-volume-manager
      tag: 3.4.0
  volumeExporter:
    image:
      registry:
      repository: openebs/m-exporter
      tag: 3.4.0
  image:
    # Make sure that registry name end with a '/'.
    # For example : quay.io/ is a correct value here and quay.io is incorrect
    registry:
    repository: openebs/cvc-operator
    pullPolicy: IfNotPresent
    # Overrides the image tag whose default is the chart appVersion.
    tag: 3.4.0
  annotations: {}
  resyncInterval: "30"
  podAnnotations: {}
  podLabels: {}
  nodeSelector: {}
  tolerations: []
  resources: {}
  securityContext: {}
  baseDir: "/var/openebs"
  logLevel: "2"

csiController:
  priorityClass:
    create: true
    name: cstor-csi-controller-critical
    value: 900000000
  componentName: "openebs-cstor-csi-controller"
  logLevel: "5"
  resizer:
    name: "csi-resizer"
    image:
      # Make sure that registry name end with a '/'.
      # For example : quay.io/ is a correct value here and quay.io is incorrect
      registry: k8s.gcr.io/
      repository: sig-storage/csi-resizer
      pullPolicy: IfNotPresent
      # Overrides the image tag whose default is the chart appVersion.
      tag: v1.4.0
  snapshotter:
    name: "csi-snapshotter"
    image:
      # Make sure that registry name end with a '/'.
      # For example : quay.io/ is a correct value here and quay.io is incorrect
      registry: k8s.gcr.io/
      repository: sig-storage/csi-snapshotter
      pullPolicy: IfNotPresent
      # Overrides the image tag whose default is the chart appVersion.
      tag: v5.0.1
  snapshotController:
    name: "snapshot-controller"
    image:
      # Make sure that registry name end with a '/'.
      # For example : quay.io/ is a correct value here and quay.io is incorrect
      registry: k8s.gcr.io/
      repository: sig-storage/snapshot-controller
      pullPolicy: IfNotPresent
      # Overrides the image tag whose default is the chart appVersion.
      tag: v5.0.1
  attacher:
    name: "csi-attacher"
    image:
      # Make sure that registry name end with a '/'.
      # For example : quay.io/ is a correct value here and quay.io is incorrect
      registry: k8s.gcr.io/
      repository: sig-storage/csi-attacher
      pullPolicy: IfNotPresent
      # Overrides the image tag whose default is the chart appVersion.
      tag: v3.4.0
  provisioner:
    name: "csi-provisioner"
    image:
      # Make sure that registry name end with a '/'.
      # For example : quay.io/ is a correct value here and quay.io is incorrect
      registry: k8s.gcr.io/
      repository: sig-storage/csi-provisioner
      pullPolicy: IfNotPresent
      # Overrides the image tag whose default is the chart appVersion.
      tag: v3.1.0
  annotations: {}
  podAnnotations: {}
  podLabels: {}
  nodeSelector: {}
  tolerations: []
  resources: {}
  securityContext: {}

cstorCSIPlugin:
  name: cstor-csi-plugin
  image:
    # Make sure that registry name end with a '/'.
    # For example : quay.io/ is a correct value here and quay.io is incorrect
    registry:
    repository: openebs/cstor-csi-driver
    pullPolicy: IfNotPresent
    # Overrides the image tag whose default is the chart appVersion.
    tag: 3.4.0
  remount: "true"

csiNode:
  priorityClass:
    create: true
    name: cstor-csi-node-critical
    value: 900001000
  componentName: "openebs-cstor-csi-node"
  driverRegistrar:
    name: "csi-node-driver-registrar"
    image:
      registry: k8s.gcr.io/
      repository: sig-storage/csi-node-driver-registrar
      pullPolicy: IfNotPresent
      # Overrides the image tag whose default is the chart appVersion.
      tag: v2.5.0
  logLevel: "5"
  updateStrategy:
    type: RollingUpdate
  annotations: {}
  podAnnotations: {}
  resources: {}
  # limits:
  #   cpu: 10m
  #   memory: 32Mi
  # requests:
  #   cpu: 10m
  #   memory: 32Mi
  ## Labels to be added to openebs-cstor-csi-node pods
  podLabels: {}
  # kubeletDir path can be configured to run on various different k8s distributions like
  # microk8s where kubelet root dir is not (/var/lib/kubelet/). For example microk8s,
  # we need to change the kubelet directory to `/var/snap/microk8s/common/var/lib/kubelet/`
  kubeletDir: "/var/lib/kubelet/"
  nodeSelector: {}
  tolerations: []
  securityContext: {}

csiDriver:
  create: true
  podInfoOnMount: true
  attachRequired: false

admissionServer:
  componentName: cstor-admission-webhook
  image:
    # Make sure that registry name end with a '/'.
    # For example : quay.io/ is a correct value here and quay.io is incorrect
    registry:
    repository: openebs/cstor-webhook
    pullPolicy: IfNotPresent
    # Overrides the image tag whose default is the chart appVersion.
    tag: 3.4.0
  failurePolicy: "Fail"
  annotations: {}
  podAnnotations: {}
  podLabels: {}
  nodeSelector: {}
  tolerations: []
  resources: {}
  securityContext: {}

serviceAccount:
  # Annotations to add to the service account
  annotations: {}
  cstorOperator:
    create: true
    name: openebs-cstor-operator
  csiController:
    # Specifies whether a service account should be created
    create: true
    name: openebs-cstor-csi-controller-sa
  csiNode:
    # Specifies whether a service account should be created
    create: true
    name: openebs-cstor-csi-node-sa

analytics:
  enabled: true
  # Specify in hours the duration after which a ping event needs to be sent.
  pingInterval: "24h"

cleanup:
  image:
    # Make sure that registry name end with a '/'.
    # For example : quay.io/ is a correct value here and quay.io is incorrect
    registry:
    repository: bitnami/kubectl
    tag:
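For reference, this is roughly how I'm applying those values (chart repo and chart name are from the cstor-operators docs; the values file name is just a placeholder):

helm repo add openebs-cstor https://openebs.github.io/cstor-operators
helm repo update
helm upgrade --install openebs-cstor openebs-cstor/cstor \
  --namespace openebs --create-namespace \
  -f values.yaml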
Warning SnapshotFinalizerError 12s (x5 over 27s) snapshot-controller Failed to check and update snapshot: snapshot controller failed to update npm/snapshot-npm-data on API server: volumesnapshots.snapshot.storage.k8s.io "snapshot-npm-data" is forbidden: User "system:serviceaccount:openebs:openebs-cstor-csi-controller-sa" cannot patch resource "volumesnapshots" in API group "snapshot.storage.k8s.io" in the namespace "npm"
kubectl describe volumesnapshot -A
Name: snapshot-npm-data
Namespace: npm
Labels: <none>
Annotations: <none>
API Version: snapshot.storage.k8s.io/v1
Kind: VolumeSnapshot
Metadata:
Creation Timestamp: 2022-11-16T03:49:56Z
Finalizers:
snapshot.storage.kubernetes.io/volumesnapshot-as-source-protection
Generation: 1
Managed Fields:
API Version: snapshot.storage.k8s.io/v1beta1
Fields Type: FieldsV1
fieldsV1:
f:metadata:
f:annotations:
.:
f:kubectl.kubernetes.io/last-applied-configuration:
f:spec:
.:
f:source:
.:
f:persistentVolumeClaimName:
f:volumeSnapshotClassName:
Manager: kubectl-client-side-apply
Operation: Update
Time: 2022-11-16T03:49:56Z
API Version: snapshot.storage.k8s.io/v1
Fields Type: FieldsV1
fieldsV1:
f:metadata:
f:finalizers:
.:
v:"snapshot.storage.kubernetes.io/volumesnapshot-as-source-protection":
Manager: snapshot-controller
Operation: Update
Time: 2022-11-16T03:49:56Z
API Version: snapshot.storage.k8s.io/v1
Fields Type: FieldsV1
fieldsV1:
f:status:
.:
f:boundVolumeSnapshotContentName:
f:readyToUse:
Manager: snapshot-controller
Operation: Update
Subresource: status
Time: 2022-11-16T03:49:56Z
Resource Version: 3336
UID: 83447d06-45f1-494e-9e67-97d12b61b702
Spec:
Source:
Persistent Volume Claim Name: npm-data
Volume Snapshot Class Name: csi-cstor-snapshotclass
Status:
Bound Volume Snapshot Content Name: snapcontent-83447d06-45f1-494e-9e67-97d12b61b702
Ready To Use: false
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal CreatingSnapshot 4m59s snapshot-controller Waiting for a snapshot npm/snapshot-npm-data to be created by the CSI driver.
Warning SnapshotFinalizerError 44s (x9 over 4m59s) snapshot-controller Failed to check and update snapshot: snapshot controller failed to update npm/snapshot-npm-data on API server: volumesnapshots.snapshot.storage.k8s.io "snapshot-npm-data" is forbidden: User "system:serviceaccount:openebs:openebs-cstor-csi-controller-sa" cannot patch resource "volumesnapshots" in API group "snapshot.storage.k8s.io" in the namespace "npm"
kubectl describe sa openebs-cstor-csi-controller-sa -n openebs
Name: openebs-cstor-csi-controller-sa
Namespace: openebs
Labels: app.kubernetes.io/managed-by=Helm
chart=cstor-3.3.0
component=openebs-cstor-csi-controller
heritage=Helm
name=openebs-cstor-csi-controller
openebs.io/component-name=openebs-cstor-csi-controller
openebs.io/version=3.4.0
release=openebs-cstor
Annotations: meta.helm.sh/release-name: openebs-cstor
meta.helm.sh/release-namespace: openebs
Image pull secrets: <none>
Mountable secrets: <none>
Tokens: <none>
Events: <none>
kubectl get clusterrolebinding -o wide |grep controller-sa
openebs-cstor-csi-attacher-binding ClusterRole/openebs-cstor-csi-attacher-role 19m openebs/openebs-cstor-csi-controller-sa
openebs-cstor-csi-cluster-registrar-binding ClusterRole/openebs-cstor-csi-cluster-registrar-role 19m openebs/openebs-cstor-csi-controller-sa
openebs-cstor-csi-provisioner-binding ClusterRole/openebs-cstor-csi-provisioner-role 19m openebs/openebs-cstor-csi-controller-sa
openebs-cstor-csi-snapshotter-binding ClusterRole/openebs-cstor-csi-snapshotter-role 19m openebs/openebs-cstor-csi-controller-sa
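The bindings are all there, so the next thing worth checking is which verbs those roles actually grant on volumesnapshots, e.g. with a quick:

kubectl get clusterrole openebs-cstor-csi-snapshotter-role -o yaml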
The clone PVC doesn't get bound because the PV isn't provisioned, because the snapshot failed.
OK, some better news: I removed the Helm release completely and started fresh, this time not using Helm.
I installed cstor-operator.yaml, however I replaced all versions with the latest. This was fine until I tried to snapshot; it said something about the controller-sa account being denied patching the volumesnapshot,
so I had to add the patch verb, as seen below:
verbs: ["get", "list", "patch"] # <------- ADDED PATCH verb
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: openebs-cstor-csi-provisioner-role
rules:
  - apiGroups: [""]
    resources: ["secrets","namespaces"]
    verbs: ["get", "list"]
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["persistentvolumes", "services"]
    verbs: ["get", "list", "watch", "create", "delete", "update", "patch"]
  - apiGroups: [""]
    resources: ["persistentvolumeclaims"]
    verbs: ["get", "list", "watch", "update"]
  - apiGroups: [""]
    resources: ["persistentvolumeclaims/status"]
    verbs: ["update", "patch"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses", "csinodes"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["list", "watch", "create", "update", "patch"]
  - apiGroups: ["snapshot.storage.k8s.io"]
    resources: ["volumesnapshots"]
    verbs: ["get", "list"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["volumeattachments"]
    verbs: ["get", "list", "watch", "create", "delete", "update", "patch"]
  - apiGroups: ["snapshot.storage.k8s.io"]
    resources: ["volumesnapshotcontents"]
    verbs: ["get", "list", "patch"] # <------- ADDED PATCH verb
  - apiGroups: ["coordination.k8s.io"]
    resources: ["leases"]
    verbs: ["*"]
  - apiGroups: ["*"]
    resources: ["cstorvolumeattachments", "cstorvolumes","cstorvolumeconfigs"]
    verbs: ["*"]
Now the volume snapshots fine and the PVC is created fine.
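A quick way to sanity-check that the added verb took effect (namespace and service account names as in the output above):

kubectl auth can-i patch volumesnapshots.snapshot.storage.k8s.io \
  --as=system:serviceaccount:openebs:openebs-cstor-csi-controller-sa -n npm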
However, I'm still having an issue with Velero:
time="2022-11-16T20:55:01Z" level=error msg="Error getting volume snapshotter for volume snapshot location" backup=velero/npm error="rpc error: code = Unknown desc = Error fetching OpenEBS rest client address" error.file="/home/travis/gopath/src/github.com/openebs/velero-plugin/pkg/cstor/cstor.go:192" error.function="github.com/openebs/velero-plugin/pkg/cstor.(*Plugin).Init" logSource="pkg/backup/item_backupper.go:470" name=pvc-aaa85da5-9bb0-46d0-81ca-dbb4b5eca05e namespace= persistentVolume=pvc-aaa85da5-9bb0-46d0-81ca-dbb4b5eca05e resource=persistentvolumes volumeSnapshotLocation=default
time="2022-11-16T20:55:01Z" level=error msg="Error getting volume snapshotter for volume snapshot location" backup=velero/npm error="rpc error: code = Unknown desc = Error fetching OpenEBS rest client address" error.file="/home/travis/gopath/src/github.com/openebs/velero-plugin/pkg/cstor/cstor.go:192" error.function="github.com/openebs/velero-plugin/pkg/cstor.(*Plugin).Init" logSource="pkg/backup/item_backupper.go:470" name=pvc-4a21fb27-6e74-4c73-9ca1-8b137c9d18f2 namespace= persistentVolume=pvc-4a21fb27-6e74-4c73-9ca1-8b137c9d18f2 resource=persistentvolumes volumeSnapshotLocation=default
OK, GREAT NEWS!!
I used the latest of everything, and the latest plugins for Velero too, and everything's working!
velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:latest,openebs/velero-plugin:latest \
  --bucket velero \
  --secret-file ./credentials-velero \
  --use-volume-snapshots=true \
  --backup-location-config region=minio,s3ForcePathStyle="true",s3Url=http://X.X.X.X:9000
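Alongside the install, the openebs/velero-plugin also wants a VolumeSnapshotLocation that points at the cStor blockstore; mine looks roughly like this, following the plugin README (bucket, prefix and URL should match your MinIO setup):

apiVersion: velero.io/v1
kind: VolumeSnapshotLocation
metadata:
  name: default
  namespace: velero
spec:
  provider: openebs.io/cstor-blockstore
  config:
    bucket: velero
    prefix: cstor
    namespace: openebs
    provider: aws
    region: minio
    s3ForcePathStyle: "true"
    s3Url: http://X.X.X.X:9000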
PVCs are backed up!!! Yeah, the boys!
For reference, this is the cstor-operator.yaml I'm using, with the 'patch' verb included.
My updated version includes everything needed to get snapshotting working with the latest versions of everything:
https://github.com/jadsy2107/k8s-ha-cluster/blob/main/cstor-operator.yaml
From the original: https://openebs.github.io/charts/cstor-operator.yaml
I've deleted the whole cluster and VMs, and I'm starting a new cluster using my ansible-playbook.
OK, the plot thickens.
I started my new cluster and everything is up and running fine, backups working fine; however, when restoring the volume I get the error below when describing the CVR.
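For reference, that's just the usual describe against the CVR custom resource (short name cvr), something like:

kubectl describe cvr -n openebs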
failed to sync CVR error: unable to update snapshot list details in CVR: failed to get the list of snapshots: Output: failed listsnap command for cstor-48ede97b-45c2-4418-8b49-d779a66948e4/pvc-55bb3bca-d99c-4cfc-9e18-5d07e6fe7891 with err 2
I'll keep digging!
OK!
We're done here, thanks for tuning in!
This was the final piece of the puzzle:
autoSetTargetIP: "true"
As described:
https://github.com/openebs/velero-plugin#creating-a-restore-for-remote-backup
Thanks everyone, what an awesome product. Now on to building some awesome apps to live in this system.
restoreAllIncrementalSnapshots: "true"
This is also important, as your restore will fail without it.
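For anyone following along, both keys live in the VolumeSnapshotLocation config; a minimal sketch of the relevant bit, per the velero-plugin README linked above:

spec:
  provider: openebs.io/cstor-blockstore
  config:
    # ... bucket / S3 settings as before ...
    autoSetTargetIP: "true"
    restoreAllIncrementalSnapshots: "true"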
I've set up MinIO and it's working correctly doing backups, just not the volumes; the PVC data isn't getting backed up.
My snapshot location is as in the example 06-volumesnapshotlocation.yaml, so I applied the below to the cluster:
Then I try to take a backup.
Only the metadata is backed up, which is amazing, but I need the data too!
I installed OpenEBS cStor from the Helm chart with these values.
kubectl get VolumeSnapshotClass -o yaml
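For completeness, the class it returns should look roughly like the one from the OpenEBS cStor docs (same csi-cstor-snapshotclass name that the VolumeSnapshot above references):

kind: VolumeSnapshotClass
apiVersion: snapshot.storage.k8s.io/v1
metadata:
  name: csi-cstor-snapshotclass
  annotations:
    snapshot.storage.kubernetes.io/is-default-class: "true"
driver: cstor.csi.openebs.io
deletionPolicy: Delete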