innotecsol closed this issue 1 week ago
Hi @innotecsol, it seems the latest CRD is installed on the cluster; however, the diskpool operator that is starting up is of an older version.
Can you share the output of the following commands?
kubectl get crd diskpools.openebs.io -oyaml
kubectl get deploy openebs-operator-diskpool -n mayastor -oyaml
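Once you have those outputs saved, a quick way to line up the two versions they report is a small extraction script. This is only a sketch: the file names crd.yaml and deploy.yaml, and the single-entry storedVersions list, are assumptions for illustration.

```shell
# Sketch: compare the CRD's stored version with the diskpool operator's
# image tag, using saved copies of the two 'kubectl get ... -oyaml' outputs.
# Stand-in excerpts are embedded here; in practice redirect the real output.
cat > crd.yaml <<'EOF'
storedVersions:
- v1beta2
EOF
cat > deploy.yaml <<'EOF'
    image: docker.io/openebs/mayastor-operator-diskpool:v2.5.0
EOF
# Line after 'storedVersions:' with the leading '- ' stripped.
crd_ver=$(sed -n '/storedVersions:/{n;s/^- *//p;}' crd.yaml)
# Tag after the image name.
img_tag=$(sed -n 's/.*mayastor-operator-diskpool:\(v[0-9.]*\).*/\1/p' deploy.yaml)
echo "CRD stored version: $crd_ver, operator image: $img_tag"
```

A mismatch like v1beta2 against v2.5.0 is exactly the situation discussed below.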
Hi abhilashshetty04,
kubectl get crd diskpools.openebs.io -oyaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
creationTimestamp: "2024-02-27T22:42:06Z"
generation: 4
name: diskpools.openebs.io
resourceVersion: "188484267"
uid: 2abef297-28b5-4533-a6b3-f8354ab63fd8
spec:
conversion:
strategy: None
group: openebs.io
names:
kind: DiskPool
listKind: DiskPoolList
plural: diskpools
shortNames:
- dsp
singular: diskpool
scope: Namespaced
versions:
- additionalPrinterColumns:
- description: node the pool is on
jsonPath: .spec.node
name: node
type: string
- description: dsp cr state
jsonPath: .status.cr_state
name: state
type: string
- description: Control plane pool status
jsonPath: .status.pool_status
name: pool_status
type: string
- description: total bytes
format: int64
jsonPath: .status.capacity
name: capacity
type: integer
- description: used bytes
format: int64
jsonPath: .status.used
name: used
type: integer
- description: available bytes
format: int64
jsonPath: .status.available
name: available
type: integer
name: v1beta2
schema:
openAPIV3Schema:
description: Auto-generated derived type for DiskPoolSpec via `CustomResource`
properties:
spec:
description: The pool spec which contains the parameters we use when creating
the pool
properties:
disks:
description: The disk device the pool is located on
items:
type: string
type: array
node:
description: The node the pool is placed on
type: string
topology:
description: The topology for data placement.
nullable: true
properties:
labelled:
additionalProperties:
type: string
default: {}
description: Label for topology
type: object
type: object
required:
- disks
- node
type: object
status:
description: Status of the pool which is driven and changed by the controller
loop.
nullable: true
properties:
available:
description: Available number of bytes.
format: uint64
minimum: 0
type: integer
capacity:
description: Capacity as number of bytes.
format: uint64
minimum: 0
type: integer
cr_state:
default: Creating
description: PoolState represents operator specific states for DSP
CR.
enum:
- Creating
- Created
- Terminating
type: string
pool_status:
description: Pool status from respective control plane object.
enum:
- Unknown
- Online
- Degraded
- Faulted
nullable: true
type: string
used:
description: Used number of bytes.
format: uint64
minimum: 0
type: integer
required:
- available
- capacity
- used
type: object
required:
- spec
title: DiskPool
type: object
served: true
storage: true
subresources:
status: {}
status:
acceptedNames:
kind: DiskPool
listKind: DiskPoolList
plural: diskpools
shortNames:
- dsp
singular: diskpool
conditions:
- lastTransitionTime: "2024-02-27T22:42:06Z"
message: no conflicts found
reason: NoConflicts
status: "True"
type: NamesAccepted
- lastTransitionTime: "2024-02-27T22:42:06Z"
message: the initial names have been accepted
reason: InitialNamesAccepted
status: "True"
type: Established
storedVersions:
- v1beta2
kubectl get deploy openebs-operator-diskpool -n mayastor -oyaml
returns with
Error from server (NotFound): deployments.apps "openebs-operator-diskpool" not found
Thanks for your support!
Frank
Sorry, the deployment does exist, with a mayastor prefix:
kubectl get deploy mayastor-operator-diskpool -n mayastor -oyaml
apiVersion: apps/v1
kind: Deployment
metadata:
annotations:
deployment.kubernetes.io/revision: "3"
meta.helm.sh/release-name: mayastor
meta.helm.sh/release-namespace: mayastor
creationTimestamp: "2024-02-27T22:40:26Z"
generation: 3
labels:
app: operator-diskpool
app.kubernetes.io/managed-by: Helm
openebs.io/release: mayastor
openebs.io/version: 2.5.0
name: mayastor-operator-diskpool
namespace: mayastor
resourceVersion: "188856021"
uid: cf6b06db-e142-4085-a2f1-c295143e68ec
spec:
progressDeadlineSeconds: 600
replicas: 1
revisionHistoryLimit: 10
selector:
matchLabels:
app: operator-diskpool
openebs.io/release: mayastor
strategy:
rollingUpdate:
maxSurge: 25%
maxUnavailable: 25%
type: RollingUpdate
template:
metadata:
creationTimestamp: null
labels:
app: operator-diskpool
openebs.io/logging: "true"
openebs.io/release: mayastor
openebs.io/version: 2.5.0
spec:
containers:
- args:
- -e http://mayastor-api-rest:8081
- -nmayastor
- --request-timeout=5s
- --interval=30s
env:
- name: RUST_LOG
value: info
- name: MY_POD_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.name
image: docker.io/openebs/mayastor-operator-diskpool:v2.5.0
imagePullPolicy: IfNotPresent
name: operator-diskpool
resources:
limits:
cpu: 100m
memory: 32Mi
requests:
cpu: 50m
memory: 16Mi
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
dnsPolicy: ClusterFirst
initContainers:
- command:
- sh
- -c
- trap "exit 1" TERM; until nc -vzw 5 mayastor-agent-core 50051; do date;
echo "Waiting for agent-core-grpc services..."; sleep 1; done;
image: busybox:latest
imagePullPolicy: Always
name: agent-core-grpc-probe
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
- command:
- sh
- -c
- trap "exit 1" TERM; until nc -vzw 5 mayastor-etcd 2379; do date; echo "Waiting
for etcd..."; sleep 1; done;
image: busybox:latest
imagePullPolicy: Always
name: etcd-probe
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
nodeSelector:
kubernetes.io/arch: amd64
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: mayastor-service-account
serviceAccountName: mayastor-service-account
terminationGracePeriodSeconds: 30
status:
conditions:
- lastTransitionTime: "2024-02-27T22:40:26Z"
lastUpdateTime: "2024-08-05T04:12:32Z"
message: ReplicaSet "mayastor-operator-diskpool-5cd48746c" has successfully progressed.
reason: NewReplicaSetAvailable
status: "True"
type: Progressing
- lastTransitionTime: "2024-08-05T11:26:10Z"
lastUpdateTime: "2024-08-05T11:26:10Z"
message: Deployment does not have minimum availability.
reason: MinimumReplicasUnavailable
status: "False"
type: Available
observedGeneration: 3
replicas: 1
unavailableReplicas: 1
updatedReplicas: 1
Hi @innotecsol, as suspected: v1beta2 (the latest CRD spec) already exists, but the diskpool operator is running on the older build docker.io/openebs/mayastor-operator-diskpool:v2.5.0. We need to check the upgrade flow.
cc: @niladrih
@innotecsol -- How did you upgrade from 2.5.0 to 2.7.0? What were the steps that you followed?
Hi, many thanks for following up.
I downloaded mayastor kubectl plugin from https://github.com/openebs/mayastor/releases/download/v2.7.0/kubectl-mayastor-x86_64-linux-musl.tar.gz
and executed
kubectl mayastor upgrade --skip-single-replica-volume-validation -d
kubectl mayastor upgrade --skip-single-replica-volume-validation
Initially (version 2.5.0) I had installed it with
helm install mayastor mayastor/mayastor -n mayastor --create-namespace --version 2.5.0 --set "loki-stack.loki.persistence.storageClassName=manual,etcd.persistence.storageClass=manual"
@innotecsol -- For versions 2.2.0-2.5.0 (both included), you'd have to add the set flag --set agents.core.rebuild.partial.enabled=false
with the upgrade command, i.e.,
kubectl mayastor upgrade --set 'agents.core.rebuild.partial.enabled=false' --skip-single-replica-volume-validation
Ref: https://openebs.io/docs/user-guides/upgrade#replicated-storage (these instructions are for the openebs/openebs Helm chart; they have to be adapted somewhat for the mayastor/mayastor chart)
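The "versions 2.2.0-2.5.0 (both included)" condition can be sketched as a small version-range check. This is only an illustration; the variable ver is a stand-in for the installed chart version (in practice taken from helm ls output), and it relies on GNU sort -V for version ordering.

```shell
# Sketch: does the installed chart version fall in the 2.2.0..2.5.0 range
# that requires disabling partial rebuild during the upgrade?
lo=2.2.0
hi=2.5.0
in_range() {
  # $1 >= lo and $1 <= hi, using version-aware sorting.
  [ "$(printf '%s\n' "$lo" "$1" | sort -V | head -n1)" = "$lo" ] &&
  [ "$(printf '%s\n' "$1" "$hi" | sort -V | head -n1)" = "$1" ]
}
ver=2.5.0   # stand-in for the installed chart version
if in_range "$ver"; then
  echo "add --set agents.core.rebuild.partial.enabled=false to the upgrade"
fi
```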
I'm going to check whether your helm release is in a healthy state so that you can try again. Could you share the output of helm ls -n mayastor?
Hi niladrih,
here the required output:
helm ls -n mayastor
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
mayastor mayastor 3 2024-08-05 04:12:08.24708259 +0000 UTC failed mayastor-2.5.0 2.5.0
The status of the package
helm status -n mayastor mayastor
NAME: mayastor
LAST DEPLOYED: Mon Aug 5 04:12:08 2024
NAMESPACE: mayastor
STATUS: failed
REVISION: 3
NOTES:
OpenEBS Mayastor has been installed. Check its status by running:
$ kubectl get pods -n mayastor
For more information or to view the documentation, visit our website at https://mayastor.gitbook.io/introduction/.
The pods:
kubectl get pods -n mayastor
NAME READY STATUS RESTARTS AGE
mayastor-agent-core-f998b65b4-t82mg 2/2 Running 0 36h
mayastor-agent-ha-node-2pvm9 1/1 Running 0 36h
mayastor-agent-ha-node-4ss75 1/1 Running 0 36h
mayastor-agent-ha-node-7759h 1/1 Running 0 36h
mayastor-agent-ha-node-hnx8j 1/1 Running 0 36h
mayastor-agent-ha-node-vhbp6 1/1 Running 0 36h
mayastor-agent-ha-node-vjpmr 1/1 Running 0 36h
mayastor-agent-ha-node-wl77k 1/1 Running 0 36h
mayastor-api-rest-7479f49d86-nbw9d 1/1 Running 0 36h
mayastor-csi-controller-59ff8dc57b-mhfxv 5/5 Running 0 36h
mayastor-csi-node-72cll 2/2 Running 0 36h
mayastor-csi-node-cwjmt 2/2 Running 0 36h
mayastor-csi-node-hxg4l 2/2 Running 0 36h
mayastor-csi-node-jpvhr 2/2 Running 0 36h
mayastor-csi-node-w2zwb 2/2 Running 0 36h
mayastor-csi-node-wcbzb 2/2 Running 0 36h
mayastor-csi-node-wvrdd 2/2 Running 0 36h
mayastor-etcd-0 1/1 Running 0 3d7h
mayastor-etcd-1 1/1 Running 1 (3d6h ago) 154d
mayastor-etcd-2 1/1 Running 0 36h
mayastor-io-engine-7fr9f 2/2 Running 0 3d7h
mayastor-io-engine-llnnb 2/2 Running 0 3d3h
mayastor-io-engine-rhlfc 2/2 Running 2 (3d6h ago) 154d
mayastor-localpv-provisioner-6fd649f5fb-n8xmp 1/1 Running 0 36h
mayastor-loki-0 1/1 Running 0 36h
mayastor-nats-0 3/3 Running 0 36h
mayastor-nats-1 3/3 Running 0 36h
mayastor-nats-2 3/3 Running 0 36h
mayastor-obs-callhome-8c89fdb97-pg9kr 2/2 Running 0 36h
mayastor-operator-diskpool-5cd48746c-46zwb 0/1 CrashLoopBackOff 434 (4m4s ago) 36h
mayastor-promtail-c9rzm 1/1 Running 0 36h
mayastor-promtail-m6j67 1/1 Running 0 36h
mayastor-promtail-mnhzj 1/1 Running 0 36h
mayastor-promtail-nxcgx 1/1 Running 0 36h
mayastor-promtail-szhhv 1/1 Running 0 36h
mayastor-promtail-wqpng 1/1 Running 0 36h
mayastor-promtail-x8qlv 1/1 Running 0 36h
Should I execute the command
kubectl mayastor upgrade --set 'agents.core.rebuild.partial.enabled=false' --skip-single-replica-volume-validation
or do I need to do anything else beforehand, e.g. kubectl mayastor delete?
kubectl get jobs -n mayastor
NAME COMPLETIONS DURATION AGE
mayastor-upgrade-v2-7-0 0/1 37h 37h
kubectl mayastor get upgrade-status
No upgrade event present.
Thanks for your support! Frank
We ran into the same issue from a 2.5.1 upgrade to 2.7.0 using:
kubectl mayastor upgrade
This leaves the diskpool pod on version 2.5.1, with the same error message.
For me it looks like all the components are still on 2.5.0 except etcd, which uses the docker.io/bitnami/etcd:3.5.6-debian-11-r10 image. Here I am not sure, but the StatefulSet was definitely changed, as the pod affinity I had added was gone.
kubectl describe pod -n mayastor | grep 2.5.0
openebs.io/version=2.5.0
Image: docker.io/openebs/mayastor-agent-core:v2.5.0
Image: docker.io/openebs/mayastor-agent-ha-cluster:v2.5.0
openebs.io/version=2.5.0
Image: docker.io/openebs/mayastor-agent-ha-node:v2.5.0
openebs.io/version=2.5.0
Image: docker.io/openebs/mayastor-agent-ha-node:v2.5.0
openebs.io/version=2.5.0
Image: docker.io/openebs/mayastor-agent-ha-node:v2.5.0
openebs.io/version=2.5.0
Image: docker.io/openebs/mayastor-agent-ha-node:v2.5.0
openebs.io/version=2.5.0
Image: docker.io/openebs/mayastor-agent-ha-node:v2.5.0
openebs.io/version=2.5.0
Image: docker.io/openebs/mayastor-agent-ha-node:v2.5.0
openebs.io/version=2.5.0
Image: docker.io/openebs/mayastor-agent-ha-node:v2.5.0
openebs.io/version=2.5.0
Image: docker.io/openebs/mayastor-api-rest:v2.5.0
openebs.io/version=2.5.0
Image: docker.io/openebs/mayastor-csi-controller:v2.5.0
openebs.io/version=2.5.0
Image: docker.io/openebs/mayastor-csi-node:v2.5.0
openebs.io/version=2.5.0
Image: docker.io/openebs/mayastor-csi-node:v2.5.0
openebs.io/version=2.5.0
Image: docker.io/openebs/mayastor-csi-node:v2.5.0
openebs.io/version=2.5.0
Image: docker.io/openebs/mayastor-csi-node:v2.5.0
openebs.io/version=2.5.0
Image: docker.io/openebs/mayastor-csi-node:v2.5.0
openebs.io/version=2.5.0
Image: docker.io/openebs/mayastor-csi-node:v2.5.0
openebs.io/version=2.5.0
Image: docker.io/openebs/mayastor-csi-node:v2.5.0
openebs.io/version=2.5.0
Image: docker.io/openebs/mayastor-metrics-exporter-io-engine:v2.5.0
Image: docker.io/openebs/mayastor-io-engine:v2.5.0
openebs.io/version=2.5.0
Image: docker.io/openebs/mayastor-metrics-exporter-io-engine:v2.5.0
Image: docker.io/openebs/mayastor-io-engine:v2.5.0
openebs.io/version=2.5.0
Image: docker.io/openebs/mayastor-metrics-exporter-io-engine:v2.5.0
Image: docker.io/openebs/mayastor-io-engine:v2.5.0
Image: grafana/loki:2.5.0
openebs.io/version=2.5.0
Image: docker.io/openebs/mayastor-obs-callhome:v2.5.0
Image: docker.io/openebs/mayastor-obs-callhome-stats:v2.5.0
openebs.io/version=2.5.0
Image: docker.io/openebs/mayastor-operator-diskpool:v2.5.0
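To see at a glance which components are lagging, the image tags in that describe output can be tallied. A sketch, with a stand-in sample embedded in place of the real output:

```shell
# Sketch: count occurrences of each image tag in 'kubectl describe pod'
# output. In practice pipe the real describe output through the same
# grep/sort pipeline instead of using this embedded sample.
cat > images.txt <<'EOF'
Image: docker.io/openebs/mayastor-agent-core:v2.5.0
Image: docker.io/openebs/mayastor-io-engine:v2.5.0
Image: docker.io/bitnami/etcd:3.5.6-debian-11-r10
EOF
# Keep the tag (everything after the last colon), then tally.
grep '^Image:' images.txt | grep -o '[^:]*$' | sort | uniq -c | sort -rn
```

Any tag other than the target release's shows up as a separate bucket, which makes a partial upgrade obvious.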
Let's try these steps:
# Delete the upgrade-job
kubectl mayastor delete upgrade --force
# Try to roll back the helm release to a 'deployed' state
helm rollback mayastor -n mayastor
# Check if rollback succeeded
helm ls -n mayastor
If the STATUS says 'deployed', then proceed with the rest, otherwise share the output and any failure logs in the above commands.
# Upgrade command
kubectl mayastor upgrade --set 'agents.core.rebuild.partial.enabled=false' --skip-single-replica-volume-validation
# Monitor upgrade logs for any signs of failure. This could take a bit of time.
kubectl logs job/mayastor-upgrade-v2-7-0 -n mayastor -f
# Proceed only if the upgrade has succeeded so far:
kubectl mayastor get upgrade-status
# should say the upgrade was successful, and
helm ls -n mayastor
# should show chart version 2.7.0 in the 'deployed' state.
# Re-enable partial rebuild
# Ref: https://openebs.io/docs/user-guides/upgrade#replicated-storage, bullet 5, but adapted for the mayastor/mayastor chart
helm upgrade mayastor mayastor/mayastor -n mayastor --reuse-values --version 2.7.0 --set agents.core.rebuild.partial.enabled=true
The CRD issue should resolve itself by this time.
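The "is the release deployed?" gate in the steps above can be scripted. A sketch, parsing a stand-in helm ls output; with a live cluster you would replace the heredoc with the real helm ls -n mayastor output:

```shell
# Sketch: only proceed with the re-upgrade if the helm release STATUS is
# 'deployed'. Sample tabular output is embedded for illustration.
cat > helm-ls.txt <<'EOF'
NAME     NAMESPACE REVISION UPDATED                        STATUS   CHART          APP VERSION
mayastor mayastor  4        2024-08-07 06:56:42 +0200 CEST deployed mayastor-2.7.0 2.7.0
EOF
# STATUS is the third-from-last column, which sidesteps the variable
# number of whitespace-split fields in the UPDATED timestamp.
status=$(awk '$1 == "mayastor" { print $(NF-2) }' helm-ls.txt)
if [ "$status" = deployed ]; then
  echo "release deployed: safe to run the upgrade"
else
  echo "release is '$status': share the rollback output instead"
fi
```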
kubectl mayastor delete upgrade --force
Job mayastor-upgrade-v2-7-0 in namespace mayastor deleted
ConfigMap mayastor-upgrade-config-map-v2-7-0 in namespace mayastor deleted
ClusterRoleBinding mayastor-upgrade-role-binding-v2-7-0 in namespace mayastor deleted
ClusterRole mayastor-upgrade-role-v2-7-0 in namespace mayastor deleted
ServiceAccount mayastor-upgrade-service-account-v2-7-0 in namespace mayastor deleted
helm rollback mayastor -n mayastor
displayed some warnings that can be ignored
W0807 06:56:44.047144 68 warnings.go:70] would violate PodSecurity "restricted:latest": restricted volume types (volumes "containers", "pods" use restricted volume type "hostPath"), runAsNonRoot != true (pod or container "promtail" must set securityContext.runAsNonRoot=true), runAsUser=0 (pod must not set runAsUser=0), seccompProfile (pod or container "promtail" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
...
and
Rollback was a success! Happy Helming!
helm ls -n mayastor
mayastor mayastor 4 2024-08-07 06:56:42.752416746 +0200 CEST deployed mayastor-2.7.0 2.7.0
kubectl mayastor upgrade --set 'agents.core.rebuild.partial.enabled=false' --skip-single-replica-volume-validation
Volumes which make use of a single volume replica instance will be unavailable for some time during upgrade.
It is recommended that you do not create new volumes which make use of only one volume replica.
ServiceAccount: mayastor-upgrade-service-account-v2-7-0 created in namespace: mayastor
Cluster Role: mayastor-upgrade-role-v2-7-0 in namespace mayastor created
ClusterRoleBinding: mayastor-upgrade-role-binding-v2-7-0 in namespace mayastor created
ConfigMap: mayastor-upgrade-config-map-v2-7-0 in namespace mayastor created
Job: mayastor-upgrade-v2-7-0 created in namespace: mayastor
The upgrade has started. You can see the recent upgrade status using 'get upgrade-status` command.
However, the upgrade runs into an error:
kubectl logs job/mayastor-upgrade-v2-7-0 -n mayastor -f
Application 'upgrade' revision d0a6618f4898 (v2.7.0+0)
2024-08-07T04:58:52.954824Z INFO upgrade_job: Validated all inputs
at k8s/upgrade/src/bin/upgrade-job/main.rs:64
2024-08-07T04:58:55.446963Z INFO upgrade_job::helm::upgrade: Skipping helm upgrade, as the version of the installed helm chart is the same as that of this upgrade-job's helm chart
at k8s/upgrade/src/bin/upgrade-job/helm/upgrade.rs:285
2024-08-07T04:58:55.462079Z ERROR upgrade_job::upgrade: Partial rebuild must be disabled for upgrades from mayastor chart versions >= 2.2.0, <= 2.5.0
at k8s/upgrade/src/bin/upgrade-job/upgrade.rs:182
2024-08-07T04:58:55.466020Z ERROR upgrade_job: Failed to upgrade Mayastor, error: Partial rebuild must be disabled for upgrades from mayastor chart versions >= 2.2.0, <= 2.5.0
at k8s/upgrade/src/bin/upgrade-job/main.rs:34
Error: PartialRebuildNotAllowed { chart_name: "mayastor", lower_extent: "2.2.0", upper_extent: "2.5.0" }
kubectl mayastor get upgrade-status
Upgrade From: 2.7.0
Upgrade To: 2.7.0
Upgrade Status: Upgraded Mayastor control-plane
kubectl get pod -n mayastor ...
mayastor mayastor-upgrade-v2-7-0-fvr4f 0/1 CrashLoopBackOff
It seems the pods were upgraded except for the io-engine, which still runs Image: docker.io/openebs/mayastor-metrics-exporter-io-engine:v2.5.0 and Image: docker.io/openebs/mayastor-io-engine:v2.5.0;
all others run 2.7.0 images.
helm ls -n mayastor
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
mayastor mayastor 4 2024-08-07 06:56:42.752416746 +0200 CEST deployed mayastor-2.7.0 2.7.0
kubectl mayastor get upgrade-status
Upgrade From: 2.7.0
Upgrade To: 2.7.0
Upgrade Status: Upgraded Mayastor control-plane
the upgrade job does not run anymore
kubectl describe job mayastor-upgrade-v2-7-0 -n mayastor
Name: mayastor-upgrade-v2-7-0
Namespace: mayastor
Selector: batch.kubernetes.io/controller-uid=86c5daf4-c783-4e46-9fa2-31493f697cbf
Labels: app=upgrade
openebs.io/logging=true
Annotations: <none>
Parallelism: 1
Completions: 1
Completion Mode: NonIndexed
Start Time: Wed, 07 Aug 2024 06:58:25 +0200
Pods Statuses: 0 Active (1 Ready) / 0 Succeeded / 1 Failed
Pod Template:
Labels: app=upgrade
batch.kubernetes.io/controller-uid=86c5daf4-c783-4e46-9fa2-31493f697cbf
batch.kubernetes.io/job-name=mayastor-upgrade-v2-7-0
controller-uid=86c5daf4-c783-4e46-9fa2-31493f697cbf
job-name=mayastor-upgrade-v2-7-0
openebs.io/logging=true
Service Account: mayastor-upgrade-service-account-v2-7-0
Containers:
mayastor-upgrade-job:
Image: docker.io/openebs/mayastor-upgrade-job:v2.7.0
Port: <none>
Host Port: <none>
Args:
--rest-endpoint=http://mayastor-api-rest:8081
--namespace=mayastor
--release-name=mayastor
--helm-args-set=agents.core.rebuild.partial.enabled=false
--helm-args-set-file=
Liveness: exec [pgrep upgrade-job] delay=10s timeout=1s period=60s #success=1 #failure=3
Environment:
RUST_LOG: info
POD_NAME: (v1:metadata.name)
Mounts:
/upgrade-config-map from upgrade-config-map (ro)
Volumes:
upgrade-config-map:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: mayastor-upgrade-config-map-v2-7-0
Optional: false
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal MayastorUpgrade 25m mayastor-upgrade-v2-7-0 {"fromVersion":"2.7.0","toVersion":"2.7.0","message":"Starting Mayastor upgrade..."}
Normal MayastorUpgrade 26m mayastor-upgrade-v2-7-0 {"fromVersion":"2.7.0","toVersion":"2.7.0","message":"Starting Mayastor upgrade..."}
Normal MayastorUpgrade 25m mayastor-upgrade-v2-7-0 {"fromVersion":"2.7.0","toVersion":"2.7.0","message":"Upgraded Mayastor control-plane"}
Normal MayastorUpgrade 25m mayastor-upgrade-v2-7-0 {"fromVersion":"2.7.0","toVersion":"2.7.0","message":"Upgrading Mayastor control-plane"}
Normal MayastorUpgrade 26m mayastor-upgrade-v2-7-0 {"fromVersion":"2.7.0","toVersion":"2.7.0","message":"Upgrading Mayastor control-plane"}
Normal MayastorUpgrade 25m mayastor-upgrade-v2-7-0 {"fromVersion":"2.7.0","toVersion":"2.7.0","message":"Starting Mayastor upgrade..."}
Normal MayastorUpgrade 24m mayastor-upgrade-v2-7-0 {"fromVersion":"2.7.0","toVersion":"2.7.0","message":"Starting Mayastor upgrade..."}
Normal MayastorUpgrade 26m mayastor-upgrade-v2-7-0 {"fromVersion":"2.7.0","toVersion":"2.7.0","message":"Upgrading Mayastor control-plane"}
Normal MayastorUpgrade 20m mayastor-upgrade-v2-7-0 {"fromVersion":"2.7.0","toVersion":"2.7.0","message":"Starting Mayastor upgrade..."}
Normal MayastorUpgrade 24m mayastor-upgrade-v2-7-0 {"fromVersion":"2.7.0","toVersion":"2.7.0","message":"Upgraded Mayastor control-plane"}
Normal MayastorUpgrade 25m mayastor-upgrade-v2-7-0 {"fromVersion":"2.7.0","toVersion":"2.7.0","message":"Upgraded Mayastor control-plane"}
Normal MayastorUpgrade 22m mayastor-upgrade-v2-7-0 {"fromVersion":"2.7.0","toVersion":"2.7.0","message":"Upgrading Mayastor control-plane"}
Normal MayastorUpgrade 25m mayastor-upgrade-v2-7-0 {"fromVersion":"2.7.0","toVersion":"2.7.0","message":"Upgrading Mayastor control-plane"}
Normal MayastorUpgrade 22m mayastor-upgrade-v2-7-0 {"fromVersion":"2.7.0","toVersion":"2.7.0","message":"Starting Mayastor upgrade..."}
Normal MayastorUpgrade 24m mayastor-upgrade-v2-7-0 {"fromVersion":"2.7.0","toVersion":"2.7.0","message":"Upgrading Mayastor control-plane"}
Normal MayastorUpgrade 26m mayastor-upgrade-v2-7-0 {"fromVersion":"2.7.0","toVersion":"2.7.0","message":"Upgraded Mayastor control-plane"}
Normal MayastorUpgrade 20m mayastor-upgrade-v2-7-0 {"fromVersion":"2.7.0","toVersion":"2.7.0","message":"Upgraded Mayastor control-plane"}
Normal MayastorUpgrade 26m mayastor-upgrade-v2-7-0 {"fromVersion":"2.7.0","toVersion":"2.7.0","message":"Upgraded Mayastor control-plane"}
Normal MayastorUpgrade 20m mayastor-upgrade-v2-7-0 {"fromVersion":"2.7.0","toVersion":"2.7.0","message":"Upgrading Mayastor control-plane"}
Normal MayastorUpgrade 22m mayastor-upgrade-v2-7-0 {"fromVersion":"2.7.0","toVersion":"2.7.0","message":"Upgraded Mayastor control-plane"}
Normal MayastorUpgrade 26m mayastor-upgrade-v2-7-0 {"fromVersion":"2.7.0","toVersion":"2.7.0","message":"Starting Mayastor upgrade..."}
Normal SuccessfulCreate 26m job-controller Created pod: mayastor-upgrade-v2-7-0-fvr4f
Normal SuccessfulDelete 20m job-controller Deleted pod: mayastor-upgrade-v2-7-0-fvr4f
Warning BackoffLimitExceeded 20m job-controller Job has reached the specified backoff limit
Hi,
I've redone the steps above. This time
helm rollback mayastor -n mayastor
resulted in
helm ls -n mayastor
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
mayastor mayastor 5 2024-08-09 12:57:02.455418537 +0200 CEST deployed mayastor-2.5.0 2.5.0
I have redone the upgrade
kubectl mayastor upgrade --set 'agents.core.rebuild.partial.enabled=false' --skip-single-replica-volume-validation
This time the upgrade went through successfully
Upgrade From: 2.5.0
Upgrade To: 2.7.0
Upgrade Status: Successfully upgraded Mayastor
I have attached the upgrade log. mayastor-2.7.0-upgrade.log
However, not all of my replicas come back up.
kubectl mayastor get volumes
ID REPLICAS TARGET-NODE ACCESSIBILITY STATUS SIZE THIN-PROVISIONED ALLOCATED SNAPSHOTS SOURCE
1107276f-ce8e-4dfd-b2aa-feeaaed7843b 3 adm-cp0 nvmf Degraded 40GiB false 40GiB 0 <none>
18982155-f6cb-45ed-8eff-1acf8533af8a 3 adm-cp0 nvmf Degraded 4.7GiB false 4.7GiB 0 <none>
262be87d-5dab-4f7a-bc7c-129f0998c8c0 1 adm-cp1 nvmf Online 953.7MiB false 956MiB 0 <none>
2d2ef07e-a923-4a69-8c85-fd7ffc01b4a4 1 adm-cp2 nvmf Online 572.2MiB false 576MiB 0 <none>
3ce72d0c-7a52-471a-bf79-3bfcd445f7f3 3 adm-cp0 nvmf Degraded 15GiB false 15GiB 0 <none>
3fdb6324-6fbd-4d5a-bbde-aa155310b178 3 adm-cp0 nvmf Degraded 1GiB false 1GiB 0 <none>
51391ebb-f216-4649-8103-a829f7e72970 3 adm-cp1 nvmf Degraded 500MiB false 500MiB 0 <none>
539ec662-a32f-4487-b374-42ab6976856e 1 adm-cp1 nvmf Unknown 4.7GiB false 4.7GiB 0 <none>
6c7a3fee-202d-4635-a19a-e4960e50c4c5 3 adm-cp1 nvmf Online 22.9MiB false 24MiB 0 <none>
79d00161-60d8-4193-9c07-49dded99b11f 3 adm-cp1 nvmf Degraded 10GiB false 10GiB 0 <none>
7bdb0320-1e0c-49c4-bf7b-350aeec9ebb9 1 adm-cp2 nvmf Unknown 4.7GiB false 4.7GiB 0 <none>
8302be44-1172-45ea-abfd-07fa9e84069c 3 adm-cp0 nvmf Degraded 14GiB false 14GiB 0 <none>
bc9ce058-6706-4cb8-aad3-ac5111fbd2bf 1 adm-cp1 nvmf Online 953.7MiB false 956MiB 0 <none>
bcbacc92-ee18-4604-9ec1-ce7c36fae822 3 adm-cp1 nvmf Degraded 1GiB false 1GiB 0 <none>
bec5e48d-d9ac-4393-8f89-087c91298220 3 adm-cp1 nvmf Degraded 1GiB false 1GiB 0 <none>
d219f7d5-26c4-4abd-bd33-861bf520ae53 3 adm-cp0 nvmf Degraded 10GiB false 10GiB 0 <none>
e022de3f-7f12-4c36-8039-560a3292f2ab 1 adm-cp2 nvmf Online 37.3GiB false 37.3GiB 0 <none>
f6a67491-d00a-41e1-be3c-9a32cc73004c 3 adm-cp0 nvmf Degraded 30GiB false 30GiB 0 <none>
kubectl mayastor get volume-replica-topologies
VOLUME-ID ID NODE POOL STATUS CAPACITY ALLOCATED SNAPSHOTS CHILD-STATUS REASON REBUILD
1107276f-ce8e-4dfd-b2aa-feeaaed7843b 8162c4f6-ca64-48bc-afc6-4b833e30bfa6 adm-cp2 pool-adm-cp2 Online 40GiB 40GiB 0 B Online <none> <none>
└─ a707e5ec-c645-4900-aff6-c3df088435f5 adm-cp1 pool-adm-cp1 Online 40GiB 40GiB 0 B Online <none> <none>
18982155-f6cb-45ed-8eff-1acf8533af8a a0900850-087c-49b6-a8a0-2cca89b830c8 adm-cp2 pool-adm-cp2 Online 4.7GiB 4.7GiB 0 B Online <none> <none>
└─ 078ac54f-5537-4a04-8db8-f1824feef873 adm-cp1 pool-adm-cp1 Online 4.7GiB 4.7GiB 0 B Online <none> <none>
262be87d-5dab-4f7a-bc7c-129f0998c8c0 342f4b58-fdbe-4b5a-a384-b528de901776 adm-cp1 pool-adm-cp1 Online 956MiB 956MiB 0 B Online <none> <none>
2d2ef07e-a923-4a69-8c85-fd7ffc01b4a4 fdf60bfb-dc11-488f-af44-4acc1e408de8 adm-cp2 pool-adm-cp2 Online 576MiB 576MiB 0 B Online <none> <none>
3ce72d0c-7a52-471a-bf79-3bfcd445f7f3 a09028d3-a26d-422f-a6e8-bef15ed6eac3 adm-cp1 pool-adm-cp1 Online 15GiB 15GiB 0 B Online <none> <none>
└─ 7d2652f6-d83b-40c0-b929-7d7f0cd8d54a adm-cp2 pool-adm-cp2 Online 15GiB 15GiB 0 B Online <none> <none>
3fdb6324-6fbd-4d5a-bbde-aa155310b178 30ee2c95-3d74-4681-8c2d-9966996fa8ee adm-cp1 pool-adm-cp1 Online 1GiB 1GiB 0 B Online <none> <none>
└─ db0266a3-7000-4e59-b729-066cead5dfe8 adm-cp2 pool-adm-cp2 Online 1GiB 1GiB 0 B Online <none> <none>
51391ebb-f216-4649-8103-a829f7e72970 e5a01aec-b93d-4903-a529-11251cd4728e adm-cp1 pool-adm-cp1 Online 500MiB 500MiB 0 B Online <none> <none>
└─ af9a51b2-1a1d-4267-8406-1da51dcd26f4 adm-cp2 pool-adm-cp2 Online 500MiB 500MiB 0 B Online <none> <none>
539ec662-a32f-4487-b374-42ab6976856e 79e8d7a5-5652-40e4-a010-bd6776b4e142 adm-cp0 pool-adm-cp0 Online 4.7GiB 4.7GiB 0 B Online <none> <none>
6c7a3fee-202d-4635-a19a-e4960e50c4c5 dbb2988d-5622-4c82-b73f-e8813e4f62f9 adm-cp1 pool-adm-cp1 Online 24MiB 24MiB 0 B Online <none> <none>
├─ 1ec3e665-202c-481f-ad20-4d7bc18d4996 adm-cp0 pool-adm-cp0 Online 24MiB 24MiB 0 B Online <none> <none>
└─ 28c44bbf-19c6-40e9-aa61-1276f8a3a229 adm-cp2 pool-adm-cp2 Online 24MiB 24MiB 0 B Online <none> <none>
79d00161-60d8-4193-9c07-49dded99b11f 10fbe5de-1e7b-467b-8a91-0f825bf4ccdc adm-cp1 pool-adm-cp1 Online 10GiB 10GiB 0 B Online <none> <none>
└─ 180ea724-cbcb-4079-84b2-89510bf0b918 adm-cp2 pool-adm-cp2 Online 10GiB 10GiB 0 B Online <none> <none>
7bdb0320-1e0c-49c4-bf7b-350aeec9ebb9 731f0372-0662-43be-aa0c-a51e86cc727b adm-cp0 pool-adm-cp0 Online 4.7GiB 4.7GiB 0 B <none> <none> <none>
8302be44-1172-45ea-abfd-07fa9e84069c 521ae097-7fcc-4c04-876e-cb5bae1f827e adm-cp1 pool-adm-cp1 Online 14GiB 14GiB 0 B Online <none> <none>
└─ 32190d50-6eba-4a78-a47f-93464b4cd9ec adm-cp2 pool-adm-cp2 Online 14GiB 14GiB 0 B Online <none> <none>
bc9ce058-6706-4cb8-aad3-ac5111fbd2bf e82eacdf-a325-4ff5-920c-2f1c2780c05d adm-cp1 pool-adm-cp1 Online 956MiB 956MiB 0 B Online <none> <none>
bcbacc92-ee18-4604-9ec1-ce7c36fae822 87bfe420-93a5-4742-a33a-1d6fd572c372 adm-cp1 pool-adm-cp1 Online 1GiB 1GiB 0 B Online <none> <none>
└─ d4ad5aeb-d4b9-407d-8a7a-466158cf6b42 adm-cp2 pool-adm-cp2 Online 1GiB 1GiB 0 B Online <none> <none>
bec5e48d-d9ac-4393-8f89-087c91298220 677ec404-9da7-4c6c-9cd8-c07522981b13 adm-cp2 pool-adm-cp2 Online 1GiB 1GiB 0 B Online <none> <none>
└─ e2cfd45b-a96a-47f6-8431-6ee5805aa765 adm-cp1 pool-adm-cp1 Online 1GiB 1GiB 0 B Online <none> <none>
d219f7d5-26c4-4abd-bd33-861bf520ae53 8299a234-0355-4ef9-81dc-9d19fa3099b7 adm-cp2 pool-adm-cp2 Online 10GiB 10GiB 0 B Online <none> <none>
└─ df25bc9c-faea-4615-9d4d-29ac92afe8b7 adm-cp1 pool-adm-cp1 Online 10GiB 10GiB 0 B Online <none> <none>
e022de3f-7f12-4c36-8039-560a3292f2ab fd86344c-e40f-495e-a018-546dcff73318 adm-cp2 pool-adm-cp2 Online 37.3GiB 37.3GiB 0 B Online <none> <none>
f6a67491-d00a-41e1-be3c-9a32cc73004c 133c2b4b-4d90-4a81-b859-ebf979c5c13b adm-cp2 pool-adm-cp2 Online 30GiB 30GiB 0 B Online <none> <none>
└─ e4b299de-7988-40de-b2b5-7b81060b37b6 adm-cp1 pool-adm-cp1 Online 30GiB 30GiB 0 B Online <none> <none>
All nodes are up:
kubectl mayastor get nodes
ID GRPC ENDPOINT STATUS VERSION
adm-cp0 192.168.4.5:10124 Online v2.7.0
adm-cp2 192.168.4.8:10124 Online v2.7.0
adm-cp1 192.168.4.6:10124 Online v2.7.0
All pools are online:
kubectl mayastor get pools
ID DISKS MANAGED NODE STATUS CAPACITY ALLOCATED AVAILABLE COMMITTED
pool-adm-cp2 aio:///dev/sda?uuid=45e0b46f-c572-4359-bacf-b56def10d9a1 true adm-cp2 Online 476.5GiB 165GiB 311.5GiB 165GiB
pool-adm-cp0 aio:///dev/sda?uuid=dc8f7457-2d75-4367-b11d-2ae9d4cd673d true adm-cp0 Online 476.5GiB 136.5GiB 340GiB 9.3GiB
pool-adm-cp1 aio:///dev/sda?uuid=8f0217ae-9335-4169-88a1-48afa7ed3b42 true adm-cp1 Online 476.5GiB 129GiB 347.5GiB 129GiB
I have not enabled partial rebuild yet.
How do I get the volumes in a consistent state again?
Thanks & BR Frank
To re-enable partial rebuild:
helm upgrade mayastor mayastor/mayastor -n mayastor --reuse-values --version 2.7.0 --set agents.core.rebuild.partial.enabled=true
As for making the volumes consistent: please mount them on a pod, and they should rebuild back to the specified number of replicas.
What is the current state of your volumes?
Hi,
I am not sure I follow your answer. All the volumes are mounted, and, as displayed in my previous entry:
kubectl mayastor get volumes
ID REPLICAS TARGET-NODE ACCESSIBILITY STATUS SIZE THIN-PROVISIONED ALLOCATED SNAPSHOTS SOURCE
1107276f-ce8e-4dfd-b2aa-feeaaed7843b 3 adm-cp0 nvmf Degraded 40GiB false 40GiB 0 <none>
18982155-f6cb-45ed-8eff-1acf8533af8a 3 adm-cp0 nvmf Degraded 4.7GiB false 4.7GiB 0 <none>
e.g. 1107276f-ce8e-4dfd-b2aa-feeaaed7843b shows 3 replicas and status Degraded:
kubectl mayastor get volume-replica-topologies
VOLUME-ID ID NODE POOL STATUS CAPACITY ALLOCATED SNAPSHOTS CHILD-STATUS REASON REBUILD
1107276f-ce8e-4dfd-b2aa-feeaaed7843b 8162c4f6-ca64-48bc-afc6-4b833e30bfa6 adm-cp2 pool-adm-cp2 Online 40GiB 40GiB 0 B Online <none> <none>
└─ a707e5ec-c645-4900-aff6-c3df088435f5 adm-cp1 pool-adm-cp1 Online 40GiB 40GiB 0 B Online <none> <none>
It only shows two replicas for the volume. This is the status after a few days.
Are you saying it will resolve after enabling partial rebuild?
Thanks & BR Frank
Can you attach a support bundle?
Example:
kubectl mayastor dump system -n mayastor
mayastor-2024-08-12--18-54-41-UTC-partaa-of-tar.gz mayastor-2024-08-12--18-54-41-UTC-partab-of-tar.gz mayastor-2024-08-12--18-54-41-UTC-partac-of-tar.gz
I have uploaded the files. As the tar.gz is too big (48 MB), I split it; you need to run cat mayastor-2024-08-12--18-54-41-UTC-parta* > mayastor-2024-08-12--18-54-41-UTC.tar.gz
Thanks
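The split-and-reassemble round trip can be sanity-checked before uploading. A sketch demonstrated on a small stand-in file instead of the real 48 MB bundle:

```shell
# Sketch: split a large archive into chunks, reassemble via shell glob
# (lexical order restores the pieces), and verify byte-for-byte equality.
seq 1 100000 > bundle.tar.gz              # stand-in payload, not a real tarball
split -b 64k bundle.tar.gz bundle-part-   # produces bundle-part-aa, -ab, ...
cat bundle-part-* > rejoined.tar.gz
cmp bundle.tar.gz rejoined.tar.gz && echo "bundle intact"
```

Running cmp (or comparing checksums) on both ends catches a missing or reordered part before anyone tries to extract the bundle.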
Hmm:
status: InvalidArgument, message: "errno: : out of metadata pages failed to create lvol
But it doesn't seem likely that we have actually run out of metadata pages on the pool, given how few volumes you have. I suspect you have hit a variation of another bug; I can't find the ticket now, but it was related to a race condition on the pool.
However, I see a lot of EIO errors on the device, which might mean the pool disk /dev/sda is not working properly. Please check dmesg etc. for any disk errors.
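Checking the kernel log for errors on the pool disk can be sketched like this; a stand-in dmesg excerpt is embedded here, and on the node you would grep the real dmesg output directly:

```shell
# Sketch: count I/O errors mentioning the pool disk (sda) in kernel log
# lines. The sample below is illustrative, not real output from this node.
cat > dmesg.txt <<'EOF'
[1234.5] blk_update_request: I/O error, dev sda, sector 2048
[1234.6] Buffer I/O error on dev sda, logical block 256, async page read
[1234.7] usb 1-1: new high-speed USB device number 2
EOF
errors=$(grep -c 'I/O error.*sda' dmesg.txt)
echo "sda I/O errors: $errors"
```

A non-zero count here would point at failing hardware rather than a Mayastor bug.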
Otherwise, we should reset the pool disk and re-create the pool. For example:
First change replica count of `7bdb0320-1e0c-49c4-bf7b-350aeec9ebb9` to 2, so we can rebuild another replica on another node.
For this you can use `kubectl mayastor scale volume 7bdb0320-1e0c-49c4-bf7b-350aeec9ebb9 2`
Then we need to reset the pool.
unlabel io-engine from node cp0
Then zero out the pool disk, for example: `dd if=/dev/zero of=/dev/sda bs=16M status=progress`
Then relabel io-engine label for cp0
Then re-create the pool, by kubectl exec into io-engine container and creating the pool:
io-engine-client pool create pool-adm-cp0 /dev/disk/by-id/ata-SSD_512GB_202301030033
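The unlabel/relabel steps can be sketched as below. The `openebs.io/engine=mayastor` node label is the default io-engine placement label - that is an assumption here, so verify it against your io-engine DaemonSet's nodeSelector before running:

```
# Assumption: io-engine pods are scheduled via the default
# openebs.io/engine=mayastor node label.
kubectl label node adm-cp0 openebs.io/engine-          # io-engine pod terminates
# ... zero out /dev/sda while the engine is down ...
kubectl label node adm-cp0 openebs.io/engine=mayastor  # io-engine pod restarts
```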
For the record, I had the same problem upgrading 2.5.0 to 2.7.0 and forgetting to disable partial rebuild and stuck with a partially upgraded install that helm didn't like. Following the instructions here fixed it:
kubectl mayastor upgrade --set agents.core.rebuild.partial.enabled=false
then, after a (long) wait:
helm upgrade mayastor mayastor/mayastor -n mayastor --reuse-values --version 2.7.0 -f mayastor.yaml --set agents.core.rebuild.partial.enabled=true
It would be nice if the documentation were a bit clearer about the relationship between helm and kubectl mayastor upgrade, since the first time through it wasn't clear that kubectl mayastor upgrade effectively upgrades the chart, and that you should not normally run helm upgrade.
I changed all the volumes' replica count to 2. I unlabeled io-engine from node adm-cp0 -> the io-engine terminated on the node. I zeroed out the diskpool:
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: disk-wipe
spec:
  restartPolicy: Never
  nodeName: adm-cp0
  containers:
    - name: disk-wipe
      image: busybox
      securityContext:
        privileged: true
      command: ["/bin/sh", "-c", "dd if=/dev/zero bs=1M count=100 oflag=direct of=/dev/sda"]
EOF
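One way to check that a wipe like this actually took effect is to compare the start of the device against zeros. A minimal sketch using a scratch file in place of /dev/sda (an assumption for safety; the same cmp works on the real device with sufficient privileges):

```shell
# Scratch file standing in for the pool disk (4 MiB is enough for the demo).
dd if=/dev/urandom of=/tmp/fake-disk bs=1048576 count=4 2>/dev/null
# Wipe the first 2 MiB in place, as the disk-wipe pod does for its first 100 MiB.
dd if=/dev/zero of=/tmp/fake-disk bs=1048576 count=2 conv=notrunc 2>/dev/null
# cmp -n exits 0 only if the first 2 MiB are all zero bytes.
cmp -n 2097152 /tmp/fake-disk /dev/zero && echo "wiped"
```

If cmp reports a difference, the wipe did not reach the device, which would explain a pool coming back Online afterwards.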
I relabeled cp0. However, the diskpool already exists:
kubectl mayastor get pools
ID DISKS MANAGED NODE STATUS CAPACITY ALLOCATED AVAILABLE COMMITTED
pool-adm-cp2 aio:///dev/sda?uuid=45e0b46f-c572-4359-bacf-b56def10d9a1 true adm-cp2 Online 476.5GiB 165GiB 311.5GiB 165GiB
pool-adm-cp0 aio:///dev/sda?uuid=c86bd997-5a31-4861-857c-cf7f98e6a728 true adm-cp0 Online 476.5GiB 176.5GiB 300GiB 136.5GiB
pool-adm-cp1 aio:///dev/sda?uuid=8f0217ae-9335-4169-88a1-48afa7ed3b42 true adm-cp1 Online 476.5GiB 129GiB 347.5GiB 129GiB
Thus the last step does not work:
Then re-create the pool, by kubectl exec into io-engine container and creating the pool: io-engine-client pool create pool-adm-cp0 /dev/disk/by-id/ata-SSD_512GB_202301030033
And rescaling the volume to 3 returns an error:
kubectl mayastor scale volume 1107276f-ce8e-4dfd-b2aa-feeaaed7843b 3
Failed to scale volume 1107276f-ce8e-4dfd-b2aa-feeaaed7843b. Error error in response: status code '400 Bad Request', content: 'RestJsonError { details: "create_replica::status: InvalidArgument, message: \"errno: failed to create lvol e345bf41-6d85-4a60-9973-cb3fd42c379b\", details: [], metadata: MetadataMap { headers: {\"content-type\": \"application/grpc\", \"date\": \"Wed, 21 Aug 2024 08:29:03 GMT\", \"content-length\": \"0\"} }", message: "SvcError::GrpcRequestError", kind: InvalidArgument }'
I removed the disk-pool via the yaml file kubectl delete -f ....
However the diskpool is now in terminating state
Name:        pool-adm-cp0
Namespace:   mayastor
Labels:
Finalizers:
  openebs.io/diskpool-protection
How do I get the finalizer cleaned up?
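For reference, a stuck finalizer can be cleared manually with a merge patch. This is a last resort - the `openebs.io/diskpool-protection` finalizer exists to protect replicas still living on the pool, so only do this once you are sure none remain:

```
# Assumption: the DiskPool CR is pool-adm-cp0 in the mayastor namespace
# (dsp is the short name from the CRD). Clearing finalizers bypasses the
# operator's protection.
kubectl patch dsp pool-adm-cp0 -n mayastor --type=merge -p '{"metadata":{"finalizers":null}}'
```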
I got the diskpool removed - there was still a single-replica PV hanging on a node. By rebooting the node, the diskpool was removed. I recreated the diskpool by applying the yaml. All seems to be in a consistent state. However, I cannot scale the volume replicas back to 3:
kubectl mayastor scale volume 1107276f-ce8e-4dfd-b2aa-feeaaed7843b 3
Failed to scale volume 1107276f-ce8e-4dfd-b2aa-feeaaed7843b. Error error in response: status code '400 Bad Request', content: 'RestJsonError { details: "create_replica::status: InvalidArgument, message: \"errno: failed to create lvol 12bff7df-bd62-472e-b84e-69435391cc35\", details: [], metadata: MetadataMap { headers: {\"content-type\": \"application/grpc\", \"date\": \"Wed, 21 Aug 2024 12:12:13 GMT\", \"content-length\": \"0\"} }", message: "SvcError::GrpcRequestError", kind: InvalidArgument }'
@innotecsol, Can you please send us the latest support bundle, preferably after retrying the same operation?
kubectl mayastor dump system -n mayastor
mayastor-2024-08-22--06-51-37-UTC.tar.gz Please find attached
@innotecsol, A scale-up operation involves creating a new replica. Pool "pool-adm-cp0" on the adm-cp0 node was picked. The CreateReplicaRequest failed due to a metadata CRC mismatch.
Logs:
2024-08-22T06:51:31.151006078Z stdout F [2024-08-22T06:51:31.150837027+00:00 INFO io_engine::grpc::v1::replica:replica.rs:368] CreateReplicaRequest { name: "0927c8d4-4fc8-4629-af5c-e70cbda20833", uuid: "0927c8d4-4fc8-4629-af5c-e70cbda20833", pooluuid: "pool-adm-cp0", size: 42949672960, thin: false, share: None, allowed_hosts: [], entity_id: Some("1107276f-ce8e-4dfd-b2aa-feeaaed7843b") }
2024-08-22T06:51:31.152417626Z stdout F [2024-08-22T06:51:31.152329178+00:00 ERROR mayastor::spdk:blobstore.c:1659] Metadata page 236 crc mismatch for blobid 0x1000000ec
All the scale operations that failed were attempted on this node only; hence the CRC mismatch error is seen only on the pool-adm-cp0 pool.
Let's try to manually create a replica on the affected pool and on a non-affected one, just to see if it's a device issue.
Exec into the io-engine pod running on the adm-cp0 node:
kubectl exec -it openebs-io-engine-xxxx -n <namespace> -c io-engine -- sh
./bin/io-engine-client -b <io-engine-pod-ip-we-exec'd> replica create --size 2724835328 new 1107276f-ce8e-4dfd-b2aa-feeaaed78234 pool-adm-cp0
This creates a replica on pool-adm-cp0. Let's verify the replica was created successfully using:
./bin/io-engine-client -b <io-engine-pod-ip-we-exec'd> replica list
Let's do the same operation on the adm-cp1 node. Exec into the io-engine pod running on adm-cp1:
kubectl exec -it openebs-io-engine-xxxx -n <namespace> -c io-engine -- sh
./bin/io-engine-client -b <io-engine-pod-ip-we-exec'd> replica create --size 2724835328 new-1 1107276f-ce8e-4dfd-b2aa-feeaaed78233 pool-adm-cp1
./bin/io-engine-client -b <io-engine-pod-ip-we-exec'd> replica list
Thoughts: @dsharma-dc , @dsavitskiy , @tiagolobocastro
adm-cp0 fails:
kubectl exec -it -n mayastor mayastor-io-engine-qkk95 -c io-engine -- /bin/sh
io-engine-client replica create --size 2724835328 new 1107276f-ce8e-4dfd-b2aa-feeaaed78234 pool-adm-cp0
Error: GrpcStatus { source: Status { code: InvalidArgument, message: "errno: failed to create lvol new", metadata: MetadataMap { headers: {"content-type": "application/grpc", "date": "Thu, 22 Aug 2024 11:31:00 GMT", "content-length": "0"} }, source: None }, backtrace: Backtrace(()) }
adm-cp1 works:
kubectl exec -it -n mayastor mayastor-io-engine-42nxm -c io-engine -- /bin/sh
# io-engine-client replica create --size 2724835328 new-1 1107276f-ce8e-4dfd-b2aa-feeaaed78233 pool-adm-cp1
bdev:///new-1?uuid=1107276f-ce8e-4dfd-b2aa-feeaaed78233
replica list:
pool-adm-cp1 new-1 1107276f-ce8e-4dfd-b2aa-feeaaed78233 false none 2726297600 2726297600 2726297600 bdev:///new-1?uuid=1107276f-ce8e-4dfd-b2aa-feeaaed78233 false false 0 0
@innotecsol, It seems the replica_create issue is specific to this node/pool.
You have hit this before:
status: InvalidArgument, message: "errno: : out of metadata pages failed to create lvol
Do you have any replicas on the pool?
io-engine-client pool list
io-engine-client replica list
If not, can we delete the diskpool and recreate it using the same spec? It seems we skipped pool deletion before.
kubectl exec -it -n mayastor mayastor-io-engine-qkk95 -c io-engine -- /bin/sh
/# io-engine-client pool list
NAME UUID STATE CAPACITY USED DISKS
pool-adm-cp0 9512c173-b939-4a3c-85d7-51ee736c0d0e online 511604424704 495577989120 aio:///dev/sda?uuid=ab757fda-aa6a-4f5f-ba30-691f5e6ad467
/ # io-engine-client replica list
POOL NAME UUID THIN SHARE SIZE CAP ALLOC URI IS_SNAPSHOT IS_CLONE SNAP_ANCESTOR_SIZE CLONE_SNAP_ANCESTOR_SIZE
pool-adm-cp0 af895cf7-a379-4f57-aff7-10d2332abe5f af895cf7-a379-4f57-aff7-10d2332abe5f false nvmf 42949672960 42949672960 42949672960 nvmf://192.168.4.5:8420/nqn.2019-05.io.openebs:af895cf7-a379-4f57-aff7-10d2332abe5f?uuid=af895cf7-a379-4f57-aff7-10d2332abe5f false false 0 0
pool-adm-cp0 ae8f64b3-aea6-4e6c-99ed-4af87df96d6d ae8f64b3-aea6-4e6c-99ed-4af87df96d6d false nvmf 5003804672 5003804672 5003804672 nvmf://192.168.4.5:8420/nqn.2019-05.io.openebs:ae8f64b3-aea6-4e6c-99ed-4af87df96d6d?uuid=ae8f64b3-aea6-4e6c-99ed-4af87df96d6d false false 0 0
pool-adm-cp0 a468f327-db67-4856-abe5-2a1d2615419b a468f327-db67-4856-abe5-2a1d2615419b false nvmf 16106127360 16106127360 16106127360 nvmf://192.168.4.5:8420/nqn.2019-05.io.openebs:a468f327-db67-4856-abe5-2a1d2615419b?uuid=a468f327-db67-4856-abe5-2a1d2615419b false false 0 0
pool-adm-cp0 e54e3928-daa1-4079-aa81-8e6bd34208e9 e54e3928-daa1-4079-aa81-8e6bd34208e9 false nvmf 1073741824 1073741824 1073741824 nvmf://192.168.4.5:8420/nqn.2019-05.io.openebs:e54e3928-daa1-4079-aa81-8e6bd34208e9?uuid=e54e3928-daa1-4079-aa81-8e6bd34208e9 false false 0 0
pool-adm-cp0 8b1ebf2e-7ed1-414b-97a4-ca2c8e4f5f00 8b1ebf2e-7ed1-414b-97a4-ca2c8e4f5f00 false nvmf 524288000 524288000 524288000 nvmf://192.168.4.5:8420/nqn.2019-05.io.openebs:8b1ebf2e-7ed1-414b-97a4-ca2c8e4f5f00?uuid=8b1ebf2e-7ed1-414b-97a4-ca2c8e4f5f00 false false 0 0
pool-adm-cp0 1ec3e665-202c-481f-ad20-4d7bc18d4996 1ec3e665-202c-481f-ad20-4d7bc18d4996 false nvmf 25165824 25165824 25165824 nvmf://192.168.4.5:8420/nqn.2019-05.io.openebs:1ec3e665-202c-481f-ad20-4d7bc18d4996?uuid=1ec3e665-202c-481f-ad20-4d7bc18d4996 false false 0 0
pool-adm-cp0 997ae19b-9c05-4086-b872-155ebd4a5c07 997ae19b-9c05-4086-b872-155ebd4a5c07 false nvmf 10737418240 10737418240 10737418240 nvmf://192.168.4.5:8420/nqn.2019-05.io.openebs:997ae19b-9c05-4086-b872-155ebd4a5c07?uuid=997ae19b-9c05-4086-b872-155ebd4a5c07 false false 0 0
pool-adm-cp0 aff5d6f2-02a2-4ce2-a175-afc7a75b16d9 aff5d6f2-02a2-4ce2-a175-afc7a75b16d9 false nvmf 15003025408 15003025408 15003025408 nvmf://192.168.4.5:8420/nqn.2019-05.io.openebs:aff5d6f2-02a2-4ce2-a175-afc7a75b16d9?uuid=aff5d6f2-02a2-4ce2-a175-afc7a75b16d9 false false 0 0
pool-adm-cp0 d8cb6484-1492-45dd-983d-ec38826d8f52 d8cb6484-1492-45dd-983d-ec38826d8f52 false nvmf 1073741824 1073741824 1073741824 nvmf://192.168.4.5:8420/nqn.2019-05.io.openebs:d8cb6484-1492-45dd-983d-ec38826d8f52?uuid=d8cb6484-1492-45dd-983d-ec38826d8f52 false false 0 0
pool-adm-cp0 6cbb3242-7f8a-4912-acaf-971e2e4cc12c 6cbb3242-7f8a-4912-acaf-971e2e4cc12c false nvmf 1073741824 1073741824 1073741824 nvmf://192.168.4.5:8420/nqn.2019-05.io.openebs:6cbb3242-7f8a-4912-acaf-971e2e4cc12c?uuid=6cbb3242-7f8a-4912-acaf-971e2e4cc12c false false 0 0
pool-adm-cp0 79e8d7a5-5652-40e4-a010-bd6776b4e142 79e8d7a5-5652-40e4-a010-bd6776b4e142 false none 5003804672 5003804672 5003804672 bdev:///79e8d7a5-5652-40e4-a010-bd6776b4e142?uuid=79e8d7a5-5652-40e4-a010-bd6776b4e142 false false 0 0
pool-adm-cp0 7704f065-4902-4c4d-816b-8621aeb13525 7704f065-4902-4c4d-816b-8621aeb13525 false nvmf 10737418240 10737418240 10737418240 nvmf://192.168.4.5:8420/nqn.2019-05.io.openebs:7704f065-4902-4c4d-816b-8621aeb13525?uuid=7704f065-4902-4c4d-816b-8621aeb13525 false false 0 0
pool-adm-cp0 a8a5292e-fe2b-42a3-a631-7c2f2b1c4147 a8a5292e-fe2b-42a3-a631-7c2f2b1c4147 false nvmf 32212254720 32212254720 32212254720 nvmf://192.168.4.5:8420/nqn.2019-05.io.openebs:a8a5292e-fe2b-42a3-a631-7c2f2b1c4147?uuid=a8a5292e-fe2b-42a3-a631-7c2f2b1c4147 false false 0 0
pool-adm-cp0 731f0372-0662-43be-aa0c-a51e86cc727b 731f0372-0662-43be-aa0c-a51e86cc727b false none 5003804672 5003804672 5003804672 bdev:///731f0372-0662-43be-aa0c-a51e86cc727b?uuid=731f0372-0662-43be-aa0c-a51e86cc727b false false 0 0
kubectl mayastor get pools
ID DISKS MANAGED NODE STATUS CAPACITY ALLOCATED AVAILABLE COMMITTED
pool-adm-cp2 aio:///dev/sda?uuid=45e0b46f-c572-4359-bacf-b56def10d9a1 true adm-cp2 Online 476.5GiB 165GiB 311.5GiB 165GiB
pool-adm-cp0 aio:///dev/sda?uuid=ab757fda-aa6a-4f5f-ba30-691f5e6ad467 true adm-cp0 Online 476.5GiB 461.5GiB 14.9GiB 136.5GiB
pool-adm-cp1 aio:///dev/sda?uuid=8f0217ae-9335-4169-88a1-48afa7ed3b42 true adm-cp1 Online 476.5GiB 140.9GiB 335.6GiB 140.9GiB
kubectl get diskpools -A
NAMESPACE NAME NODE STATE POOL_STATUS CAPACITY USED AVAILABLE
mayastor pool-adm-cp0 adm-cp0 Created Online 511604424704 495577989120 16026435584
mayastor pool-adm-cp1 adm-cp1 Created Online 511604424704 151259185152 360345239552
mayastor pool-adm-cp2 adm-cp2 Created Online 511604424704 177125457920 334478966784
kubectl delete -f adm-cp0-mayastor.yaml
diskpool.openebs.io "pool-adm-cp0" deleted
where adm-cp0-mayastor.yaml is:
cat adm-cp0-mayastor.yaml
apiVersion: "openebs.io/v1beta2"
kind: DiskPool
metadata:
  name: pool-adm-cp0
  namespace: mayastor
spec:
  node: adm-cp0
  disks: ["/dev/sda"]
after deletion
kubectl get diskpools -A
NAMESPACE NAME NODE STATE POOL_STATUS CAPACITY USED AVAILABLE
mayastor pool-adm-cp1 adm-cp1 Created Online 511604424704 151259185152 360345239552
mayastor pool-adm-cp2 adm-cp2 Created Online 511604424704 177125457920 334478966784
io-engine-client pool list
No pools found
io-engine-client replica list
No replicas found
kubectl apply -f adm-cp0-mayastor.yaml
diskpool.openebs.io/pool-adm-cp0 created
io-engine-client pool list
NAME UUID STATE CAPACITY USED DISKS
pool-adm-cp0 9512c173-b939-4a3c-85d7-51ee736c0d0e online 511604424704 146528010240 aio:///dev/sda?uuid=d42db4cf-d2d4-4cae-b018-e06a31fd8060
io-engine-client pool list
NAME UUID STATE CAPACITY USED DISKS
pool-adm-cp0 9512c173-b939-4a3c-85d7-51ee736c0d0e online 511604424704 146528010240 aio:///dev/sda?uuid=d42db4cf-d2d4-4cae-b018-e06a31fd8060
/ # io-engine-client replica list
POOL NAME UUID THIN SHARE SIZE CAP ALLOC URI IS_SNAPSHOT IS_CLONE SNAP_ANCESTOR_SIZE CLONE_SNAP_ANCESTOR_SIZE
pool-adm-cp0 af895cf7-a379-4f57-aff7-10d2332abe5f af895cf7-a379-4f57-aff7-10d2332abe5f false nvmf 42949672960 42949672960 42949672960 nvmf://192.168.4.5:8420/nqn.2019-05.io.openebs:af895cf7-a379-4f57-aff7-10d2332abe5f?uuid=af895cf7-a379-4f57-aff7-10d2332abe5f false false 0 0
pool-adm-cp0 ae8f64b3-aea6-4e6c-99ed-4af87df96d6d ae8f64b3-aea6-4e6c-99ed-4af87df96d6d false nvmf 5003804672 5003804672 5003804672 nvmf://192.168.4.5:8420/nqn.2019-05.io.openebs:ae8f64b3-aea6-4e6c-99ed-4af87df96d6d?uuid=ae8f64b3-aea6-4e6c-99ed-4af87df96d6d false false 0 0
pool-adm-cp0 a468f327-db67-4856-abe5-2a1d2615419b a468f327-db67-4856-abe5-2a1d2615419b false nvmf 16106127360 16106127360 16106127360 nvmf://192.168.4.5:8420/nqn.2019-05.io.openebs:a468f327-db67-4856-abe5-2a1d2615419b?uuid=a468f327-db67-4856-abe5-2a1d2615419b false false 0 0
pool-adm-cp0 e54e3928-daa1-4079-aa81-8e6bd34208e9 e54e3928-daa1-4079-aa81-8e6bd34208e9 false nvmf 1073741824 1073741824 1073741824 nvmf://192.168.4.5:8420/nqn.2019-05.io.openebs:e54e3928-daa1-4079-aa81-8e6bd34208e9?uuid=e54e3928-daa1-4079-aa81-8e6bd34208e9 false false 0 0
pool-adm-cp0 8b1ebf2e-7ed1-414b-97a4-ca2c8e4f5f00 8b1ebf2e-7ed1-414b-97a4-ca2c8e4f5f00 false nvmf 524288000 524288000 524288000 nvmf://192.168.4.5:8420/nqn.2019-05.io.openebs:8b1ebf2e-7ed1-414b-97a4-ca2c8e4f5f00?uuid=8b1ebf2e-7ed1-414b-97a4-ca2c8e4f5f00 false false 0 0
pool-adm-cp0 1ec3e665-202c-481f-ad20-4d7bc18d4996 1ec3e665-202c-481f-ad20-4d7bc18d4996 false nvmf 25165824 25165824 25165824 nvmf://192.168.4.5:8420/nqn.2019-05.io.openebs:1ec3e665-202c-481f-ad20-4d7bc18d4996?uuid=1ec3e665-202c-481f-ad20-4d7bc18d4996 false false 0 0
pool-adm-cp0 997ae19b-9c05-4086-b872-155ebd4a5c07 997ae19b-9c05-4086-b872-155ebd4a5c07 false nvmf 10737418240 10737418240 10737418240 nvmf://192.168.4.5:8420/nqn.2019-05.io.openebs:997ae19b-9c05-4086-b872-155ebd4a5c07?uuid=997ae19b-9c05-4086-b872-155ebd4a5c07 false false 0 0
pool-adm-cp0 aff5d6f2-02a2-4ce2-a175-afc7a75b16d9 aff5d6f2-02a2-4ce2-a175-afc7a75b16d9 false nvmf 15003025408 15003025408 15003025408 nvmf://192.168.4.5:8420/nqn.2019-05.io.openebs:aff5d6f2-02a2-4ce2-a175-afc7a75b16d9?uuid=aff5d6f2-02a2-4ce2-a175-afc7a75b16d9 false false 0 0
pool-adm-cp0 d8cb6484-1492-45dd-983d-ec38826d8f52 d8cb6484-1492-45dd-983d-ec38826d8f52 false nvmf 1073741824 1073741824 1073741824 nvmf://192.168.4.5:8420/nqn.2019-05.io.openebs:d8cb6484-1492-45dd-983d-ec38826d8f52?uuid=d8cb6484-1492-45dd-983d-ec38826d8f52 false false 0 0
pool-adm-cp0 6cbb3242-7f8a-4912-acaf-971e2e4cc12c 6cbb3242-7f8a-4912-acaf-971e2e4cc12c false nvmf 1073741824 1073741824 1073741824 nvmf://192.168.4.5:8420/nqn.2019-05.io.openebs:6cbb3242-7f8a-4912-acaf-971e2e4cc12c?uuid=6cbb3242-7f8a-4912-acaf-971e2e4cc12c false false 0 0
pool-adm-cp0 79e8d7a5-5652-40e4-a010-bd6776b4e142 79e8d7a5-5652-40e4-a010-bd6776b4e142 false none 5003804672 5003804672 5003804672 bdev:///79e8d7a5-5652-40e4-a010-bd6776b4e142?uuid=79e8d7a5-5652-40e4-a010-bd6776b4e142 false false 0 0
pool-adm-cp0 7704f065-4902-4c4d-816b-8621aeb13525 7704f065-4902-4c4d-816b-8621aeb13525 false nvmf 10737418240 10737418240 10737418240 nvmf://192.168.4.5:8420/nqn.2019-05.io.openebs:7704f065-4902-4c4d-816b-8621aeb13525?uuid=7704f065-4902-4c4d-816b-8621aeb13525 false false 0 0
pool-adm-cp0 a8a5292e-fe2b-42a3-a631-7c2f2b1c4147 a8a5292e-fe2b-42a3-a631-7c2f2b1c4147 false nvmf 32212254720 32212254720 32212254720 nvmf://192.168.4.5:8420/nqn.2019-05.io.openebs:a8a5292e-fe2b-42a3-a631-7c2f2b1c4147?uuid=a8a5292e-fe2b-42a3-a631-7c2f2b1c4147 false false 0 0
pool-adm-cp0 731f0372-0662-43be-aa0c-a51e86cc727b 731f0372-0662-43be-aa0c-a51e86cc727b false none 5003804672 5003804672 5003804672 bdev:///731f0372-0662-43be-aa0c-a51e86cc727b?uuid=731f0372-0662-43be-aa0c-a51e86cc727b false false 0 0
The replicas are back again. Am I doing the deletion the wrong way?
Still failing
io-engine-client replica create --size 2724835328 new 1107276f-ce8e-4dfd-b2aa-feeaaed78234 pool-adm-cp0
Error: GrpcStatus { source: Status { code: InvalidArgument, message: "errno: failed to create lvol new", metadata: MetadataMap { headers: {"content-type": "application/grpc", "date": "Fri, 23 Aug 2024 10:21:47 GMT", "content-length": "0"} }, source: None }, backtrace: Backtrace(()) }
> I changed all the volumes replica count to 2 ... I zeroed out the diskpool with the disk-wipe pod ... relabeled the cp0. However the diskpool exists already and shows as Online.
I think your dd command did not work somehow, otherwise the pool should not come up as Online when you relabel your io-engine node.
@innotecsol did you manage to resolve this?
Yes, it seems to be working now. I had an issue with the hard drive, which I eventually replaced. After that I could scale the volumes up to 3 again and also enable partial rebuild by performing:
helm repo update
helm upgrade mayastor mayastor/mayastor -n mayastor --reuse-values --version 2.7.0 --set agents.core.rebuild.partial.enabled=true
It seems to work ok now.
Thanks for your support!!!
Describe the bug
During the upgrade of mayastor from 2.5.0 to 2.7.0 the following error is displayed in the log of mayastor-operator-diskpool-5cd48746c-46zwb
To Reproduce
Steps to reproduce the behavior:
kubectl mayastor upgrade --skip-single-replica-volume-validation
OS info (please complete the following information):
talos
version 1.6.7
How can I fix the CRD?