openebs-archive / cstor-operators

Collection of OpenEBS cStor Data Engine Operators
https://openebs.io
Apache License 2.0

Unable to mount volume 'SyncFailed' failed to sync CVR error: unable to update snapshot list #238

Closed. mazocode closed this issue 3 years ago.

mazocode commented 3 years ago

What happened

Trying to mount a cStor volume from a CStorPoolCluster with 3 zvol blockdevices fails with:

Events:
  Type     Reason       Age                 From     Message
  ----     ------       ----                ----     -------
  Warning  FailedMount  26m (x3 over 53m)   kubelet  Unable to attach or mount volumes: unmounted volumes=[dbench-pv], unattached volumes=[default-token-f6f4b dbench-pv]: timed out waiting for the condition
  Warning  FailedMount  16m (x16 over 60m)  kubelet  Unable to attach or mount volumes: unmounted volumes=[dbench-pv], unattached volumes=[dbench-pv default-token-f6f4b]: timed out waiting for the condition
  Warning  FailedMount  73s (x38 over 62m)  kubelet  MountVolume.MountDevice failed for volume "pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485" : kubernetes.io/csi: attacher.MountDevice failed to create newCsiDriverClient: driver name cstor.csi.openebs.io not found in the list of registered CSI drivers

I'm not sure about the "cstor.csi.openebs.io not found" message, but this is what I get:

$ kubectl get csidriver  cstor.csi.openebs.io
NAME                   ATTACHREQUIRED   PODINFOONMOUNT   MODES                  AGE
cstor.csi.openebs.io   false            true             Persistent,Ephemeral   47h
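
The CSIDriver object is cluster-scoped, while the kubelet error above is about per-node registration, so another thing worth checking (a sketch, assuming the default kubelet directory) is whether the registrar socket from the logs below is actually present on the node where the pod runs:

$ ls /var/lib/kubelet/plugins_registry/
cstor.csi.openebs.io-reg.sock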

The registration also looks fine to me:

$ kubectl -n openebs logs pod/openebs-cstor-csi-node-6nvt5 csi-node-driver-registrar
I0127 12:08:22.461578       1 main.go:113] Version: v2.1.0
I0127 12:08:22.463064       1 main.go:137] Attempting to open a gRPC connection with: "/plugin/csi.sock"
I0127 12:08:22.463099       1 connection.go:153] Connecting to unix:///plugin/csi.sock
I0127 12:08:23.465723       1 main.go:144] Calling CSI driver to discover driver name
I0127 12:08:23.465756       1 connection.go:182] GRPC call: /csi.v1.Identity/GetPluginInfo
I0127 12:08:23.465767       1 connection.go:183] GRPC request: {}
I0127 12:08:23.469124       1 connection.go:185] GRPC response: {"name":"cstor.csi.openebs.io","vendor_version":"2.5.0"}
I0127 12:08:23.469204       1 connection.go:186] GRPC error: <nil>
I0127 12:08:23.469221       1 main.go:154] CSI driver name: "cstor.csi.openebs.io"
I0127 12:08:23.469263       1 node_register.go:52] Starting Registration Server at: /registration/cstor.csi.openebs.io-reg.sock
I0127 12:08:23.470242       1 node_register.go:61] Registration Server started at: /registration/cstor.csi.openebs.io-reg.sock
I0127 12:08:23.470478       1 node_register.go:83] Skipping healthz server because HTTP endpoint is set to: ""
I0127 12:08:23.506240       1 main.go:80] Received GetInfo call: &InfoRequest{}
I0127 12:08:23.610771       1 main.go:90] Received NotifyRegistrationStatus call: &RegistrationStatus{PluginRegistered:true,Error:,}
$ kubectl -n openebs logs pod/openebs-cstor-csi-controller-0 csi-attacher
I0127 12:08:55.366696       1 main.go:93] Version: v3.1.0-0-g3a0c5a0e
I0127 12:08:55.371691       1 connection.go:153] Connecting to unix:///var/lib/csi/sockets/pluginproxy/csi.sock
I0127 12:08:56.372994       1 common.go:111] Probing CSI driver for readiness
I0127 12:08:56.373046       1 connection.go:182] GRPC call: /csi.v1.Identity/Probe
I0127 12:08:56.373054       1 connection.go:183] GRPC request: {}
I0127 12:08:56.378408       1 connection.go:185] GRPC response: {}
I0127 12:08:56.378479       1 connection.go:186] GRPC error: <nil>
I0127 12:08:56.378502       1 connection.go:182] GRPC call: /csi.v1.Identity/GetPluginInfo
I0127 12:08:56.378508       1 connection.go:183] GRPC request: {}
I0127 12:08:56.379007       1 connection.go:185] GRPC response: {"name":"cstor.csi.openebs.io","vendor_version":"2.5.0"}
I0127 12:08:56.379079       1 connection.go:186] GRPC error: <nil>
I0127 12:08:56.379089       1 main.go:147] CSI driver name: "cstor.csi.openebs.io"
I0127 12:08:56.379096       1 connection.go:182] GRPC call: /csi.v1.Identity/GetPluginCapabilities
I0127 12:08:56.379101       1 connection.go:183] GRPC request: {}
I0127 12:08:56.380098       1 connection.go:185] GRPC response: {"capabilities":[{"Type":{"Service":{"type":1}}},{"Type":{"Service":{"type":2}}}]}
I0127 12:08:56.380292       1 connection.go:186] GRPC error: <nil>
I0127 12:08:56.380304       1 connection.go:182] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I0127 12:08:56.380309       1 connection.go:183] GRPC request: {}
I0127 12:08:56.380965       1 connection.go:185] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":9}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":7}}}]}
I0127 12:08:56.381118       1 connection.go:186] GRPC error: <nil>
I0127 12:08:56.381132       1 main.go:188] CSI driver does not support ControllerPublishUnpublish, using trivial handler
I0127 12:08:56.381139       1 connection.go:182] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I0127 12:08:56.381143       1 connection.go:183] GRPC request: {}
I0127 12:08:56.381591       1 connection.go:185] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":9}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":7}}}]}
I0127 12:08:56.381714       1 connection.go:186] GRPC error: <nil>
I0127 12:08:56.382041       1 controller.go:121] Starting CSI attacher
I0127 12:08:56.382342       1 reflector.go:219] Starting reflector *v1.VolumeAttachment (10m0s) from k8s.io/client-go/informers/factory.go:134
I0127 12:08:56.382342       1 reflector.go:219] Starting reflector *v1.PersistentVolume (10m0s) from k8s.io/client-go/informers/factory.go:134
I0127 12:08:56.382405       1 reflector.go:255] Listing and watching *v1.PersistentVolume from k8s.io/client-go/informers/factory.go:134
I0127 12:08:56.382385       1 reflector.go:255] Listing and watching *v1.VolumeAttachment from k8s.io/client-go/informers/factory.go:134
I0127 12:08:56.482264       1 shared_informer.go:270] caches populated

The PVC looks like this:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: iobench-network-hdd-pv
spec:
  storageClassName: cluster-hdd-x1-gli1
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 4Gi

and this is my storageclass:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cluster-hdd-x1-gli1
provisioner: cstor.csi.openebs.io
allowVolumeExpansion: true
parameters:
  cas-type: cstor
  cstorPoolCluster: cstor-hdd-x1-gli1-pool
  replicaCount: "1"
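
To confirm that the volume was actually provisioned through this class, the intermediate cStor objects can be listed as well (a sketch; these are the standard cstor.openebs.io resource names):

$ kubectl -n openebs get cstorvolumeconfigs,cstorvolumes,cstorvolumereplicas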

The cspi looks fine to me:

$ kubectl -n openebs describe cspi cstor-hdd-x1-gli1-pool-rszn
Name:         cstor-hdd-x1-gli1-pool-rszn
Namespace:    openebs
Labels:       kubernetes.io/hostname=iuck8s1
              openebs.io/cas-type=cstor
              openebs.io/cstor-pool-cluster=cstor-hdd-x1-gli1-pool
              openebs.io/version=2.5.0
Annotations:  <none>
API Version:  cstor.openebs.io/v1
Kind:         CStorPoolInstance
Metadata:
  Creation Timestamp:  2021-01-27T06:36:18Z
  Finalizers:
    cstorpoolcluster.openebs.io/finalizer
    openebs.io/pool-protection
  Generation:  4
  Managed Fields:
    API Version:  cstor.openebs.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:labels:
          .:
          f:kubernetes.io/hostname:
          f:openebs.io/cas-type:
          f:openebs.io/cstor-pool-cluster:
          f:openebs.io/version:
        f:ownerReferences:
      f:spec:
        .:
        f:hostName:
        f:nodeSelector:
          .:
          f:kubernetes.io/hostname:
        f:poolConfig:
          .:
          f:dataRaidGroupType:
          f:priorityClassName:
          f:roThresholdLimit:
          f:tolerations:
      f:status:
        .:
        f:capacity:
          .:
          f:zfs:
        f:healthyReplicas:
        f:provisionedReplicas:
        f:readOnly:
      f:versionDetails:
        .:
        f:desired:
        f:status:
          .:
          f:current:
          f:dependentsUpgraded:
          f:lastUpdateTime:
    Manager:      cspc-operator
    Operation:    Update
    Time:         2021-01-27T06:36:18Z
    API Version:  cstor.openebs.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:finalizers:
      f:spec:
        f:dataRaidGroups:
      f:status:
        f:capacity:
          f:free:
          f:total:
          f:used:
          f:zfs:
            f:logicalUsed:
        f:phase:
    Manager:    pool-manager
    Operation:  Update
    Time:       2021-01-27T06:36:24Z
  Owner References:
    API Version:           cstor.openebs.io/v1
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  CStorPoolCluster
    Name:                  cstor-hdd-x1-gli1-pool
    UID:                   e92ec43c-70e6-4a6a-b5aa-8f657395f844
  Resource Version:        12149729
  UID:                     842a6004-05f2-4b1b-b486-1bc9a49cd4c7
Spec:
  Data Raid Groups:
    Block Devices:
      Block Device Name:  blockdevice-1aaceed9f7fff8072ff265dcefeb7b42
      Dev Link:           /dev/zd32
      Block Device Name:  blockdevice-c136a064f23bd2e7fc4fd171ac8fdc75
      Dev Link:           /dev/zd16
      Block Device Name:  blockdevice-d72ec61c59779b64eabaadb2adef6b7a
      Dev Link:           /dev/zd0
  Host Name:              iuck8s1
  Node Selector:
    kubernetes.io/hostname:  iuck8s1
  Pool Config:
    Data Raid Group Type:  stripe
    Priority Class Name:
    Ro Threshold Limit:    85
    Tolerations:
      Effect:    NoSchedule
      Key:       ik8/role
      Operator:  Equal
      Value:     storage
Status:
  Capacity:
    Free:   5770G
    Total:  5770000612k
    Used:   612k
    Zfs:
      Logical Used:      204k
  Healthy Replicas:      0
  Phase:                 ONLINE
  Provisioned Replicas:  0
  Read Only:             false
Version Details:
  Desired:  2.5.0
  Status:
    Current:              2.5.0
    Dependents Upgraded:  true
    Last Update Time:     <nil>
Events:
  Type    Reason   Age   From               Message
  ----    ------   ----  ----               -------
  Normal  Created  47s   CStorPoolInstance  Pool created successfully
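
As an extra check (a sketch, not part of the original report), the pool can also be inspected directly inside the pool pod, using the pod and container names from the logs below:

$ kubectl -n openebs exec -it cstor-hdd-x1-gli1-pool-rszn-67b66fd55b-7bb54 -c cstor-pool -- zpool status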

This is the output from cstor-pool-mgmt at the time I created the pool:

$ kubectl -n openebs logs cstor-hdd-x1-gli1-pool-rszn-67b66fd55b-7bb54 cstor-pool-mgmt
+ rm /usr/local/bin/zrepl
+ pool_manager_pid=9
+ /usr/local/bin/pool-manager start
+ trap _sigint INT
+ trap _sigterm SIGTERM
+ wait 9
E0127 06:36:20.648096       9 pool.go:122] zpool status returned error in zrepl startup : exit status 1
I0127 06:36:20.648220       9 pool.go:123] Waiting for pool container to start...
I0127 06:36:20.654741       9 controller.go:109] Setting up event handlers for CSPI
I0127 06:36:20.654965       9 controller.go:115] will set up informer event handlers for cvr
I0127 06:36:20.655124       9 new_restore_controller.go:105] Setting up event handlers for restore
I0127 06:36:20.667539       9 controller.go:110] Setting up event handlers for backup
I0127 06:36:20.671985       9 runner.go:38] Starting CStorPoolInstance controller
I0127 06:36:20.672006       9 runner.go:41] Waiting for informer caches to sync
I0127 06:36:20.676276       9 common.go:262] CStorPool found: [cannot open 'name': no such pool ]
I0127 06:36:20.676319       9 run_restore_controller.go:38] Starting CStorRestore controller
I0127 06:36:20.676326       9 run_restore_controller.go:41] Waiting for informer caches to sync
I0127 06:36:20.676334       9 run_restore_controller.go:53] Started CStorRestore workers
I0127 06:36:20.676348       9 runner.go:39] Starting CStorVolumeReplica controller
I0127 06:36:20.676353       9 runner.go:42] Waiting for informer caches to sync
I0127 06:36:20.676359       9 runner.go:47] Starting CStorVolumeReplica workers
I0127 06:36:20.676365       9 runner.go:54] Started CStorVolumeReplica workers
I0127 06:36:20.676381       9 runner.go:38] Starting CStorBackup controller
I0127 06:36:20.676397       9 runner.go:41] Waiting for informer caches to sync
I0127 06:36:20.676414       9 runner.go:53] Started CStorBackup workers
I0127 06:36:20.772140       9 runner.go:45] Starting CStorPoolInstance workers
I0127 06:36:20.772257       9 runner.go:51] Started CStorPoolInstance workers
I0127 06:36:20.790975       9 handler.go:441] Added Finalizer: cstor-hdd-x1-gli1-pool-rszn, 842a6004-05f2-4b1b-b486-1bc9a49cd4c7
I0127 06:36:20.800769       9 import.go:73] Importing pool 842a6004-05f2-4b1b-b486-1bc9a49cd4c7 cstor-e92ec43c-70e6-4a6a-b5aa-8f657395f844
E0127 06:36:20.804784       9 import.go:94] Failed to import pool by reading cache file: failed to open cache file: No such file or directory
cannot import 'cstor-e92ec43c-70e6-4a6a-b5aa-8f657395f844': no such pool available
 : exit status 1
E0127 06:36:21.305102       9 import.go:114] Failed to import pool by scanning directory: 2021-01-27/06:36:20.809 Iterating over all the devices to find zfs devices using blkid
2021-01-27/06:36:21.185 Iterated over cache devices to find zfs devices
2021-01-27/06:36:21.186 Verifying pool existence on the device /dev/sdb1
2021-01-27/06:36:21.249 Verified the device /dev/sdb1 for pool existence
2021-01-27/06:36:21.302 Verifying pool existence on the device /dev/sdb1
2021-01-27/06:36:21.303 Verified the device /dev/sdb1 for pool existence
cannot import 'cstor-e92ec43c-70e6-4a6a-b5aa-8f657395f844': no such pool available
 : exit status 1
I0127 06:36:21.462725       9 create.go:41] Creating a pool for cstor-hdd-x1-gli1-pool-rszn cstor-e92ec43c-70e6-4a6a-b5aa-8f657395f844
I0127 06:36:21.868435       9 event.go:281] Event(v1.ObjectReference{Kind:"CStorPoolInstance", Namespace:"openebs", Name:"cstor-hdd-x1-gli1-pool-rszn", UID:"842a6004-05f2-4b1b-b486-1bc9a49cd4c7", APIVersion:"cstor.openebs.io/v1", ResourceVersion:"12149516", FieldPath:""}): type: 'Normal' reason: 'Created' Pool created successfully

Here is the log from cstor-pool:

$ kubectl -n openebs logs cstor-hdd-x1-gli1-pool-rszn-67b66fd55b-7bb54 cstor-pool
Disabling dumping core
sleeping for 2 sec
2021-01-27/06:36:19.808 disabled auto import (reading of zpool.cache)
physmem = 18471389 pages (70.46 GB)
Disk /dev/zd16 does not support synchronize cache SCSI command
Disk /dev/zd0p1 does not support synchronize cache SCSI command
Disk /dev/zd32 does not support synchronize cache SCSI command

And from the maya-exporter:

$ kubectl -n openebs logs cstor-hdd-x1-gli1-pool-rszn-67b66fd55b-7bb54 maya-exporter
I0127 06:36:20.054564       1 command.go:118] Starting openebs-exporter ...
I0127 06:36:20.054624       1 command.go:130] Initialising openebs-exporter for the cstor pool
I0127 06:36:20.054797       1 command.go:174] Registered openebs exporter for cstor pool
I0127 06:36:20.054808       1 server.go:41] Starting http server....

Creating the PVC added the following to the logs for cstor-pool:

2021-01-27/06:40:11.947 zvol cstor-e92ec43c-70e6-4a6a-b5aa-8f657395f844/pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485 status change: DEGRADED -> DEGRADED
2021-01-27/06:40:11.947 zvol cstor-e92ec43c-70e6-4a6a-b5aa-8f657395f844/pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485 rebuild status change: INIT -> INIT
2021-01-27/06:40:11.947 Instantiating zvol cstor-e92ec43c-70e6-4a6a-b5aa-8f657395f844/pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485
2021-01-27/06:40:11.966 ERROR err 11 for cstor-e92ec43c-70e6-4a6a-b5aa-8f657395f844/pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485 listsnap
2021-01-27/06:40:12.006 ERROR err 11 for cstor-e92ec43c-70e6-4a6a-b5aa-8f657395f844/pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485 listsnap
2021-01-27/06:40:12.039 ERROR err 11 for cstor-e92ec43c-70e6-4a6a-b5aa-8f657395f844/pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485 listsnap
2021-01-27/06:40:12.075 ERROR err 11 for cstor-e92ec43c-70e6-4a6a-b5aa-8f657395f844/pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485 listsnap
2021-01-27/06:40:19.067 [tgt 10.90.19.191:6060:21]: Connected
2021-01-27/06:40:19.067 [tgt 10.90.19.191:6060:21]: Handshake command for zvol pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485
2021-01-27/06:40:19.069 Volume:cstor-e92ec43c-70e6-4a6a-b5aa-8f657395f844/pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485 has zvol_guid:9728312340268247483
2021-01-27/06:40:19.069 IO sequence number:0 Degraded IO sequence number:0
2021-01-27/06:40:19.070 New data connection on fd 8
2021-01-27/06:40:19.071 ERROR fail on unavailable snapshot pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485@rebuild_snap
2021-01-27/06:40:19.071 Quorum is on, and rep factor 1
2021-01-27/06:40:19.071 zvol cstor-e92ec43c-70e6-4a6a-b5aa-8f657395f844/pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485 rebuild status change: INIT -> DONE
2021-01-27/06:40:19.071 zvol cstor-e92ec43c-70e6-4a6a-b5aa-8f657395f844/pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485 status change: DEGRADED -> HEALTHY
2021-01-27/06:40:19.071 Data connection associated with zvol cstor-e92ec43c-70e6-4a6a-b5aa-8f657395f844/pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485 fd: 8
2021-01-27/06:40:19.071 Started ack sender for zvol cstor-e92ec43c-70e6-4a6a-b5aa-8f657395f844/pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485 fd: 8
2021-01-27/06:40:30.098 [tgt 10.90.19.191:6060:21]: Replica status command for zvol pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485

and cstor-pool-mgmt:

I0127 06:40:11.872702       9 handler.go:571] cVR empty status: 769557cc-1e88-4b0b-b65f-be973e39d928
I0127 06:40:11.872801       9 event.go:281] Event(v1.ObjectReference{Kind:"CStorVolumeReplica", Namespace:"openebs", Name:"pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485-cstor-hdd-x1-gli1-pool-rszn", UID:"769557cc-1e88-4b0b-b65f-be973e39d928", APIVersion:"cstor.openebs.io/v1", ResourceVersion:"12151153", FieldPath:""}): type: 'Normal' reason: 'Synced' Received Resource create event
I0127 06:40:11.917723       9 handler.go:225] will process add event for cvr {pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485-cstor-hdd-x1-gli1-pool-rszn} as volume {cstor-e92ec43c-70e6-4a6a-b5aa-8f657395f844/pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485}
I0127 06:40:11.923416       9 handler.go:574] cVR 'pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485-cstor-hdd-x1-gli1-pool-rszn': uid '769557cc-1e88-4b0b-b65f-be973e39d928': phase 'Init': is_empty_status: false
I0127 06:40:11.923449       9 handler.go:586] cVR pending: 769557cc-1e88-4b0b-b65f-be973e39d928
2021-01-27T06:40:11.950Z        INFO    volumereplica/volumereplica.go:307              {"eventcode": "cstor.volume.replica.create.success", "msg": "Successfully created CStor volume replica", "rname": "cstor-e92ec43c-70e6-4a6a-b5aa-8f657395f844/pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485"}
I0127 06:40:11.950570       9 handler.go:468] cVR creation successful: pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485-cstor-hdd-x1-gli1-pool-rszn, 769557cc-1e88-4b0b-b65f-be973e39d928
I0127 06:40:11.950670       9 event.go:281] Event(v1.ObjectReference{Kind:"CStorVolumeReplica", Namespace:"openebs", Name:"pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485-cstor-hdd-x1-gli1-pool-rszn", UID:"769557cc-1e88-4b0b-b65f-be973e39d928", APIVersion:"cstor.openebs.io/v1", ResourceVersion:"12151159", FieldPath:""}): type: 'Normal' reason: 'Created' Resource created successfully
I0127 06:40:11.967145       9 event.go:281] Event(v1.ObjectReference{Kind:"CStorVolumeReplica", Namespace:"openebs", Name:"pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485-cstor-hdd-x1-gli1-pool-rszn", UID:"769557cc-1e88-4b0b-b65f-be973e39d928", APIVersion:"cstor.openebs.io/v1", ResourceVersion:"12151159", FieldPath:""}): type: 'Warning' reason: 'SyncFailed' failed to sync CVR error: unable to update snapshot list details in CVR: failed to get the list of snapshots: Output: failed listsnap command for cstor-e92ec43c-70e6-4a6a-b5aa-8f657395f844/pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485 with err 11
 Error: exit status 11
I0127 06:40:12.006655       9 event.go:281] Event(v1.ObjectReference{Kind:"CStorVolumeReplica", Namespace:"openebs", Name:"pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485-cstor-hdd-x1-gli1-pool-rszn", UID:"769557cc-1e88-4b0b-b65f-be973e39d928", APIVersion:"cstor.openebs.io/v1", ResourceVersion:"12151168", FieldPath:""}): type: 'Warning' reason: 'SyncFailed' failed to sync CVR error: unable to update snapshot list details in CVR: failed to get the list of snapshots: Output: failed listsnap command for cstor-e92ec43c-70e6-4a6a-b5aa-8f657395f844/pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485 with err 11
 Error: exit status 11
I0127 06:40:12.039959       9 event.go:281] Event(v1.ObjectReference{Kind:"CStorVolumeReplica", Namespace:"openebs", Name:"pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485-cstor-hdd-x1-gli1-pool-rszn", UID:"769557cc-1e88-4b0b-b65f-be973e39d928", APIVersion:"cstor.openebs.io/v1", ResourceVersion:"12151172", FieldPath:""}): type: 'Warning' reason: 'SyncFailed' failed to sync CVR error: unable to update snapshot list details in CVR: failed to get the list of snapshots: Output: failed listsnap command for cstor-e92ec43c-70e6-4a6a-b5aa-8f657395f844/pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485 with err 11
 Error: exit status 11
I0127 06:40:12.076037       9 event.go:281] Event(v1.ObjectReference{Kind:"CStorVolumeReplica", Namespace:"openebs", Name:"pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485-cstor-hdd-x1-gli1-pool-rszn", UID:"769557cc-1e88-4b0b-b65f-be973e39d928", APIVersion:"cstor.openebs.io/v1", ResourceVersion:"12151173", FieldPath:""}): type: 'Warning' reason: 'SyncFailed' failed to sync CVR error: unable to update snapshot list details in CVR: failed to get the list of snapshots: Output: failed listsnap command for cstor-e92ec43c-70e6-4a6a-b5aa-8f657395f844/pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485 with err 11
 Error: exit status 11
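
To see which snapshots actually exist on the replica dataset while these listsnap errors occur, a direct check inside the pool pod could look like this (a sketch, reusing the pod and dataset names from above):

$ kubectl -n openebs exec -it cstor-hdd-x1-gli1-pool-rszn-67b66fd55b-7bb54 -c cstor-pool -- \
    zfs list -t snapshot -r cstor-e92ec43c-70e6-4a6a-b5aa-8f657395f844/pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485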

Looking at the CVR did not reveal anything more useful to me:

$ kubectl -n openebs describe cvr pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485-cstor-hdd-x1-gli1-pool-rszn
Name:         pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485-cstor-hdd-x1-gli1-pool-rszn
Namespace:    openebs
Labels:       cstorpoolinstance.openebs.io/name=cstor-hdd-x1-gli1-pool-rszn
              cstorpoolinstance.openebs.io/uid=842a6004-05f2-4b1b-b486-1bc9a49cd4c7
              cstorvolume.openebs.io/name=pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485
              openebs.io/persistent-volume=pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485
              openebs.io/version=2.5.0
Annotations:  cstorpoolinstance.openebs.io/hostname: iuck8s1
API Version:  cstor.openebs.io/v1
Kind:         CStorVolumeReplica
Metadata:
  Creation Timestamp:  2021-01-27T06:40:14Z
  Finalizers:
    cstorvolumereplica.openebs.io/finalizer
  Generation:  120
  Managed Fields:
    API Version:  cstor.openebs.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:cstorpoolinstance.openebs.io/hostname:
        f:finalizers:
          .:
          v:"cstorvolumereplica.openebs.io/finalizer":
        f:labels:
          .:
          f:cstorpoolinstance.openebs.io/name:
          f:cstorpoolinstance.openebs.io/uid:
          f:cstorvolume.openebs.io/name:
          f:openebs.io/persistent-volume:
          f:openebs.io/version:
        f:ownerReferences:
          .:
          k:{"uid":"4a420048-edad-4aa3-8b3a-0201392695ab"}:
            .:
            f:apiVersion:
            f:blockOwnerDeletion:
            f:controller:
            f:kind:
            f:name:
            f:uid:
      f:spec:
        .:
        f:targetIP:
      f:status:
        .:
        f:capacity:
          .:
          f:total:
          f:used:
      f:versionDetails:
        .:
        f:desired:
        f:status:
          .:
          f:current:
          f:dependentsUpgraded:
          f:lastUpdateTime:
    Manager:      cvc-operator
    Operation:    Update
    Time:         2021-01-27T06:40:14Z
    API Version:  cstor.openebs.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:spec:
        f:capacity:
        f:replicaid:
      f:status:
        f:capacity:
          f:total:
          f:used:
        f:lastTransitionTime:
        f:lastUpdateTime:
        f:phase:
    Manager:    pool-manager
    Operation:  Update
    Time:       2021-01-27T06:40:14Z
  Owner References:
    API Version:           cstor.openebs.io/v1
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  CStorVolume
    Name:                  pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485
    UID:                   4a420048-edad-4aa3-8b3a-0201392695ab
  Resource Version:        12175494
  UID:                     769557cc-1e88-4b0b-b65f-be973e39d928
Spec:
  Capacity:   4G
  Replicaid:  078C69A9AC378F4DFE74A3BC56DD42F9
  Target IP:  10.90.19.191
Status:
  Capacity:
    Total:               6K
    Used:                6K
  Last Transition Time:  2021-01-27T06:40:20Z
  Last Update Time:      2021-01-27T07:37:50Z
  Phase:                 Healthy
Version Details:
  Desired:  2.5.0
  Status:
    Current:              2.5.0
    Dependents Upgraded:  true
    Last Update Time:     <nil>
Events:
  Type     Reason      Age                From                Message
  ----     ------      ----               ----                -------
  Normal   Synced      57m                CStorVolumeReplica  Received Resource create event
  Normal   Created     57m                CStorVolumeReplica  Resource created successfully
  Warning  SyncFailed  57m (x4 over 57m)  CStorVolumeReplica  failed to sync CVR error: unable to update snapshot list details in CVR: failed to get the list of snapshots: Output: failed listsnap command for cstor-e92ec43c-70e6-4a6a-b5aa-8f657395f844/pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485 with err 11
 Error: exit status 11

The iSCSI export seems to be fine:

$ sudo iscsiadm --mode discovery --type sendtargets --portal 10.90.19.191
10.90.19.191:3260,1 iqn.2016-09.com.openebs.cstor:pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485
$ sudo iscsiadm -d2 -m node -T iqn.2016-09.com.openebs.cstor:pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485 -p 10.90.19.191 --login
iscsiadm: Max file limits 1024 262144
iscsiadm: default: Creating session 1/1
Logging in to [iface: default, target: iqn.2016-09.com.openebs.cstor:pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485, portal: 10.90.19.191,3260]
Login to [iface: default, target: iqn.2016-09.com.openebs.cstor:pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485, portal: 10.90.19.191,3260] successful.
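
Once the login succeeds, the session and the resulting block device can be cross-checked like this (a sketch; the device name depends on the node):

$ sudo iscsiadm -m session -P 3 | grep -E 'Target|Attached scsi disk'
$ lsblk /dev/sda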

But the device seems to have no partition:

$ sudo fdisk /dev/sda

Welcome to fdisk (util-linux 2.32.1).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

Device does not contain a recognized partition table.
Created a new DOS disklabel with disk identifier 0xe845e9e2.

Command (m for help): p
Disk /dev/sda: 4 GiB, 4294967296 bytes, 8388608 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 32768 bytes / 1048576 bytes
Disklabel type: dos
Disk identifier: 0xe845e9e2
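
A missing partition table is not necessarily a problem here: if the CSI node plugin formats the whole raw device with a filesystem on first mount (the usual pattern for block-backed CSI drivers), no partition table is expected. Whether a filesystem is already present can be checked with blkid (a sketch; empty output means the device is still unformatted):

$ sudo blkid /dev/sda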
mittachaitu commented 3 years ago

failed to sync CVR error: unable to update snapshot list details in CVR: failed to get the list of snapshots: Output: failed listsnap command for cstor-e92ec43c-70e6-4a6a-b5aa-8f657395f844/pvc-4bf42c56-ada5-4b6d-81ec-7f77a1e7e485 with err 11 Error: exit status 11

Hi @mazocode, the SyncFailed warning in the CVR describe output can be ignored; it is an intermittent warning that can happen when the replica is not yet connected to the target. The issue looks more like the kubelet is not able to identify the CSI driver. Can you share the output of kubectl get csinodes -o yaml? From the csi-node-driver-registrar logs, things look good:

I0127 12:08:23.469124       1 connection.go:185] GRPC response: {"name":"cstor.csi.openebs.io","vendor_version":"2.5.0"}
I0127 12:08:23.469204       1 connection.go:186] GRPC error: <nil>
I0127 12:08:23.469221       1 main.go:154] CSI driver name: "cstor.csi.openebs.io"
I0127 12:08:23.469263       1 node_register.go:52] Starting Registration Server at: /registration/cstor.csi.openebs.io-reg.sock
I0127 12:08:23.470242       1 node_register.go:61] Registration Server started at: /registration/cstor.csi.openebs.io-reg.sock
I0127 12:08:23.470478       1 node_register.go:83] Skipping healthz server because HTTP endpoint is set to: ""
I0127 12:08:23.506240       1 main.go:80] Received GetInfo call: &InfoRequest{}
I0127 12:08:23.610771       1 main.go:90] Received NotifyRegistrationStatus call: &RegistrationStatus{PluginRegistered:true,Error:,}

The above shows that the registration succeeded. Can you try the same with the latest version of OpenEBS?
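
For reference, a node with a working registration would show the driver under spec.drivers in its CSINode object, roughly like this (illustrative only; the node name is taken from the pool node above, and the node running the workload may differ):

apiVersion: storage.k8s.io/v1
kind: CSINode
metadata:
  name: iuck8s1
spec:
  drivers:
  - name: cstor.csi.openebs.io
    nodeID: iuck8s1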

mazocode commented 3 years ago

Thanks @mittachaitu. This was a while back and somehow we managed to get it running (don't remember exactly what was going on with this), so I think we can close this.