openebs / openebs

Most popular & widely deployed Open Source Container Native Storage platform for Stateful Persistent Applications on Kubernetes.
https://www.openebs.io
Apache License 2.0

Newly created volume fails to mount with fsck errors #1276

Closed kmova closed 4 years ago

kmova commented 6 years ago

kubectl get pods

NAME                                                             READY     STATUS              RESTARTS   AGE
pvc-93202023-1896-11e8-b8a8-96000007f375-ctrl-7fb796f666-tq7gj   2/2       Running             0          44m
pvc-93202023-1896-11e8-b8a8-96000007f375-rep-6bd59bcb55-49l56    1/1       Running             0          44m
pvc-93202023-1896-11e8-b8a8-96000007f375-rep-6bd59bcb55-gcf74    1/1       Running             0          1h
wordpress-55cbcdd99b-nrh5v                                       0/1       ContainerCreating   0          46m

kubectl describe pod wordpress-55cbcdd99b-nrh5v

Name:           wordpress-55cbcdd99b-nrh5v
Namespace:      default
<snip>
Volumes:
  wordpress-persistent-storage:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  wp-pv-claim
    ReadOnly:   false
<snip>
Events:
  Type     Reason                 Age                From                                   Message
  ----     ------                 ----               ----                                   -------
  Normal   Scheduled              46m                default-scheduler                      Successfully assigned wordpress-55cbcdd99b-nrh5v to worker-02
  Warning  FailedAttachVolume     46m                attachdetach-controller                Multi-Attach error for volume "pvc-93202023-1896-11e8-b8a8-96000007f375" Volume is already exclusively attached to one node and can't be attached to another
  Normal   SuccessfulMountVolume  46m                kubelet, worker-02  MountVolume.SetUp succeeded for volume "default-token-ppvzm"
  Warning  FailedMount            1m (x20 over 44m)  kubelet, worker-02  Unable to mount volumes for pod "wordpress-55cbcdd99b-nrh5v_default(8b9c9987-1899-11e8-b8a8-96000007f375)": timeout expired waiting for volumes to attach/mount for pod "default"/"wordpress-55cbcdd99b-nrh5v". list of unattached/unmounted volumes=[wordpress-persistent-storage]

kubectl get svc

NAME                                                TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                                                         AGE
maya-apiserver-service                              ClusterIP   10.106.189.8     <none>        5656/TCP                                                        4d
pvc-93202023-1896-11e8-b8a8-96000007f375-ctrl-svc   ClusterIP   10.102.156.74    <none>        3260/TCP,9501/TCP                                               5h

Used curl commands to query the volume status via the cluster IP 10.102.156.74, which showed that the volume controller and replicas were functional. (Refer to #1275.)
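
The exact curl commands are in #1275 and aren't reproduced here; as an assumption, they query the jiva controller's REST API exposed on port 9501 of the ctrl service, roughly along these lines:

curl http://10.102.156.74:9501/v1/volumes    # overall volume state (assumed endpoint)
curl http://10.102.156.74:9501/v1/replicas   # per-replica address and mode, e.g. RW (assumed endpoint)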

After seeing that the volume was online, checked the openebs volume controller logs:

kubectl logs pvc-93202023-1896-11e8-b8a8-96000007f375-ctrl-7fb796f666-tq7gj

(The same can be seen in the output of kubectl cluster-info dump.)

==== START logs for container pvc-93202023-1896-11e8-b8a8-96000007f375-ctrl-con of pod default/pvc-93202023-1896-11e8-b8a8-96000007f375-ctrl-7fb796f666-tq7gj ====
<snip... adding first replica and setting to RW>
time="2018-02-23T13:02:33Z" level=info msg="Connecting to remote: 10.244.1.10:9502" 
time="2018-02-23T13:02:33Z" level=info msg="Opening: 10.244.1.10:9502" 
time="2018-02-23T13:02:33Z" level=info msg="Adding backend: tcp://10.244.1.10:9502" 
time="2018-02-23T13:02:33Z" level=info msg="Set replica tcp://10.244.1.10:9502 to mode RW" 

<snip... adding second replica and setting to RW>
time="2018-02-23T13:02:37Z" level=info msg="Connecting to remote: 10.244.3.8:9502" 
time="2018-02-23T13:02:37Z" level=info msg="Opening: 10.244.3.8:9502" 
time="2018-02-23T13:02:37Z" level=info msg="Adding backend: tcp://10.244.3.8:9502" 
time="2018-02-23T13:02:38Z" level=info msg="Set replica tcp://10.244.3.8:9502 to mode RW" 
time="2018-02-23T13:02:38Z" level=info msg="Update peer details of 10.244.1.10:9502 " 
time="2018-02-23T13:02:38Z" level=info msg="Update peer details of 10.244.3.8:9502 " 

<snip.. accepting connection from initiator>
time="2018-02-23T13:05:13Z" level=info msg="10.244.1.9:3260" 
time="2018-02-23T13:05:13Z" level=info msg="Accepting ..." 
time="2018-02-23T13:05:13Z" level=info msg="connection is connected from 10.244.2.0:50324...\n" 
time="2018-02-23T13:05:13Z" level=info msg="Listening ..." 
time="2018-02-23T13:05:13Z" level=info msg="New Session initiator name:iqn.1993-08.org.debian:01:b93aa358deea,target name:iqn.2016-09.com.openebs.jiva:pvc-93202023-1896-11e8-b8a8-96000007f375,ISID:0x23d010000" 
time="2018-02-23T13:05:19Z" level=error msg=EOF 
time="2018-02-23T13:06:50Z" level=info msg="10.244.1.9:3260" 
time="2018-02-23T13:06:50Z" level=info msg="Accepting ..." 
time="2018-02-23T13:06:50Z" level=info msg="connection is connected from 10.244.3.0:40700...\n" 
time="2018-02-23T13:06:50Z" level=info msg="Listening ..." 
time="2018-02-23T13:06:50Z" level=warning msg="unexpected connection state: full feature" 
time="2018-02-23T13:06:50Z" level=error msg=EOF 
time="2018-02-23T13:06:50Z" level=info msg="10.244.1.9:3260" 
time="2018-02-23T13:06:50Z" level=info msg="Accepting ..." 
time="2018-02-23T13:06:50Z" level=info msg="connection is connected from 10.244.3.0:40702...\n" 
time="2018-02-23T13:06:50Z" level=info msg="Listening ..." 
time="2018-02-23T13:06:51Z" level=info msg="New Session initiator name:iqn.1993-08.org.debian:01:b93aa358deea,target name:iqn.2016-09.com.openebs.jiva:pvc-93202023-1896-11e8-b8a8-96000007f375,ISID:0x23d020000"
time="2018-02-23T13:06:51Z" level=error msg="non support" 
time="2018-02-23T13:06:51Z" level=warning msg="check condition" 
time="2018-02-23T13:06:52Z" level=warning msg="check condition" 
time="2018-02-23T13:06:52Z" level=warning msg="check condition" 

==== END logs for container pvc-93202023-1896-11e8-b8a8-96000007f375-ctrl-con of pod

The above openebs controller logs show that a connection was made from the initiator IQN iqn.1993-08.org.debian:01:b93aa358deea. Started checking from "worker-02", the node where the volume should be mounted.

kmova commented 6 years ago

Logged into the worker-02 node via ssh.

cat /etc/iscsi/initiatorname.iscsi

## DO NOT EDIT OR REMOVE THIS FILE!
## If you remove this file, the iSCSI daemon will not start.
## If you change the InitiatorName, existing access control lists
## may reject this initiator.  The InitiatorName must be unique
## for each iSCSI initiator.  Do NOT duplicate iSCSI InitiatorNames.
InitiatorName=iqn.1993-08.org.debian:01:b93aa358deea

However, the above doesn't always help, since most machines come with the same default IQN.
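
When the IQN alone is ambiguous, the active iSCSI sessions on the node can confirm whether this node actually holds the session for the target (standard open-iscsi command, shown as a sketch):

sudo iscsiadm -m session | grep pvc-93202023-1896-11e8-b8a8-96000007f375   # list sessions and match the target IQN of this PV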

Looked at the kubelet and system (kernel/iscsid) logs; in this case they were all in syslog, since kubelet was running as a service on the host. The following entries show the connection being established and the subsequent failure to mount (a sketch for filtering these entries out of syslog follows the excerpt).

Feb 23 14:06:50 worker-02 kubelet[6072]: I0223 14:06:50.637206    6072 operation_generator.go:1111] Controller attach succeeded for volume "pvc-93202023-1896-11e8-b8a8-96000007f375" (UniqueName: "kubernetes.io/iscsi/10.102.156.74:3260:iqn.2016-09.com.openebs.jiva:pvc-93202023-1896-11e8-b8a8-96000007f375:0") pod "wordpress-55cbcdd99b-nrh5v" (UID: "8b9c9987-1899-11e8-b8a8-96000007f375") device path: ""

Feb 23 14:06:50 worker-02 kubelet[6072]: I0223 14:06:50.736320    6072 operation_generator.go:446] MountVolume.WaitForAttach entering for volume "pvc-93202023-1896-11e8-b8a8-96000007f375" (UniqueName: "kubernetes.io/iscsi/10.102.156.74:3260:iqn.2016-09.com.openebs.jiva:pvc-93202023-1896-11e8-b8a8-96000007f375:0") pod "wordpress-55cbcdd99b-nrh5v" (UID: "8b9c9987-1899-11e8-b8a8-96000007f375") DevicePath ""

Feb 23 14:06:50 worker-02 kubelet[6072]: E0223 14:06:50.744831    6072 iscsi_util.go:235] iscsi: failed to rescan session with error: iscsiadm: No session found.
Feb 23 14:06:50 worker-02 kubelet[6072]:  (exit status 21)
Feb 23 14:06:51 worker-02 kernel: [349862.236890] scsi host4: iSCSI Initiator over TCP/IP
Feb 23 14:06:51 worker-02 kernel: [349862.755929] scsi 4:0:0:0: Direct-Access     CLOUDBYT OPENEBS          0.2  PQ: 0 ANSI: 5
Feb 23 14:06:51 worker-02 kernel: [349862.758792] sd 4:0:0:0: Attached scsi generic sg2 type 0
Feb 23 14:06:51 worker-02 kernel: [349862.759072] sd 4:0:0:0: [sdb] 4194304 512-byte logical blocks: (2.15 GB/2.00 GiB)
Feb 23 14:06:51 worker-02 kernel: [349862.759075] sd 4:0:0:0: [sdb] 4096-byte physical blocks
Feb 23 14:06:51 worker-02 kernel: [349862.759834] sd 4:0:0:0: [sdb] Write Protect is off
Feb 23 14:06:51 worker-02 kernel: [349862.759836] sd 4:0:0:0: [sdb] Mode Sense: 03 00 10 08
Feb 23 14:06:51 worker-02 kernel: [349862.760109] sd 4:0:0:0: [sdb] No Caching mode page found
Feb 23 14:06:51 worker-02 kernel: [349862.763321] sd 4:0:0:0: [sdb] Assuming drive cache: write through
Feb 23 14:06:51 worker-02 kernel: [349862.883977] sd 4:0:0:0: [sdb] Attached SCSI disk

Feb 23 14:06:51 worker-02 iscsid: Connection2:0 to [target: iqn.2016-09.com.openebs.jiva:pvc-93202023-1896-11e8-b8a8-96000007f375, portal: 10.102.156.74,3260] through [iface: default] is operational now

Feb 23 14:06:54 worker-02 kubelet[6072]: E0223 14:06:54.962815    6072 iscsi_util.go:338] iscsi: failed to mount iscsi volume /dev/disk/by-path/ip-10.102.156.74:3260-iscsi-iqn.2016-09.com.openebs.jiva:pvc-93202023-1896-11e8-b8a8-96000007f375-lun-0 [ext4] to /var/lib/kubelet/plugins/kubernetes.io/iscsi/iface-default/10.102.156.74:3260-iqn.2016-09.com.openebs.jiva:pvc-93202023-1896-11e8-b8a8-96000007f375-lun-0, error 'fsck' found errors on device /dev/disk/by-path/ip-10.102.156.74:3260-iscsi-iqn.2016-09.com.openebs.jiva:pvc-93202023-1896-11e8-b8a8-96000007f375-lun-0 but could not correct them: fsck from util-linux 2.27.1

Feb 23 14:06:54 worker-02 kubelet[6072]: /dev/sdb: Superblock has an invalid journal (inode 8).
Feb 23 14:06:54 worker-02 kubelet[6072]: CLEARED.
Feb 23 14:06:54 worker-02 kubelet[6072]: *** ext3 journal has been deleted - filesystem is now ext2 only ***
Feb 23 14:06:54 worker-02 kubelet[6072]: /dev/sdb: One or more block group descriptor checksums are invalid.  FIXED.
Feb 23 14:06:54 worker-02 kubelet[6072]: /dev/sdb: Group descriptor 0 checksum is 0x0000, should be 0x9444.
Feb 23 14:06:54 worker-02 kubelet[6072]: /dev/sdb: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
Feb 23 14:06:54 worker-02 kubelet[6072]: #011(i.e., without -a or -p options)
Feb 23 14:06:54 worker-02 kubelet[6072]: .

Feb 23 14:06:54 worker-02 kubelet[6072]: E0223 14:06:54.965441    6072 nestedpendingoperations.go:263] Operation for "\"kubernetes.io/iscsi/10.102.156.74:3260:iqn.2016-09.com.openebs.jiva:pvc-93202023-1896-11e8-b8a8-96000007f375:0\"" failed. No retries permitted until 2018-02-23 14:06:55.465364682 +0100 CET m=+349788.010169186 (durationBeforeRetry 500ms). Error: "MountVolume.WaitForAttach failed for volume \"pvc-93202023-1896-11e8-b8a8-96000007f375\" (UniqueName: \"kubernetes.io/iscsi/10.102.156.74:3260:iqn.2016-09.com.openebs.jiva:pvc-93202023-1896-11e8-b8a8-96000007f375:0\") pod \"wordpress-55cbcdd99b-nrh5v\" (UID: \"8b9c9987-1899-11e8-b8a8-96000007f375\") : 'fsck' found errors on device /dev/disk/by-path/ip-10.102.156.74:3260-iscsi-iqn.2016-09.com.openebs.jiva:pvc-93202023-1896-11e8-b8a8-96000007f375-lun-0 but could not correct them: fsck from util-linux 2.27.1\n/dev/sdb: Superblock has an invalid journal (inode 8).\nCLEARED.\n*** ext3 journal has been deleted - filesystem is now ext2 only ***\n\n/dev/sdb: One or more block group descriptor checksums are invalid.  FIXED.\n/dev/sdb: Group descriptor 0 checksum is 0x0000, should be 0x9444.  \n\n/dev/sdb: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.\n\t(i.e., without -a or -p options)\n."

Feb 23 14:06:55 worker-02 kernel: [349866.797554] sd 4:0:0:0: lun280922523394096 has a LUN larger than allowed by the host adapter

Feb 23 14:06:55 worker-02 kubelet[6072]: E0223 14:06:55.663460    6072 iscsi_util.go:338] iscsi: failed to mount iscsi volume /dev/disk/by-path/ip-10.102.156.74:3260-iscsi-iqn.2016-09.com.openebs.jiva:pvc-93202023-1896-11e8-b8a8-96000007f375-lun-0 [ext4] to /var/lib/kubelet/plugins/kubernetes.io/iscsi/iface-default/10.102.156.74:3260-iqn.2016-09.com.openebs.jiva:pvc-93202023-1896-11e8-b8a8-96000007f375-lun-0, error 'fsck' found errors on device /dev/disk/by-path/ip-10.102.156.74:3260-iscsi-iqn.2016-09.com.openebs.jiva:pvc-93202023-1896-11e8-b8a8-96000007f375-lun-0 but could not correct them: fsck from util-linux 2.27.1

Feb 23 14:06:55 worker-02 kubelet[6072]: /dev/sdb: One or more block group descriptor checksums are invalid.  FIXED.
Feb 23 14:06:55 worker-02 kubelet[6072]: /dev/sdb: Group descriptor 0 checksum is 0x0000, should be 0x9444.
Feb 23 14:06:55 worker-02 kubelet[6072]: /dev/sdb: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
Feb 23 14:06:55 worker-02 kubelet[6072]: #011(i.e., without -a or -p options)
Feb 23 14:06:55 worker-02 kubelet[6072]: .
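
A sketch of how the relevant entries above can be filtered out of syslog (assuming an Ubuntu-style /var/log/syslog with kubelet running as a host service, as described earlier):

grep -E 'kubelet|iscsid|kernel' /var/log/syslog | grep -E 'pvc-93202023|sdb'   # keep only the entries for this volume/device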
kmova commented 6 years ago

The openebs volume controller (iSCSI target) only showed the following messages, which were inconclusive as to whether the fsck errors were caused by the target.

time="2018-02-23T13:05:13Z" level=info msg="10.244.1.9:3260"
time="2018-02-23T13:05:13Z" level=info msg="Accepting ..."
time="2018-02-23T13:05:13Z" level=info msg="connection is connected from 10.244.2.0:50324...\n"
time="2018-02-23T13:05:13Z" level=info msg="Listening ..."
time="2018-02-23T13:05:13Z" level=info msg="New Session initiator name:iqn.1993-08.org.debian:01:b93aa358deea,target name:iqn.2016-09.com.openebs.jiva:pvc-93202023-1896-11e8-b8a8-96000007f375,ISID:0x23d010000"
time="2018-02-23T13:05:19Z" level=error msg=EOF
time="2018-02-23T13:06:50Z" level=info msg="10.244.1.9:3260"
time="2018-02-23T13:06:50Z" level=info msg="Accepting ..."
time="2018-02-23T13:06:50Z" level=info msg="connection is connected from 10.244.3.0:40700...\n"
time="2018-02-23T13:06:50Z" level=info msg="Listening ..."
time="2018-02-23T13:06:50Z" level=warning msg="unexpected connection state: full feature"
time="2018-02-23T13:06:50Z" level=error msg=EOF
time="2018-02-23T13:06:50Z" level=info msg="10.244.1.9:3260"
time="2018-02-23T13:06:50Z" level=info msg="Accepting ..."
time="2018-02-23T13:06:50Z" level=info msg="connection is connected from 10.244.3.0:40702...\n"
time="2018-02-23T13:06:50Z" level=info msg="Listening ..."
time="2018-02-23T13:06:51Z" level=info msg="New Session initiator name:iqn.1993-08.org.debian:01:b93aa358deea,target name:iqn.2016-09.com.openebs.jiva:pvc-93202023-1896-11e8-b8a8-96000007f375,ISID:0x23d020000"
time="2018-02-23T13:06:51Z" level=error msg="non support"
time="2018-02-23T13:06:51Z" level=warning msg="check condition"
time="2018-02-23T13:06:52Z" level=warning msg="check condition"
time="2018-02-23T13:06:52Z" level=warning msg="check condition"

But since this was a new volume, the following workaround was used to bring it online.

Workaround: To get the volume back online, ran fsck /dev/sdb on the host. After this, the volume became accessible and the application started.
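
A sketch of that workaround on the worker node; the device name /dev/sdb is specific to this incident, so confirm it first via the by-path symlink:

ls -l /dev/disk/by-path/ | grep 'openebs.jiva:pvc-93202023'   # confirm which block device backs this PV's iSCSI LUN
sudo fsck -y /dev/sdb                                         # run fsck manually while the volume is unmounted; -y accepts the suggested repairs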

PeterGrace commented 6 years ago

I've had this same issue happen twice; the first time I figured it was due to disk pressure and using sparse images, but this time around it happened without any disk pressure, and indeed, without any input from me as far as I can tell. One of the three pods of my elasticsearch cluster decided to die, and when it came back up, the node couldn't mount the pvc filesystem.

The fact that this happens when a pod is in an otherwise healthy state is disquieting. Deleting pods is a valid balancing strategy that should not affect the filesystem's consistency, especially if the mounting is handled at the host level.

Fortunately this is a clustered service, so I can delete the pvc, recreate the pod and get back to work, but this would otherwise be a pretty serious problem in production.

One interesting item in the output below: I asked for the CAS volumes to be provisioned as xfs, but apparently they're being created as ext4. I'm not sure whether that's an issue with Rancher or OpenEBS.

pvc:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    control-plane.alpha.kubernetes.io/leader: '{"holderIdentity":"4dd1d524-ca93-11e8-afd9-6a03d5095334","leaseDurationSeconds":15,"acquireTime":"2018-10-08T02:51:44Z","renewTime":"2018-10-08T02:51:46Z","leaderTransitions":0}'
    pv.kubernetes.io/bind-completed: "yes"
    pv.kubernetes.io/bound-by-controller: "yes"
    volume.beta.kubernetes.io/storage-provisioner: openebs.io/provisioner-iscsi
  creationTimestamp: null
  finalizers:
  - kubernetes.io/pvc-protection
  labels:
    app: eskeim
    release: eskeim
  name: eskeim-data-eskeim-0
  selfLink: /api/v1/namespaces/monitoring/persistentvolumeclaims/eskeim-data-eskeim-0
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: openebs-1repl
  volumeName: monitoring-eskeim-data-eskeim-0-3882014528
status: {}

storage class:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    cas.openebs.io/config: |
      - name: ReplicaCount
        value: "1"
      - name: StoragePool
        value: default
  creationTimestamp: null
  name: openebs-1repl
  selfLink: /apis/storage.k8s.io/v1/storageclasses/openebs-1repl
parameters:
  openebs.io/fstype: xfs
provisioner: openebs.io/provisioner-iscsi
reclaimPolicy: Delete
volumeBindingMode: Immediate

storage pool:

apiVersion: openebs.io/v1alpha1
kind: StoragePool
metadata:
  generation: 1
  labels:
    openebs.io/version: 0.7.0
  name: default
  namespace: ""
  resourceVersion: ""
  selfLink: /apis/openebs.io/v1alpha1/storagepools/default
  uid: ""
spec:
  path: /var/openebs
PeterGrace commented 6 years ago

I am having this issue occur consistently in my deployments. It seems to happen when the storage fails to honor an eviction request during a kubectl drain. The problem is that whatever is provisioning the disk uses newer ext4 features than CentOS 7 supports.

I tried jumping into the replica pod to scan the filesystem on /openebs/volume-head-000.img, but the fsck in that container is ALSO too old to recognize metadata_csum.
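
One way to confirm the mismatch is to compare the features recorded on the filesystem with what the host's e2fsprogs understands (a sketch using standard e2fsprogs tools; the device name is an assumption for this node):

rpm -q e2fsprogs                                            # CentOS 7 ships 1.42.x; metadata_csum needs e2fsprogs >= 1.43
sudo dumpe2fs -h /dev/sdb | grep -i 'filesystem features'   # metadata_csum appears here if the volume was formatted with it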

What provisions the filesystem when the pvc requests it? It has completely ignored my request to use xfs and provisions everything as ext4.

damlub commented 6 years ago

I am facing the same problems. The Pod event says to run fsck manually, but the elasticsearch Pod does not start at all, and I see no way other than entering a running container to run fsck on that device. Whenever I define a new deployment (e.g., plain busybox) that uses the PV, the volume has to be mounted, but the mount fails because of the faulty filesystem.

kmova commented 6 years ago

@damlub - which OS are you using?

@PeterGrace - The FSType was being ignored in 0.7.0. This is fixed in 0.7.1; the FSType needs to be specified under cas.openebs.io/config as shown below:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    cas.openebs.io/config: |
      - name: ReplicaCount
        value: "1"
      - name: StoragePool
        value: default
      - name: FSType
        value: xfs
  creationTimestamp: null
  name: openebs-1repl
  selfLink: /apis/storage.k8s.io/v1/storageclasses/openebs-1repl
provisioner: openebs.io/provisioner-iscsi
reclaimPolicy: Delete
volumeBindingMode: Immediate
damlub commented 6 years ago

@kmova My systems are CentOS 7.5

So far I have figured out that the OpenEBS containers use a newer version of the ext4 tools, particularly e2fsck, than CentOS does. Not sure if this is related. If it would help, I can switch to Ubuntu 16.04 or 18.04, too.

kmova commented 6 years ago

Thanks @damlub - Could you try with Ubuntu 16.04 while we check on CentOS 7.5?

utkarshmani1997 commented 5 years ago

@damlub I deployed openebs on CentOS 7.0 successfully; no issues so far with ext4/xfs. I'm also trying to set up CentOS 7.5 using Vagrant, but I'm having some issues with it; I will keep you posted on the progress. I used the following Vagrantfile to bring up the Kubernetes cluster.

utkarshmani1997 commented 5 years ago

@damlub I am having issues bringing up a CentOS 7.5 setup using Vagrant or kops. I would like to discuss this in more detail; please join us at openebs-community.slack.com, or share your Slack handle if you have already joined.

PeterGrace commented 5 years ago

Happened again today.

  Warning  FailedMount             34s (x11 over 6m)  kubelet, k8snode01       MountVolume.MountDevice failed for volume "pvc-42b9c99b-1c36-11e9-a880-060c2990a513" : 'fsck' found errors on device /dev/disk/by-path/ip-10.43.249.129:3260-iscsi-iqn.2016-09.com.openebs.jiva:pvc-42b9c99b-1c36-11e9-a880-060c2990a513-lun-0 but could not correct them: fsck from util-linux 2.29.2
/dev/sde: One or more block group descriptor checksums are invalid.  FIXED.
/dev/sde: Group descriptor 64 checksum is 0x0000, should be 0xce0a.

/dev/sde: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
  (i.e., without -a or -p options)
.
  Warning  FailedMount  25s (x3 over 4m)  kubelet, k8snode01  Unable to mount volumes for pod "eskeim-1_monitoring(49697f69-1c3f-11e9-a880-060c2990a513)": timeout expired waiting for volumes to attach or mount for pod "monitoring"/"eskeim-1". list of unmounted volumes=[eskeim-data]. list of unattached volumes=[eskeim-data default-token-vgt6j]

volume:

$ kubectl describe pv pvc-42b9c99b-1c36-11e9-a880-060c2990a513
Name:            pvc-42b9c99b-1c36-11e9-a880-060c2990a513
Labels:          openebs.io/cas-type=jiva
                 openebs.io/storageclass=jiva-1rep
Annotations:     openEBSProvisionerIdentity=k8snode02
                 openebs.io/cas-type=jiva
                 pv.kubernetes.io/provisioned-by=openebs.io/provisioner-iscsi
Finalizers:      [kubernetes.io/pv-protection]
StorageClass:    jiva-1rep
Status:          Bound
Claim:           monitoring/eskeim-data-eskeim-1
Reclaim Policy:  Delete
Access Modes:    RWO
Capacity:        20Gi
Node Affinity:   <none>
Message:
Source:
    Type:               ISCSI (an ISCSI Disk resource that is attached to a kubelet's host machine and then exposed to the pod)
    TargetPortal:       10.43.249.129:3260
    IQN:                iqn.2016-09.com.openebs.jiva:pvc-42b9c99b-1c36-11e9-a880-060c2990a513
    Lun:                0
    ISCSIInterface      default
    FSType:             ext4
    ReadOnly:           false
    Portals:            []
    DiscoveryCHAPAuth:  false
    SessionCHAPAuth:    false
    SecretRef:          <nil>
    InitiatorName:      <none>
Events:                 <none>

The StorageClass explicitly says to use xfs:

  apiVersion: storage.k8s.io/v1
  kind: StorageClass
  metadata:
    annotations:
      cas.openebs.io/config: |
        - name: ReplicaCount
          value: "1"
        - name: StoragePool
          value: default
        - name: FStype
          value: xfs
        #- name: TargetResourceLimits
        #  value: |-
        #      memory: 1Gi
        #      cpu: 100m
        #- name: AuxResourceLimits
        #  value: |-
        #      memory: 0.5Gi
        #      cpu: 50m
        #- name: ReplicaResourceLimits
        #  value: |-
        #      memory: 2Gi
      openebs.io/cas-type: jiva
      openebs.io/fstype: xfs
    creationTimestamp: 2018-12-27T15:17:00Z
    name: jiva-1rep
    resourceVersion: "137502"
    selfLink: /apis/storage.k8s.io/v1/storageclasses/jiva-1rep
    uid: 75597ad9-09ea-11e9-a880-060c2990a513
  provisioner: openebs.io/provisioner-iscsi
  reclaimPolicy: Delete
  volumeBindingMode: Immediate

All of my openebs pods are running 0.8.0:

    Image:          quay.io/openebs/cstor-pool:0.8.0
    Image:          quay.io/openebs/cstor-pool-mgmt:0.8.0
    Image:          quay.io/openebs/cstor-pool:0.8.0
    Image:          quay.io/openebs/cstor-pool-mgmt:0.8.0
    Image:          quay.io/openebs/cstor-pool:0.8.0
    Image:          quay.io/openebs/cstor-pool-mgmt:0.8.0
    Image:          quay.io/openebs/cstor-pool:0.8.0
    Image:          quay.io/openebs/cstor-pool-mgmt:0.8.0
    Image:          quay.io/openebs/m-apiserver:0.8.0
    Image:         quay.io/openebs/node-disk-manager-amd64:v0.2.0
    Image:         quay.io/openebs/node-disk-manager-amd64:v0.2.0
    Image:         quay.io/openebs/node-disk-manager-amd64:v0.2.0
    Image:         quay.io/openebs/node-disk-manager-amd64:v0.2.0
    Image:          quay.io/openebs/openebs-k8s-provisioner:0.8.0
    Image:          quay.io/openebs/snapshot-controller:0.8.0
    Image:          quay.io/openebs/snapshot-provisioner:0.8.0

Did I mess up the annotation for FSType somehow? I'm not sure why the disk is still using ext4 if I'm explicitly telling it to use xfs.
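
To verify what was actually written to the device (independent of the PV spec), standard util-linux tools can be run on the node, for example:

# blkid reports TYPE="ext4" or TYPE="xfs" for the iSCSI device backing this PV
sudo blkid /dev/disk/by-path/ip-10.43.249.129:3260-iscsi-iqn.2016-09.com.openebs.jiva:pvc-42b9c99b-1c36-11e9-a880-060c2990a513-lun-0
# or list all block devices with their detected filesystems
lsblk -f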

utkarshmani1997 commented 5 years ago

@PeterGrace can you also help by providing the kubelet logs from the node where this happened? If you are running your own master, please grab the kube-controller-manager logs as well.

utkarshmani1997 commented 5 years ago

@PeterGrace the issue is with your storage class: the annotation key should be FSType instead of FStype, which is why xfs was not being honored. We have also tried to simulate the UNEXPECTED INCONSISTENCY in a node-drain scenario multiple times, on a three-node cluster with 3 replicas using the following storage class, but we were unable to reproduce it.

Name:            openebs-standard
IsDefaultClass:  No
Annotations:     cas.openebs.io/config=- name: ReplicaCount
  value: "3"
- name: FSType
  value: "xfs"

Provisioner:           openebs.io/provisioner-iscsi
Parameters:            <none>
AllowVolumeExpansion:  <unset>
MountOptions:          <none>
ReclaimPolicy:         Delete
VolumeBindingMode:     Immediate
Events:                <none>

kubectl describe pvc:

Name:          mongo-jiva-claim-mongo-0
Namespace:     default
StorageClass:  openebs-standard
Status:        Bound
Volume:        pvc-0e05b7c2-237d-11e9-b26f-06f90e7ebe0a
Labels:        environment=test
               openebs.io/replica-anti-affinity=vehicle-db
               role=mongo
Annotations:   pv.kubernetes.io/bind-completed=yes
               pv.kubernetes.io/bound-by-controller=yes
               volume.beta.kubernetes.io/storage-provisioner=openebs.io/provisioner-iscsi
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      2G
Access Modes:  RWO
Events:        <none>

If you hit this issue again, please collect the following:

kubectl get sc
kubectl get pv
kubectl get pvc
journalctl -u kubelet  (from the node where the volume is being mounted, and also from the node where it was mounted earlier)
dmesg (from both of the nodes mentioned above)
kubectl logs <ctrl-pods>
kmova commented 4 years ago

Need help reproducing this issue. Keeping it open, as a couple of users have hit it.

github-actions[bot] commented 4 years ago

Issues go stale after 90d of inactivity.