openebs / lvm-localpv

Dynamically provision Stateful Persistent Node-Local Volumes & Filesystems for Kubernetes that are integrated with a backend LVM2 data storage stack.
Apache License 2.0

fails to format #135

Closed davidkarlsen closed 5 months ago

davidkarlsen commented 3 years ago

What steps did you take and what happened:

  Type     Reason            Age               From               Message
  ----     ------            ----              ----               -------
  Warning  FailedScheduling  6m2s              default-scheduler  running PreBind plugin "VolumeBinding": binding volumes: selectedNode annotation reset for PVC "elasticsearch-elasticsearch-cdm-4qo1qel7-1"
  Normal   Scheduled         16s               default-scheduler  Successfully assigned openshift-logging/elasticsearch-cdm-4qo1qel7-1-6db94d4d88-lwtv7 to alp-dts-g-c01oco09
  Warning  FailedMount       5s (x5 over 13s)  kubelet            MountVolume.SetUp failed for volume "pvc-c9073859-fd54-4890-b444-b96e6f46dea1" : rpc error: code = Internal desc = failed to format and mount the volume error: mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t xfs -o defaults /dev/datavg/pvc-c9073859-fd54-4890-b444-b96e6f46dea1 /var/lib/kubelet/pods/0c34d38c-88f0-4a1c-bf6f-02e6b3ab05cd/volumes/kubernetes.io~csi/pvc-c9073859-fd54-4890-b444-b96e6f46dea1/mount
Output: mount: /var/lib/kubelet/pods/0c34d38c-88f0-4a1c-bf6f-02e6b3ab05cd/volumes/kubernetes.io~csi/pvc-c9073859-fd54-4890-b444-b96e6f46dea1/mount: wrong fs type, bad option, bad superblock on /dev/mapper/datavg-pvc--c9073859--fd54--4890--b444--b96e6f46dea1, missing codepage or helper program, or other error.

because:

 mkfs.xfs /dev/datavg/pvc-c9073859-fd54-4890-b444-b96e6f46dea1
mkfs.xfs: /dev/datavg/pvc-c9073859-fd54-4890-b444-b96e6f46dea1 appears to contain an existing filesystem (xfs).
mkfs.xfs: Use the -f option to force overwrite.

Maybe formatting should be forced by default, or a note should be added to the docs.
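The refusal above comes down to a signature probe. As a minimal sketch (not project code): the XFS superblock starts with the magic bytes `XFSB` at offset 0, and tools that find an existing signature demand a force flag before overwriting it.

```python
# Minimal sketch (illustrative only, not the driver's code) of the check
# behind mkfs.xfs's refusal: an existing XFS superblock is recognizable by
# the magic bytes b"XFSB" at offset 0 of the device.

XFS_MAGIC = b"XFSB"  # XFS_SB_MAGIC (0x58465342)

def has_xfs_signature(path: str) -> bool:
    """Return True if the device/file at `path` begins with an XFS superblock."""
    with open(path, "rb") as dev:
        return dev.read(4) == XFS_MAGIC
```

If this probe reports a stale signature on a freshly created LV, mkfs.xfs will ask for `-f` exactly as shown above.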

What did you expect to happen: formatting should happen


w3aman commented 3 years ago

Hi @davidkarlsen Can you please tell us the environment details like LVM-driver version, k8s version and OS ?

davidkarlsen commented 3 years ago

> Hi @davidkarlsen Can you please tell us the environment details like LVM-driver version, k8s version and OS?

lvm version
  LVM version:     2.02.187(2)-RHEL7 (2020-03-24)
  Library version: 1.02.170-RHEL7 (2020-03-24)
  Driver version:  4.37.1
  Configuration:   ./configure --build=x86_64-redhat-linux-gnu --host=x86_64-redhat-linux-gnu --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/libexec --localstatedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-default-dm-run-dir=/run --with-default-run-dir=/run/lvm --with-default-pid-dir=/run --with-default-locking-dir=/run/lock/lvm --with-usrlibdir=/usr/lib64 --enable-lvm1_fallback --enable-fsadm --with-pool=internal --enable-write_install --with-user= --with-group= --with-device-uid=0 --with-device-gid=6 --with-device-mode=0660 --enable-pkgconfig --enable-applib --enable-cmdlib --enable-dmeventd --enable-blkid_wiping --enable-python2-bindings --with-cluster=internal --with-clvmd=corosync --enable-cmirrord --with-udevdir=/usr/lib/udev/rules.d --enable-udev_sync --with-thin=internal --enable-lvmetad --with-cache=internal --enable-lvmpolld --enable-lvmlockd-dlm --enable-lvmlockd-sanlock --enable-dmfilemapd
uname -a
Linux alp-dts-g-c01oco07 3.10.0-1160.36.2.el7.x86_64 #1 SMP Thu Jul 8 02:53:40 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 7.9 (Maipo)

openebs helm chart 2.12.0

kmova commented 3 years ago

@davidkarlsen -- this looks related to #75.

Would it be possible to try this on RHEL 8?

davidkarlsen commented 3 years ago

It looks the same. Unfortunately I can't run on RHEL 8, as it's not supported for OCP. In my case I had just deleted some LVs and then created a new one, which probably landed at the same offset. Without clearing the old volume, mkfs will find the old superblock magic and refuse to format unless forced, so that underlying setting is what a RHEL 7 vs. RHEL 8 comparison would need to control for.
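The offset-reuse scenario described here can be modeled in a few lines (a toy illustration only, not how LVM actually allocates extents): deleting an LV does not erase its bytes, so a new LV carved out at the same offset still begins with the old superblock.

```python
# Toy model (purely illustrative) of LV deletion and re-creation: the
# backing bytes survive lvremove, so a new LV at the same offset inherits
# the previous filesystem's signature.

pv = bytearray(1024)                  # stand-in for the volume group's extents

def create_lv(offset: int, size: int) -> memoryview:
    """Hand out a slice of the 'physical volume' as a logical volume."""
    return memoryview(pv)[offset:offset + size]

def format_xfs(lv: memoryview) -> None:
    """Write a fake superblock signature; the rest of mkfs is elided."""
    lv[0:4] = b"XFSB"

lv1 = create_lv(0, 512)
format_xfs(lv1)
del lv1                               # "lvremove" without wiping the bytes

lv2 = create_lv(0, 512)               # new LV lands at the same offset
print(bytes(lv2[0:4]))                # b'XFSB' -- stale signature survives
```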

davidkarlsen commented 3 years ago

Can wiping and zeroing be controlled when the volumes are created? I'd recommend having both enabled by default.

pawanpraka1 commented 3 years ago

@davidkarlsen that was a planned item for LVM LocalPV. We already wipe the LVM partition when we delete the volume. From the error it looks like you already had a partition there before, and the new volume landed at the same offset. We need to clear the filesystem at creation time as well. We had planned this and somehow missed implementing it. Will take care of adding this enhancement.
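Clearing at creation time could look roughly like the following (an illustrative stand-in only; real `wipefs -a` erases known signature offsets rather than zeroing blindly):

```python
# Illustrative stand-in for wiping a new LV before handing it out: zeroing
# the head of the device guarantees that mkfs later sees an unformatted
# volume instead of a stale superblock from a previous tenant.

def wipe_head(path: str, length: int = 4096) -> None:
    """Zero the first `length` bytes of the block device/file at `path`."""
    with open(path, "r+b") as dev:
        dev.write(b"\x00" * length)
```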

davidkarlsen commented 3 years ago

Note that the safest approach is to wipe at creation time too.

pawanpraka1 commented 3 years ago

@davidkarlsen I have raised a PR (https://github.com/openebs/lvm-localpv/pull/138) to fix it. Can you try with the image pawanpraka1/lvm-driver:vp and see if it works?

pawanpraka1 commented 3 years ago

@davidkarlsen can you confirm the lvm driver version you are using? It should be at the beginning of the openebs-lvm-plugin container log in the openebs-lvm-node-xxxx daemonset.

mittachaitu commented 3 years ago

This behavior is due to compatibility issues between the container and the host operating system. openebs/lvm-localpv version 0.6.0 already erases the filesystem signatures on the LVM volume before creating the volume; the fix was merged via #88. This issue can be reproduced by performing the following steps:

davidkarlsen commented 3 years ago

> @davidkarlsen can you confirm the lvm driver version you are using? It should be there in beginning of the openebs-lvm-plugin container log in the openebs-lvm-node-xxxx daemonset.

LVM Driver Version :- 0.8.0 - commit :- 929ae4439f2da71a2d6ee5bda6a33dd2f7d424fc

davidkarlsen commented 3 years ago

> This behavior is due to compatibility issues between the container and the host operating system. openebs/lvm-localpv 0.6.0 version is already erasing the fs signatures on LVM volume before creating the volume. Fix was merged via #88. This issue can be reproduced by performing the following steps:

Hmm, then how come I experience this problem with 0.8.0? BTW, when you format, do you pass the -f (force) option?

mittachaitu commented 3 years ago

> This behavior is due to compatibility issues between the container and the host operating system. openebs/lvm-localpv 0.6.0 version is already erasing the fs signatures on LVM volume before creating the volume. Fix was merged via #88. This issue can be reproduced by performing the following steps:
>
> Hmm, then how come I experience this problem with 0.8.0? BTW, when you format, do you pass the -f (force) option?

Yes, we are passing the -f (force) option from version 0.6.0 onwards.

davidkarlsen commented 3 years ago

> This behavior is due to compatibility issues between the container and the host operating system. openebs/lvm-localpv 0.6.0 version is already erasing the fs signatures on LVM volume before creating the volume. Fix was merged via #88. This issue can be reproduced by performing the following steps:
>
> Hmm, then how come I experience this problem with 0.8.0? BTW, when you format, do you pass the -f (force) option?
>
> Yes, we are passing the -f (force) option from version 0.6.0 onwards.

Then it's a bit surprising to hit this in the current release, for two reasons:

  1. If volumes are wiped at creation, the superblock should have been wiped in the first place and the bug should not surface.
  2. If formatting is forced, mkfs should ignore the existing superblock and proceed anyway.

I'll try to provoke this in a third cluster when I have time.

davidkarlsen commented 3 years ago

Tried now with the 2.12.2 chart, still the same:

                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age                 From               Message
  ----     ------            ----                ----               -------
  Warning  FailedScheduling  3m20s               default-scheduler  0/18 nodes are available: 3 Insufficient memory, 3 node(s) had taint {fsapplog: }, that the pod didn't tolerate, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 4 node(s) didn't match Pod's node affinity/selector, 5 node(s) had taint {fss.tietoevry.com/finods-group: }, that the pod didn't tolerate.
  Warning  FailedScheduling  3m18s               default-scheduler  0/18 nodes are available: 3 Insufficient memory, 3 node(s) had taint {fsapplog: }, that the pod didn't tolerate, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 4 node(s) didn't match Pod's node affinity/selector, 5 node(s) had taint {fss.tietoevry.com/finods-group: }, that the pod didn't tolerate.
  Normal   Scheduled         3m4s                default-scheduler  Successfully assigned openshift-logging/elasticsearch-cdm-cqg8zvqd-1-5596fc5479-7lmtg to alp-ksx-c01oco05
  Warning  FailedMount       62s                 kubelet            Unable to attach or mount volumes: unmounted volumes=[elasticsearch-storage], unattached volumes=[kube-api-access-29pgd elasticsearch-metrics elasticsearch-storage elasticsearch-config certificates]: timed out waiting for the condition
  Warning  FailedMount       57s (x9 over 3m5s)  kubelet            MountVolume.SetUp failed for volume "pvc-5128b42c-a7c1-403b-b599-2cadf8984328" : rpc error: code = Internal desc = failed to format and mount the volume error: mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t xfs -o defaults /dev/datavg/pvc-5128b42c-a7c1-403b-b599-2cadf8984328 /var/lib/kubelet/pods/b7c50bae-72a1-4ae5-9c0d-e23b8e84a5b3/volumes/kubernetes.io~csi/pvc-5128b42c-a7c1-403b-b599-2cadf8984328/mount
Output: mount: /var/lib/kubelet/pods/b7c50bae-72a1-4ae5-9c0d-e23b8e84a5b3/volumes/kubernetes.io~csi/pvc-5128b42c-a7c1-403b-b599-2cadf8984328/mount: wrong fs type, bad option, bad superblock on /dev/mapper/datavg-pvc--5128b42c--a7c1--403b--b599--2cadf8984328, missing codepage or helper program, or other error.

davidkarlsen commented 3 years ago

same problem on 2.12.5

davidkarlsen commented 3 years ago

From the logs:

I0909 20:35:40.848768       1 grpc.go:72] GRPC call: /csi.v1.Node/NodePublishVolume requests {"target_path":"/var/lib/kubelet/pods/179a5e86-43a5-43f7-b78e-b11af4368674/volumes/kubernetes.io~csi/pvc-d5be05a4-f5f8-4b7e-83b3-b53eaaff8937/mount","volume_capability":{"AccessType":{"Mount":{"fs_type":"xfs"}},"access_mode":{"mode":1}},"volume_context":{"csi.storage.k8s.io/ephemeral":"false","csi.storage.k8s.io/pod.name":"prometheus-k8s-1","csi.storage.k8s.io/pod.namespace":"openshift-monitoring","csi.storage.k8s.io/pod.uid":"179a5e86-43a5-43f7-b78e-b11af4368674","csi.storage.k8s.io/serviceAccount.name":"prometheus-k8s","openebs.io/cas-type":"localpv-lvm","openebs.io/volgroup":"datavg","storage.kubernetes.io/csiProvisionerIdentity":"1631215660348-8081-local.csi.openebs.io"},"volume_id":"pvc-d5be05a4-f5f8-4b7e-83b3-b53eaaff8937"}
I0909 20:35:40.864001       1 mount_linux.go:366] Disk "/dev/datavg/pvc-d5be05a4-f5f8-4b7e-83b3-b53eaaff8937" appears to be unformatted, attempting to format as type: "xfs" with options: [/dev/datavg/pvc-d5be05a4-f5f8-4b7e-83b3-b53eaaff8937]
I0909 20:35:41.646181       1 mount_linux.go:376] Disk successfully formatted (mkfs): xfs - /dev/datavg/pvc-d5be05a4-f5f8-4b7e-83b3-b53eaaff8937 /var/lib/kubelet/pods/179a5e86-43a5-43f7-b78e-b11af4368674/volumes/kubernetes.io~csi/pvc-d5be05a4-f5f8-4b7e-83b3-b53eaaff8937/mount
E0909 20:35:41.648622       1 mount_linux.go:150] Mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t xfs -o defaults /dev/datavg/pvc-d5be05a4-f5f8-4b7e-83b3-b53eaaff8937 /var/lib/kubelet/pods/179a5e86-43a5-43f7-b78e-b11af4368674/volumes/kubernetes.io~csi/pvc-d5be05a4-f5f8-4b7e-83b3-b53eaaff8937/mount
Output: mount: /var/lib/kubelet/pods/179a5e86-43a5-43f7-b78e-b11af4368674/volumes/kubernetes.io~csi/pvc-d5be05a4-f5f8-4b7e-83b3-b53eaaff8937/mount: wrong fs type, bad option, bad superblock on /dev/mapper/datavg-pvc--d5be05a4--f5f8--4b7e--83b3--b53eaaff8937, missing codepage or helper program, or other error.

Note that there is no -f in: `attempting to format as type: "xfs" with options: [/dev/datavg/pvc-d5be05a4-f5f8-4b7e-83b3-b53eaaff8937]`
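The flow visible in this log can be sketched as simplified pseudologic (the function and helper names here are illustrative, not the real mount-utils Go API): when the probe reports the disk as unformatted, mkfs is invoked without a force flag, so any signature the probe missed makes mkfs bail out and the subsequent mount fail.

```python
# Rough sketch (simplified, hypothetical names) of a SafeFormatAndMount-style
# flow: probe the device, format only if nothing was detected, then mount.
# Note the deliberate absence of "-f", matching the log line above.

def format_and_mount(device, target, fstype, probe, run):
    """probe(device) -> detected fstype or None; run(cmd_list) executes it."""
    existing = probe(device)
    if existing is None:
        # no "-f" here: a stale superblock the probe missed makes mkfs refuse
        run(["mkfs." + fstype, device])
    elif existing != fstype:
        raise ValueError(f"device has {existing}, expected {fstype}")
    run(["mount", "-t", fstype, device, target])
```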

davidkarlsen commented 3 years ago

The issue lies here: https://github.com/kubernetes/mount-utils/pull/5

davidkarlsen commented 3 years ago

https://github.com/kubernetes/kubernetes/pull/104923

mittachaitu commented 3 years ago

Looks like even with the above force flag, the issue is still the same... When this issue occurred, these were the system logs:

Sep 12 18:51:53 centos-master kernel: XFS (dm-0): Superblock has unknown read-only compatible features (0x4) enabled.
Sep 12 18:51:53 centos-master kernel: XFS (dm-0): Attempted to mount read-only compatible filesystem read-write.
Sep 12 18:51:53 centos-master kernel: XFS (dm-0): Filesystem can only be safely mounted read only.
Sep 12 18:51:53 centos-master kernel: XFS (dm-0): SB validate failed with error -22.

The error -22 above maps to EINVAL, i.e. Invalid Argument (as I understand it, the kernel does not yet support this feature). Some googling around the error led me to this page.

mkfs.xfs version on CentOS 7: 4.5.0; mkfs.xfs version in the container: 5.6.0. Looks like the same incompatibility as mentioned in the issue...

To resolve the issue we have to format the xfs filesystem with the following option: `mkfs.xfs -m reflink=0 /dev/lvm/manual1`

Attempt 1: formatted with xfs without using any flags:

bash-5.0# lvcreate -n manual1 -L 1G lvm
  Logical volume "manual1" created.
bash-5.0# mkfs.xfs /dev/lvm/manual1 
meta-data=/dev/lvm/manual1       isize=512    agcount=4, agsize=65536 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1
data     =                       bsize=4096   blocks=262144, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
bash-5.0# mount /dev/lvm/manual1 /var/lib/kubelet/mnt/store1
mount: /var/lib/kubelet/mnt/store1: wrong fs type, bad option, bad superblock on /dev/mapper/lvm-manual1, missing codepage or helper program, or other error.

Attempt 2: formatted with xfs using the -m reflink=0 flag:

bash-5.0# lsblk -fa
NAME          FSTYPE      FSVER LABEL UUID                                   FSAVAIL FSUSE% MOUNTPOINT
fd0                                                                                         
loop0         squashfs                                                                      
loop1         squashfs                                                                      
loop2         squashfs                                                                      
sda                                                                                         
├─sda1        xfs                     8808cf9e-0900-4d7a-af19-36bf061d7a24                  
└─sda2        xfs                     72d0dc49-d80f-4aa8-a51f-51e237deb23e     10.9G    62% /var/lib/kubelet
sdb           LVM2_member             IvJ3Z4-PaLm-zZ5j-4oxK-H6dS-pkBk-KjcJSG                
└─lvm-manual1                                                                               
sr0                                                                                         
bash-5.0# lvs
  LV      VG  Attr       LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  manual1 lvm -wi-a----- 1.00g                                                    
bash-5.0# mkfs.xfs -m reflink=0 /dev/lvm/manual1 
meta-data=/dev/lvm/manual1       isize=512    agcount=4, agsize=65536 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=0
data     =                       bsize=4096   blocks=262144, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
bash-5.0# mount /dev/lvm/manual1 /var/lib/kubelet/mnt/store1
bash-5.0# 
bash-5.0# df -h
Filesystem               Size  Used Avail Use% Mounted on
overlay                   29G   19G   11G  63% /
tmpfs                    1.9G     0  1.9G   0% /sys/fs/cgroup
/dev/sda2                 29G   19G   11G  63% /plugin
devtmpfs                 1.9G     0  1.9G   0% /dev
shm                       64M     0   64M   0% /dev/shm
tmpfs                    1.9G   12K  1.9G   1% /var/lib/kubelet/pods/32966bd7-fd41-4f49-b572-8a25a1dc802d/volumes/kubernetes.io~secret/kube-proxy-token-tmnfs
tmpfs                    1.9G   12K  1.9G   1% /var/lib/kubelet/pods/8e82a39d-d592-4051-83f2-bb372f568246/volumes/kubernetes.io~secret/flannel-token-fpwlc
tmpfs                    1.9G   12K  1.9G   1% /var/lib/kubelet/pods/be4ddd06-bed9-4d18-bb54-26e67c77eb74/volumes/kubernetes.io~secret/openebs-maya-operator-token-sj7w5
tmpfs                    1.9G   12K  1.9G   1% /run/secrets/kubernetes.io/serviceaccount
---------------------------------------------------------------------------------------------------------------------
| /dev/mapper/lvm-manual1 1014M   33M  982M   4% /var/lib/kubelet/mnt/store1              |
---------------------------------------------------------------------------------------------------------------------
bash-5.0# 

Red Hat document which says to pass the reflink option

davidkarlsen commented 3 years ago

@mittachaitu I believe that's another problem (it has another error-message) - please create a separate issue for that.

mittachaitu commented 3 years ago

> mount: /var/lib/kubelet/mnt/store1: wrong fs type, bad option, bad superblock on /dev/mapper/lvm-manual1, missing codepage or helper program, or other error.

The above is the error I got when I tried to mount an xfs-formatted LVM volume, and the issue description has a similar error, so I believe both are the same...

Mounting command: mount
Mounting arguments: -t xfs -o defaults /dev/datavg/pvc-c9073859-fd54-4890-b444-b96e6f46dea1 /var/lib/kubelet/pods/0c34d38c-88f0-4a1c-bf6f-02e6b3ab05cd/volumes/kubernetes.io~csi/pvc-c9073859-fd54-4890-b444-b96e6f46dea1/mount
Output: mount: /var/lib/kubelet/pods/0c34d38c-88f0-4a1c-bf6f-02e6b3ab05cd/volumes/kubernetes.io~csi/pvc-c9073859-fd54-4890-b444-b96e6f46dea1/mount: wrong fs type, bad option, bad superblock on /dev/mapper/datavg-pvc--c9073859--fd54--4890--b444--b96e6f46dea1, missing codepage or helper program, or other error.

The above is from issue description

davidkarlsen commented 3 years ago

@w3aman could you maybe by any chance pull in my hack on mount_utils? Merging into Kubernetes and waiting for a release will take forever.

dsharma-dc commented 5 months ago

A reasonable update at the moment is to mention in our documentation that the combination of xfs and an older kernel (< 5.10) may run into this issue, and that it can be mitigated by updating the host node's kernel version.
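That guidance could be expressed as a small check (a sketch only; the 5.10 threshold is the one quoted in this comment, and a real check would probe filesystem features rather than just the version string):

```python
# Sketch of the documented mitigation as a check: parse the node's kernel
# release (`uname -r` output) and flag the xfs risk on kernels older than
# 5.10. Purely illustrative; adjust the threshold to your distro's backports.
import re

def kernel_supports_xfs_reflink(release: str, minimum=(5, 10)) -> bool:
    """`release` is `uname -r` output, e.g. '3.10.0-1160.36.2.el7.x86_64'."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if not m:
        return False
    return (int(m.group(1)), int(m.group(2))) >= minimum

print(kernel_supports_xfs_reflink("3.10.0-1160.36.2.el7.x86_64"))  # False
print(kernel_supports_xfs_reflink("5.14.0-70.el9.x86_64"))         # True
```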

balaharish7 commented 5 months ago

Documented in PR #448 & PR #451