Closed: davidkarlsen closed this issue 5 months ago
Hi @davidkarlsen, can you please share the environment details: LVM driver version, Kubernetes version, and OS?
lvm version
LVM version: 2.02.187(2)-RHEL7 (2020-03-24)
Library version: 1.02.170-RHEL7 (2020-03-24)
Driver version: 4.37.1
Configuration: ./configure --build=x86_64-redhat-linux-gnu --host=x86_64-redhat-linux-gnu --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/libexec --localstatedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-default-dm-run-dir=/run --with-default-run-dir=/run/lvm --with-default-pid-dir=/run --with-default-locking-dir=/run/lock/lvm --with-usrlibdir=/usr/lib64 --enable-lvm1_fallback --enable-fsadm --with-pool=internal --enable-write_install --with-user= --with-group= --with-device-uid=0 --with-device-gid=6 --with-device-mode=0660 --enable-pkgconfig --enable-applib --enable-cmdlib --enable-dmeventd --enable-blkid_wiping --enable-python2-bindings --with-cluster=internal --with-clvmd=corosync --enable-cmirrord --with-udevdir=/usr/lib/udev/rules.d --enable-udev_sync --with-thin=internal --enable-lvmetad --with-cache=internal --enable-lvmpolld --enable-lvmlockd-dlm --enable-lvmlockd-sanlock --enable-dmfilemapd
uname -a
Linux alp-dts-g-c01oco07 3.10.0-1160.36.2.el7.x86_64 #1 SMP Thu Jul 8 02:53:40 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.9 (Maipo)
openebs helm chart 2.12.0
@davidkarlsen -- this looks related to #75.
Would it be possible to try this on RHEL 8?
It looks the same. Unfortunately I can't run on RHEL 8, as it's not supported for OCP. In my case I had just deleted some LVs and then created a new one, which probably landed at the same offset; without clearing the old volume, mkfs will probably find a stale superblock and refuse to format unless forced. So to compare RHEL 7 vs RHEL 8, that would be the underlying setting to look at.
Can wiping and zeroing be controlled when the volumes are created? I'd recommend having both enabled by default.
@davidkarlsen that was a planned item for LVM LocalPV. We already wipe the LVM partition when we delete the volume. From the error it looks like you already had a filesystem there before and the new volume landed at the same offset. We need to clear the fs at creation time as well. We had planned this and somehow missed implementing it. Will take care of adding this enhancement.
Note that the safest approach is to wipe at create time too.
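As a sketch of what wipe-at-create could look like (this is not the driver's actual code; the helper name and device path are illustrative, and the script only prints the commands it would run):

```shell
#!/bin/sh
# Hypothetical dry-run sketch of "wipe at create": before formatting a
# freshly created LV, clear any stale filesystem signatures that a
# previously deleted LV may have left at the same physical offsets.
# This only echoes the commands; names and paths are illustrative.

wipe_and_format() {
  lv_path="$1"
  # wipefs -a erases all known signature magic (fs superblocks, RAID, LVM2)
  echo "wipefs -a $lv_path"
  # mkfs.xfs -f formats even if a signature somehow survived
  echo "mkfs.xfs -f $lv_path"
}

wipe_and_format /dev/datavg/pvc-example
```

Running the printed commands for real requires root and destroys any data on the LV, which is exactly the point for a brand-new volume.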
@davidkarlsen I have raised a PR (https://github.com/openebs/lvm-localpv/pull/138) to fix it. Can you try with the image pawanpraka1/lvm-driver:vp and see if it works?
@davidkarlsen can you confirm the lvm driver version you are using? It should be at the beginning of the openebs-lvm-plugin container log in the openebs-lvm-node-xxxx daemonset.
LVM Driver Version :- 0.8.0 - commit :- 929ae4439f2da71a2d6ee5bda6a33dd2f7d424fc
This behavior is due to compatibility issues between the container and the host operating system. openebs/lvm-localpv 0.6.0 version is already erasing the fs signatures on LVM volume before creating the volume. Fix was merged via #88 . This issue can be reproduced by performing the following steps:
Hmm, then how come I experience this problem with 0.8.0? BTW, when you format, do you pass the -f (force) option?
Yes, we are passing the -f (force) option from version 0.6.0 onwards.
Then it's a bit surprising to meet this in the current release for two reasons:
I'll try to provoke this in a third cluster when I have time.
Tried now with the 2.12.2 chart, still the same:
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 3m20s default-scheduler 0/18 nodes are available: 3 Insufficient memory, 3 node(s) had taint {fsapplog: }, that the pod didn't tolerate, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 4 node(s) didn't match Pod's node affinity/selector, 5 node(s) had taint {fss.tietoevry.com/finods-group: }, that the pod didn't tolerate.
Warning FailedScheduling 3m18s default-scheduler 0/18 nodes are available: 3 Insufficient memory, 3 node(s) had taint {fsapplog: }, that the pod didn't tolerate, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 4 node(s) didn't match Pod's node affinity/selector, 5 node(s) had taint {fss.tietoevry.com/finods-group: }, that the pod didn't tolerate.
Normal Scheduled 3m4s default-scheduler Successfully assigned openshift-logging/elasticsearch-cdm-cqg8zvqd-1-5596fc5479-7lmtg to alp-ksx-c01oco05
Warning FailedMount 62s kubelet Unable to attach or mount volumes: unmounted volumes=[elasticsearch-storage], unattached volumes=[kube-api-access-29pgd elasticsearch-metrics elasticsearch-storage elasticsearch-config certificates]: timed out waiting for the condition
Warning FailedMount 57s (x9 over 3m5s) kubelet MountVolume.SetUp failed for volume "pvc-5128b42c-a7c1-403b-b599-2cadf8984328" : rpc error: code = Internal desc = failed to format and mount the volume error: mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t xfs -o defaults /dev/datavg/pvc-5128b42c-a7c1-403b-b599-2cadf8984328 /var/lib/kubelet/pods/b7c50bae-72a1-4ae5-9c0d-e23b8e84a5b3/volumes/kubernetes.io~csi/pvc-5128b42c-a7c1-403b-b599-2cadf8984328/mount
Output: mount: /var/lib/kubelet/pods/b7c50bae-72a1-4ae5-9c0d-e23b8e84a5b3/volumes/kubernetes.io~csi/pvc-5128b42c-a7c1-403b-b599-2cadf8984328/mount: wrong fs type, bad option, bad superblock on /dev/mapper/datavg-pvc--5128b42c--a7c1--403b--b599--2cadf8984328, missing codepage or helper program, or other error.
same problem on 2.12.5
From the logs:
I0909 20:35:40.848768 1 grpc.go:72] GRPC call: /csi.v1.Node/NodePublishVolume requests {"target_path":"/var/lib/kubelet/pods/179a5e86-43a5-43f7-b78e-b11af4368674/volumes/kubernetes.io~csi/pvc-d5be05a4-f5f8-4b7e-83b3-b53eaaff8937/mount","volume_capability":{"AccessType":{"Mount":{"fs_type":"xfs"}},"access_mode":{"mode":1}},"volume_context":{"csi.storage.k8s.io/ephemeral":"false","csi.storage.k8s.io/pod.name":"prometheus-k8s-1","csi.storage.k8s.io/pod.namespace":"openshift-monitoring","csi.storage.k8s.io/pod.uid":"179a5e86-43a5-43f7-b78e-b11af4368674","csi.storage.k8s.io/serviceAccount.name":"prometheus-k8s","openebs.io/cas-type":"localpv-lvm","openebs.io/volgroup":"datavg","storage.kubernetes.io/csiProvisionerIdentity":"1631215660348-8081-local.csi.openebs.io"},"volume_id":"pvc-d5be05a4-f5f8-4b7e-83b3-b53eaaff8937"}
I0909 20:35:40.864001 1 mount_linux.go:366] Disk "/dev/datavg/pvc-d5be05a4-f5f8-4b7e-83b3-b53eaaff8937" appears to be unformatted, attempting to format as type: "xfs" with options: [/dev/datavg/pvc-d5be05a4-f5f8-4b7e-83b3-b53eaaff8937]
I0909 20:35:41.646181 1 mount_linux.go:376] Disk successfully formatted (mkfs): xfs - /dev/datavg/pvc-d5be05a4-f5f8-4b7e-83b3-b53eaaff8937 /var/lib/kubelet/pods/179a5e86-43a5-43f7-b78e-b11af4368674/volumes/kubernetes.io~csi/pvc-d5be05a4-f5f8-4b7e-83b3-b53eaaff8937/mount
E0909 20:35:41.648622 1 mount_linux.go:150] Mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t xfs -o defaults /dev/datavg/pvc-d5be05a4-f5f8-4b7e-83b3-b53eaaff8937 /var/lib/kubelet/pods/179a5e86-43a5-43f7-b78e-b11af4368674/volumes/kubernetes.io~csi/pvc-d5be05a4-f5f8-4b7e-83b3-b53eaaff8937/mount
Output: mount: /var/lib/kubelet/pods/179a5e86-43a5-43f7-b78e-b11af4368674/volumes/kubernetes.io~csi/pvc-d5be05a4-f5f8-4b7e-83b3-b53eaaff8937/mount: wrong fs type, bad option, bad superblock on /dev/mapper/datavg-pvc--d5be05a4--f5f8--4b7e--83b3--b53eaaff8937, missing codepage or helper program, or other error.
Note that there is no -f in:
attempting to format as type: "xfs" with options: [/dev/datavg/pvc-d5be05a4-f5f8-4b7e-83b3-b53eaaff8937]
The issue lies here: https://github.com/kubernetes/mount-utils/pull/5
Looks like even with the above force flag the issue is still the same... When this issue occurred, the following system logs were seen:
Sep 12 18:51:53 centos-master kernel: XFS (dm-0): Superblock has unknown read-only compatible features (0x4) enabled.
Sep 12 18:51:53 centos-master kernel: XFS (dm-0): Attempted to mount read-only compatible filesystem read-write.
Sep 12 18:51:53 centos-master kernel: XFS (dm-0): Filesystem can only be safely mounted read only.
Sep 12 18:51:53 centos-master kernel: XFS (dm-0): SB validate failed with error -22.
The above error -22 maps to EINVAL, which means Invalid Argument (as I understand it, the kernel does not yet support this feature)... Some googling around the above error led me to this page.
mkfs.xfs version on CentOS 7: 4.5.0; mkfs.xfs version in the container: 5.6.0. Looks like an incompatibility, as mentioned in the issue...
To resolve the issue we have to format the xfs filesystem with the following option: mkfs.xfs -m reflink=0 /dev/lvm/manual1
Attempt 1: formatted with xfs without using any flags:
bash-5.0# lvcreate -n manual1 -L 1G lvm
Logical volume "manual1" created.
bash-5.0# mkfs.xfs /dev/lvm/manual1
meta-data=/dev/lvm/manual1 isize=512 agcount=4, agsize=65536 blks
= sectsz=512 attr=2, projid32bit=1
= crc=1 finobt=1, sparse=1, rmapbt=0
= reflink=1
data = bsize=4096 blocks=262144, imaxpct=25
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0, ftype=1
log =internal log bsize=4096 blocks=2560, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
bash-5.0# mount /dev/lvm/manual1 /var/lib/kubelet/mnt/store1
mount: /var/lib/kubelet/mnt/store1: wrong fs type, bad option, bad superblock on /dev/mapper/lvm-manual1, missing codepage or helper program, or other error.
Attempt 2: formatted with xfs using the -m reflink=0 flag:
bash-5.0# lsblk -fa
NAME FSTYPE FSVER LABEL UUID FSAVAIL FSUSE% MOUNTPOINT
fd0
loop0 squashfs
loop1 squashfs
loop2 squashfs
sda
├─sda1 xfs 8808cf9e-0900-4d7a-af19-36bf061d7a24
└─sda2 xfs 72d0dc49-d80f-4aa8-a51f-51e237deb23e 10.9G 62% /var/lib/kubelet
sdb LVM2_member IvJ3Z4-PaLm-zZ5j-4oxK-H6dS-pkBk-KjcJSG
└─lvm-manual1
sr0
bash-5.0# lvs
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
manual1 lvm -wi-a----- 1.00g
bash-5.0# mkfs.xfs -m reflink=0 /dev/lvm/manual1
meta-data=/dev/lvm/manual1 isize=512 agcount=4, agsize=65536 blks
= sectsz=512 attr=2, projid32bit=1
= crc=1 finobt=1, sparse=1, rmapbt=0
= reflink=0
data = bsize=4096 blocks=262144, imaxpct=25
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0, ftype=1
log =internal log bsize=4096 blocks=2560, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
bash-5.0# mount /dev/lvm/manual1 /var/lib/kubelet/mnt/store1
bash-5.0#
bash-5.0# df -h
Filesystem Size Used Avail Use% Mounted on
overlay 29G 19G 11G 63% /
tmpfs 1.9G 0 1.9G 0% /sys/fs/cgroup
/dev/sda2 29G 19G 11G 63% /plugin
devtmpfs 1.9G 0 1.9G 0% /dev
shm 64M 0 64M 0% /dev/shm
tmpfs 1.9G 12K 1.9G 1% /var/lib/kubelet/pods/32966bd7-fd41-4f49-b572-8a25a1dc802d/volumes/kubernetes.io~secret/kube-proxy-token-tmnfs
tmpfs 1.9G 12K 1.9G 1% /var/lib/kubelet/pods/8e82a39d-d592-4051-83f2-bb372f568246/volumes/kubernetes.io~secret/flannel-token-fpwlc
tmpfs 1.9G 12K 1.9G 1% /var/lib/kubelet/pods/be4ddd06-bed9-4d18-bb54-26e67c77eb74/volumes/kubernetes.io~secret/openebs-maya-operator-token-sj7w5
tmpfs 1.9G 12K 1.9G 1% /run/secrets/kubernetes.io/serviceaccount
---------------------------------------------------------------------------------------------------------------------
| /dev/mapper/lvm-manual1 1014M 33M 982M 4% /var/lib/kubelet/mnt/store1 |
---------------------------------------------------------------------------------------------------------------------
bash-5.0#
The -m reflink=0 flag tells mkfs.xfs to disable the shared copy-on-write (reflink) feature, which is not supported by CentOS 7 (AFAIK). There is a Red Hat document which says to pass the reflink option.
@mittachaitu I believe that's another problem (it has another error-message) - please create a separate issue for that.
mount: /var/lib/kubelet/mnt/store1: wrong fs type, bad option, bad superblock on /dev/mapper/lvm-manual1, missing codepage or helper program, or other error.
Above is the error I got when I tried to mount the xfs-formatted LVM volume, and the issue description has a similar error... So I believe both are the same...
Mounting command: mount
Mounting arguments: -t xfs -o defaults /dev/datavg/pvc-c9073859-fd54-4890-b444-b96e6f46dea1 /var/lib/kubelet/pods/0c34d38c-88f0-4a1c-bf6f-02e6b3ab05cd/volumes/kubernetes.io~csi/pvc-c9073859-fd54-4890-b444-b96e6f46dea1/mount
Output: mount: /var/lib/kubelet/pods/0c34d38c-88f0-4a1c-bf6f-02e6b3ab05cd/volumes/kubernetes.io~csi/pvc-c9073859-fd54-4890-b444-b96e6f46dea1/mount: wrong fs type, bad option, bad superblock on /dev/mapper/datavg-pvc--c9073859--fd54--4890--b444--b96e6f46dea1, missing codepage or helper program, or other error.
The above is from issue description
@w3aman could you by any chance pull in my hack on mount_utils? Merging into Kubernetes and waiting for a release will take forever.
A reasonable update at the moment is to mention in our documentation that the combination of xfs and an older kernel (< 5.10) may run into this issue, and that it can be mitigated by updating the host node's kernel version.
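To make that documentation note actionable, here is a minimal sketch that decides whether to pass -m reflink=0 based on the host kernel version. The 5.10 threshold follows the observation in this thread; the helper name and the logic are assumptions for illustration, not lvm-localpv code:

```shell
#!/bin/sh
# Sketch: choose mkfs.xfs options based on the host kernel version.
# Illustrative only; the 5.10 cutoff comes from this thread's discussion.

kernel_lt() {
  # true when version $1 sorts strictly before version $2
  [ "$1" != "$2" ] && \
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$1" ]
}

host_kernel=$(uname -r | cut -d- -f1)   # e.g. "3.10.0" on RHEL 7
if kernel_lt "$host_kernel" "5.10"; then
  mkfs_opts="-m reflink=0"              # old kernel: disable reflink
else
  mkfs_opts=""                          # recent kernel: defaults are fine
fi
echo "mkfs.xfs $mkfs_opts <device>"
```

A check like this could live in the docs as a quick way for users to tell whether their nodes are affected before formatting xfs volumes.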
What steps did you take and what happened:
because: maybe it should force by default, or some notes should be added to the docs.
What did you expect to happen: formatting should happen.
Anything else you would like to add:
Environment:
- Output of kubectl version:
- OS (e.g. from /etc/os-release):