Closed: Trolldemorted closed this issue 2 years ago.
As a side effect, the PVCs were mounted read-only.
dmesg output:
[Fri Jun 3 11:55:31 2022] EXT4-fs warning (device nvme5n1): ext4_end_bio:309: I/O error 10 writing to inode 18 (offset 0 size 1052672 starting block 2425072)
[Fri Jun 3 11:55:31 2022] Buffer I/O error on device nvme5n1, logical block 2424832
[Fri Jun 3 11:55:31 2022] Buffer I/O error on device nvme5n1, logical block 2424833
[Fri Jun 3 11:55:31 2022] Buffer I/O error on device nvme5n1, logical block 2424834
[Fri Jun 3 11:55:31 2022] Buffer I/O error on device nvme5n1, logical block 2424835
[Fri Jun 3 11:55:31 2022] Buffer I/O error on device nvme5n1, logical block 2424836
[Fri Jun 3 11:55:31 2022] Buffer I/O error on device nvme5n1, logical block 2424837
[Fri Jun 3 11:55:31 2022] Buffer I/O error on device nvme5n1, logical block 2424838
[Fri Jun 3 11:55:31 2022] Buffer I/O error on device nvme5n1, logical block 2424839
[Fri Jun 3 11:55:31 2022] Buffer I/O error on device nvme5n1, logical block 2424840
[Fri Jun 3 11:55:31 2022] Buffer I/O error on device nvme5n1, logical block 2424841
[Fri Jun 3 11:55:31 2022] EXT4-fs warning (device nvme5n1): ext4_end_bio:309: I/O error 10 writing to inode 18 (offset 0 size 2101248 starting block 2425328)
[Fri Jun 3 11:55:31 2022] EXT4-fs warning (device nvme5n1): ext4_end_bio:309: I/O error 10 writing to inode 18 (offset 0 size 3149824 starting block 2425584)
[Fri Jun 3 11:55:31 2022] EXT4-fs warning (device nvme5n1): ext4_end_bio:309: I/O error 10 writing to inode 18 (offset 0 size 4198400 starting block 2425840)
[Fri Jun 3 11:55:31 2022] EXT4-fs warning (device nvme5n1): ext4_end_bio:309: I/O error 10 writing to inode 18 (offset 0 size 5246976 starting block 2426096)
[Fri Jun 3 11:55:31 2022] EXT4-fs warning (device nvme5n1): ext4_end_bio:309: I/O error 10 writing to inode 18 (offset 0 size 6295552 starting block 2426352)
[Fri Jun 3 11:55:31 2022] EXT4-fs warning (device nvme5n1): ext4_end_bio:309: I/O error 10 writing to inode 18 (offset 0 size 7344128 starting block 2426608)
[Fri Jun 3 11:55:31 2022] EXT4-fs warning (device nvme5n1): ext4_end_bio:309: I/O error 10 writing to inode 18 (offset 0 size 8388608 starting block 2426864)
[Fri Jun 3 11:55:31 2022] EXT4-fs warning (device nvme5n1): ext4_end_bio:309: I/O error 10 writing to inode 18 (offset 8388608 size 1052672 starting block 2427120)
[Fri Jun 3 11:55:31 2022] EXT4-fs warning (device nvme5n1): ext4_end_bio:309: I/O error 10 writing to inode 18 (offset 8388608 size 2101248 starting block 2427376)
[Fri Jun 3 11:55:34 2022] Aborting journal on device nvme5n1-8.
[Fri Jun 3 11:55:34 2022] Buffer I/O error on dev nvme5n1, logical block 3702784, lost sync page write
[Fri Jun 3 11:55:34 2022] JBD2: Error -5 detected when updating journal superblock for nvme5n1-8.
[Fri Jun 3 11:55:34 2022] Buffer I/O error on dev nvme5n1, logical block 0, lost sync page write
[Fri Jun 3 11:55:34 2022] EXT4-fs (nvme5n1): I/O error while writing superblock
[Fri Jun 3 11:55:34 2022] EXT4-fs error (device nvme5n1): ext4_journal_check_start:61: Detected aborted journal
[Fri Jun 3 11:55:34 2022] EXT4-fs (nvme5n1): Remounting filesystem read-only
[Fri Jun 3 11:58:26 2022] sr 8:0:0:0: [sr0] CDROM not ready. Make sure there is a disc in the drive.
[Fri Jun 3 11:58:26 2022] sr 8:0:0:0: [sr0] CDROM not ready. Make sure there is a disc in the drive.
[Fri Jun 3 11:58:26 2022] nvme_ns_head_make_request: 674 callbacks suppressed
[Fri Jun 3 11:58:26 2022] block nvme4n1: no available path - failing I/O
[Fri Jun 3 11:58:26 2022] Buffer I/O error on dev nvme4n1, logical block 4, async page read
[Fri Jun 3 11:58:26 2022] block nvme4n1: no available path - failing I/O
[Fri Jun 3 11:58:26 2022] Buffer I/O error on dev nvme4n1, logical block 5, async page read
[Fri Jun 3 11:58:26 2022] block nvme4n1: no available path - failing I/O
[Fri Jun 3 11:58:26 2022] Buffer I/O error on dev nvme4n1, logical block 6, async page read
[Fri Jun 3 11:58:26 2022] block nvme4n1: no available path - failing I/O
[Fri Jun 3 11:58:26 2022] Buffer I/O error on dev nvme4n1, logical block 7, async page read
[Fri Jun 3 11:58:26 2022] block nvme4n1: no available path - failing I/O
[Fri Jun 3 11:58:26 2022] Buffer I/O error on dev nvme4n1, logical block 4, async page read
[Fri Jun 3 11:58:26 2022] block nvme4n1: no available path - failing I/O
[Fri Jun 3 11:58:26 2022] Buffer I/O error on dev nvme4n1, logical block 5, async page read
[Fri Jun 3 11:58:26 2022] block nvme4n1: no available path - failing I/O
[Fri Jun 3 11:58:26 2022] Buffer I/O error on dev nvme4n1, logical block 6, async page read
[Fri Jun 3 11:58:26 2022] block nvme4n1: no available path - failing I/O
[Fri Jun 3 11:58:26 2022] Buffer I/O error on dev nvme4n1, logical block 7, async page read
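As an aside, bursts of repeated kernel errors like the ones above can be summarized programmatically. This is a minimal sketch, not an official tool; the regex only targets the two "Buffer I/O error" line shapes that appear in this dmesg output:

```python
import re
from collections import defaultdict

# Matches both "Buffer I/O error on device ..." and "... on dev ..." lines
# from the dmesg output above.
PATTERN = re.compile(r"Buffer I/O error on dev(?:ice)? (\S+), logical block (\d+)")

def summarize(dmesg_lines):
    """Group failing logical blocks by block device."""
    errors = defaultdict(set)
    for line in dmesg_lines:
        m = PATTERN.search(line)
        if m:
            errors[m.group(1)].add(int(m.group(2)))
    return {dev: sorted(blocks) for dev, blocks in errors.items()}

sample = [
    "[Fri Jun 3 11:55:31 2022] Buffer I/O error on device nvme5n1, logical block 2424832",
    "[Fri Jun 3 11:58:26 2022] Buffer I/O error on dev nvme4n1, logical block 4, async page read",
]
print(summarize(sample))  # {'nvme5n1': [2424832], 'nvme4n1': [4]}
```

Feeding it the full log quickly shows which devices (here nvme5n1 and nvme4n1) are failing and over what block ranges.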
Is there anything we can do to verify the integrity of our cluster? Can we just remount the PVCs as writable? Should that happen automatically?
I can't quite remember what 0.8.1 used to do, but IIRC when the volume target is recreated it will initially contain only 1 of the previously healthy replicas - this is because, in case of a crash, we don't know if there were any writes in progress. From that 1 replica we then rebuild the other ones (if the volume was configured with replicas > 1).
At the moment, however, we do not handle remounting the volume as writable automatically.
Can I find out whether the cluster is in a healthy state again? And is there documentation about how to update/restart mayastor nodes without the cluster rebuilding replicas?
mount -o remount,rw /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-5d8925a7-ca3b-49d5-95b2-973c38c74a8e/globalmount
mount: /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-5d8925a7-ca3b-49d5-95b2-973c38c74a8e/globalmount: cannot remount /dev/nvme3n1 read-write, is write-protected.
We have tried this to remount our volumes, which did not work. How do we correctly remount our volumes?
[2022-06-07T07:52:50.030479727+00:00 ERROR mayastor::spdk:nvme_ctrlr.c:3523] [nqn.2019-05.io.openebs:d9cb25e4-7985-46a8-9991-2b62868e756a] Submitting Keep Alive failed
Additional information from our mayastor pod on the target node. We get this message multiple times per minute.
> Can I find out whether the cluster is in a healthy state again? And is there documentation about how to update/restart mayastor nodes without the cluster rebuilding replicas?
This is something we're looking into. For the moment the WA to restart mayastor nodes without rebuilding replicas is to bring down the applications first, then restart the mayastor nodes, then bring the applications back up.
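The workaround above can be sketched as a short script. The deployment name `my-app` and the `mayastor` daemonset/namespace are placeholder assumptions - adjust them to your cluster. The sketch runs in dry-run mode by default (it prints the commands instead of executing them); set `KUBECTL=kubectl` to run it for real:

```shell
KUBECTL="${KUBECTL:-echo kubectl}"   # dry-run by default; set KUBECTL=kubectl to execute

restart_mayastor_nodes() {
  # 1. Bring down the applications using mayastor volumes.
  $KUBECTL scale deployment my-app --replicas=0
  # 2. Restart the mayastor node pods (assumed daemonset 'mayastor' in namespace 'mayastor').
  $KUBECTL -n mayastor rollout restart daemonset mayastor
  $KUBECTL -n mayastor rollout status daemonset mayastor
  # 3. Bring the applications back up.
  $KUBECTL scale deployment my-app --replicas=1
}

restart_mayastor_nodes
```

The point of the ordering is that no application I/O is in flight while the mayastor pods restart, so the control plane has no reason to mark replicas unhealthy and trigger rebuilds.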
> mount -o remount,rw /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-5d8925a7-ca3b-49d5-95b2-973c38c74a8e/globalmount
> mount: /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-5d8925a7-ca3b-49d5-95b2-973c38c74a8e/globalmount: cannot remount /dev/nvme3n1 read-write, is write-protected.
> We have tried this to remount our volumes, which did not work. How do we correctly remount our volumes?
Hmm, I'm not sure then. Is the device connected properly? Could you run `nvme list`?
> [2022-06-07T07:52:50.030479727+00:00 ERROR mayastor::spdk:nvme_ctrlr.c:3523] [nqn.2019-05.io.openebs:d9cb25e4-7985-46a8-9991-2b62868e756a] Submitting Keep Alive failed
> Additional information from our mayastor pod on the target node. We get this message multiple times per minute.
Ah, someone else saw this a few days ago. Unfortunately, when it happened, that mayastor was actually in a bad state and they had to restart it. Is your mayastor responsive?
sudo nvme list
output:
node1:
/dev/nvme2n1: failed to obtain nvme info: Operation not permitted
/dev/nvme3n1: failed to obtain nvme info: Operation not permitted
/dev/nvme4n1: failed to obtain nvme info: Operation not permitted
/dev/nvme5n1: failed to obtain nvme info: Operation not permitted
/dev/nvme6n1: failed to obtain nvme info: Operation not permitted
/dev/nvme7n1: failed to obtain nvme info: Operation not permitted
Node SN Model Namespace Usage Format FW Rev
---------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
/dev/nvme0n1 REDACTED SAMSUNG MZWLJ3T8HBLS-00007 1 3.84 TB / 3.84 TB 512 B + 0 B REDACTED
/dev/nvme1n1 REDACTED SAMSUNG MZWLJ3T8HBLS-00007 1 3.84 TB / 3.84 TB 512 B + 0 B REDACTED
node2:
Node SN Model Namespace Usage Format FW Rev
---------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
/dev/nvme0n1 REDACTED SAMSUNG MZWLJ3T8HBLS-00007 1 3.84 TB / 3.84 TB 512 B + 0 B REDACTED
/dev/nvme1n1 REDACTED SAMSUNG MZWLJ3T8HBLS-00007 1 3.84 TB / 3.84 TB 512 B + 0 B REDACTED
node3:
Node SN Model Namespace Usage Format FW Rev
---------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
/dev/nvme0n1 REDACTED SAMSUNG MZWLJ3T8HBLS-00007 1 3.84 TB / 3.84 TB 512 B + 0 B REDACTED
/dev/nvme1n1 REDACTED SAMSUNG MZVLB1T0HBLR-00000 1 315.80 GB / 1.02 TB 512 B + 0 B REDACTED
/dev/nvme2n1 REDACTED Mayastor NVMe controller 1 10.73 GB / 10.73 GB 512 B + 0 B REDACTED
Currently I am observing that node1 is repeatedly destroying its Nexus and rebuilding all children.
apiVersion: "openebs.io/v1alpha1"
kind: MayastorPool
metadata:
  name: node1-mayastor-pool
  namespace: mayastor
spec:
  node: node1
  disks: ["/dev/nvme1n1"]
---
apiVersion: "openebs.io/v1alpha1"
kind: MayastorPool
metadata:
  name: node2-mayastor-pool
  namespace: mayastor
spec:
  node: node2
  disks: ["/dev/nvme1n1"]
---
apiVersion: "openebs.io/v1alpha1"
kind: MayastorPool
metadata:
  name: node3-mayastor-pool
  namespace: mayastor
spec:
  node: node3
  disks: ["/dev/nvme0n1"]
Additional Mayastor Configuration.
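For reference, applying and checking these pools can be sketched as follows. The file name `mayastor-pools.yaml` and the `mayastorpools` resource name are assumptions; the sketch runs in dry-run mode by default (printing the commands) - set `KUBECTL=kubectl` to execute them:

```shell
KUBECTL="${KUBECTL:-echo kubectl}"   # dry-run by default; set KUBECTL=kubectl to execute

apply_pools() {
  # Apply the three MayastorPool manifests above (assumed saved as mayastor-pools.yaml).
  $KUBECTL -n mayastor apply -f mayastor-pools.yaml
  # Check pool state; each pool should report itself as online.
  $KUBECTL -n mayastor get mayastorpools
}

apply_pools
```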
After restarting all mayastor pods, with the exception of the etcd pods, I could get one of my applications back running. I did not restart the etcd pod, because we are currently (sorry for that) still running the In-Memory etcd version. What would be the consequences of also restarting these pods? Although my application is running now, I can still see in the mayastor logs (node1), that the nexus is repeatedly destroyed and all children are rebuilt. As of now it seems to have no effect on the test application I could get running. The test application got deployed on node1.
We have thrown away our old installation, and installed a fresh mayastor 1.0.1 - thanks for your help though!
@Trolldemorted btw there's a branch release-1.0.2 which has not been released yet as we're still testing, but it has quite a few bug fixes. We should have a "proper" release soon, but I can't confirm any date yet. It should be usable now if you want to try it; all images are on dockerhub, etc.
> After restarting all mayastor pods, with the exception of the etcd pods, I could get one of my applications back running. I did not restart the etcd pod, because we are currently (sorry for that) still running the In-Memory etcd version. What would be the consequences of also restarting these pods? Although my application is running now, I can still see in the mayastor logs (node1), that the nexus is repeatedly destroyed and all children are rebuilt. As of now it seems to have no effect on the test application I could get running. The test application got deployed on node1.
Ah, if you're using an in-memory etcd and it gets restarted, we'll lose all configuration data, including the crucial information about where the healthy replicas are. If you want to get around this, you could stop your applications, bring down the control-plane, copy the etcd data out, redeploy etcd (using the "manual" storageclass for example, which although not ideal is better than RAM), write the config back to etcd, then bring up the control-plane and your apps.
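The "copy the etcd data out" step can be sketched with `etcdctl`. The endpoint name below is a placeholder, and this is only a hedged illustration of dumping the key space before redeploying etcd; it runs in dry-run mode by default (printing the command) - set `ETCDCTL=etcdctl` to execute it:

```shell
ETCDCTL="${ETCDCTL:-echo etcdctl}"     # dry-run by default; set ETCDCTL=etcdctl to execute
ENDPOINT="http://mayastor-etcd:2379"   # placeholder endpoint - adjust to your cluster

dump_config() {
  # Dump every key/value pair so the pairs can be written back with 'etcdctl put'
  # after etcd has been redeployed on persistent storage.
  $ETCDCTL --endpoints="$ENDPOINT" get "" --prefix
}

dump_config
```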
> @Trolldemorted btw there's a branch release-1.0.2 which has not been released yet as we're testing, but has quite a few bug fixes. Should have a "proper" release soon, but can't confirm any date yet. It should be usable now if you want to try it, all images on dockerhub, etc.
Thanks for the heads-up, I think we'll wait for the final 1.0.2 release, otherwise our users might get grumpy - but we are eager to roll out the update once it is finished and the documentation tells us how to upgrade a cluster!
> Ah, if you're using an in-memory etcd, and that gets restarted we'll lose all configuration data, including very crucial information about where the healthy replicas are. If you want to get around this, you could stop your applications, bring down the control-plane, copy the etcd data, redeploy etcd (use "manual" storageclass for example, which although not ideal is better than ram..), write the config back to etcd, bring up the control-plane and your apps.
Thanks again for the details, we went the simple "backup the contents of the data and fill your pvcs again" way, which also worked smoothly. New etcd is running in a manual storage class just like your example here, so we are in a significantly better shape now than before :)
Describe the bug
One of our mayastor containers crashed:

To Reproduce
n/a - I'd rather never reproduce this ever again