stormshift / support

This repo should serve as a central source for reporting issues with stormshift
GNU General Public License v3.0

Pods stuck in ContainerCreating with NetApp iSCSI storage #173

Closed rbo closed 1 month ago

rbo commented 5 months ago

After #172, pods are stuck in ContainerCreating:

AttachVolume.Attach failed for volume "pvc-e9466bba-f5ce-44df-bd92-73cb149784ad" : rpc error: code = Unknown desc = error publishing ontap-san-economy driver: error adding IQN iqn.1994-05.com.redhat:427913a2110 to igroup : error adding IQN iqn.1994-05.com.redhat:427913a2110 to igroup : API status: failed, Reason: Invalid value specified for "initiator-group-name" element within "igroup-add": ""., Code: 13115
rbo commented 5 months ago

iSCSI initiatorname

Node initiatorname
node/inf4 iqn.1994-05.com.redhat:461a7e93bdc
node/inf44 iqn.1994-05.com.redhat:7ca623189291
node/inf5 iqn.1994-05.com.redhat:70c628ecb0e8
node/inf6 iqn.1994-05.com.redhat:a3f4e45cf2ae
node/inf7 iqn.1994-05.com.redhat:565b28b44ce3
node/inf8 iqn.1994-05.com.redhat:427913a2110
node/ucs56 iqn.1994-05.com.redhat:43966bee5f92
node/ucs57 iqn.1994-05.com.redhat:9d79fb87e6
Details
```bash
for i in $(oc get nodes -o name ) ; do echo "# $i"; oc debug $i -- cat /host/etc/iscsi/initiatorname.iscsi ; done
# node/inf4
Starting pod/inf4-debug-t4lwr ...
To use host binaries, run `chroot /host`
InitiatorName=iqn.1994-05.com.redhat:461a7e93bdc
Removing debug pod ...
# node/inf44
Starting pod/inf44-debug-cl9gt ...
To use host binaries, run `chroot /host`
InitiatorName=iqn.1994-05.com.redhat:7ca623189291
Removing debug pod ...
# node/inf5
Starting pod/inf5-debug-mvp22 ...
To use host binaries, run `chroot /host`
InitiatorName=iqn.1994-05.com.redhat:70c628ecb0e8
Removing debug pod ...
# node/inf6
Starting pod/inf6-debug-5n2hc ...
To use host binaries, run `chroot /host`
InitiatorName=iqn.1994-05.com.redhat:a3f4e45cf2ae
Removing debug pod ...
# node/inf7
Starting pod/inf7-debug-lkbhx ...
To use host binaries, run `chroot /host`
InitiatorName=iqn.1994-05.com.redhat:565b28b44ce3
Removing debug pod ...
# node/inf8
Starting pod/inf8-debug-kkh2v ...
To use host binaries, run `chroot /host`
InitiatorName=iqn.1994-05.com.redhat:427913a2110
Removing debug pod ...
# node/ucs56
Starting pod/ucs56-debug-stxw7 ...
To use host binaries, run `chroot /host`
InitiatorName=iqn.1994-05.com.redhat:43966bee5f92
Removing debug pod ...
# node/ucs57
Starting pod/ucs57-debug-c7mq6 ...
To use host binaries, run `chroot /host`
InitiatorName=iqn.1994-05.com.redhat:9d79fb87e6
Removing debug pod ...
```
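The node/initiatorname table above can be derived from the raw loop output instead of being copied by hand. A minimal sketch, assuming the output looks like the one in the Details block; the function name `initiator_table` is made up for illustration:

```bash
# Reduce the raw loop output (read from stdin) to "node IQN" pairs.
# Only the "# node/..." header lines and the "InitiatorName=..." lines are used;
# everything else (debug pod start/stop messages) is ignored.
initiator_table() {
  awk '
    /^# node\//       { node = $2 }                                   # remember the current node
    /^InitiatorName=/ { sub(/^InitiatorName=/, ""); print node, $0 }  # emit "node IQN"
  '
}

# Possible usage (needs cluster access):
#   for i in $(oc get nodes -o name); do
#     echo "# $i"; oc debug "$i" -- cat /host/etc/iscsi/initiatorname.iscsi
#   done | initiator_table
```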
rbo commented 5 months ago

Trident controller logs:

oc logs trident-controller-58d89fb6bb-vxzgp | tail
Defaulted container "trident-main" out of: trident-main, trident-autosupport, csi-provisioner, csi-attacher, csi-resizer, csi-snapshotter
time="2024-05-27T14:11:58Z" level=info msg="Publishing volume to node." logLayer=core node=inf7 requestID=d7ce1f4c-98ce-46c7-a2b1-afbffa9cd3fe requestSource=CSI volume=pvc-d3e86f80-77a3-4a4a-a5ed-e1cee486f959 workflow="controller=publish"
time="2024-05-27T14:11:58Z" level=info msg="Publishing volume to node." logLayer=core node=inf8 requestID=432abe78-9e6c-4a42-9bcc-19df67b6b8ea requestSource=CSI volume=pvc-23a91c90-f935-4879-b3f7-ec9f9cb6ef75 workflow="controller=publish"
time="2024-05-27T14:12:00Z" level=error msg="error publishing ontap-san-economy driver: error adding IQN iqn.1994-05.com.redhat:9d79fb87e6 to igroup : error adding IQN iqn.1994-05.com.redhat:9d79fb87e6 to igroup : API status: failed, Reason: Invalid value specified for \"initiator-group-name\" element within \"igroup-add\": \"\"., Code: 13115" Method=ControllerPublishVolume Type=CSI_Controller logLayer=csi_frontend requestID=70cd29c3-47ff-4ae9-920f-c72bb4c62d6b requestSource=CSI workflow="controller=publish"
time="2024-05-27T14:12:00Z" level=error msg="GRPC error: rpc error: code = Unknown desc = error publishing ontap-san-economy driver: error adding IQN iqn.1994-05.com.redhat:9d79fb87e6 to igroup : error adding IQN iqn.1994-05.com.redhat:9d79fb87e6 to igroup : API status: failed, Reason: Invalid value specified for \"initiator-group-name\" element within \"igroup-add\": \"\"., Code: 13115" logLayer=csi_frontend requestID=70cd29c3-47ff-4ae9-920f-c72bb4c62d6b requestSource=CSI
time="2024-05-27T14:12:00Z" level=error msg="error publishing ontap-san-economy driver: error adding IQN iqn.1994-05.com.redhat:427913a2110 to igroup : error adding IQN iqn.1994-05.com.redhat:427913a2110 to igroup : API status: failed, Reason: Invalid value specified for \"initiator-group-name\" element within \"igroup-add\": \"\"., Code: 13115" Method=ControllerPublishVolume Type=CSI_Controller logLayer=csi_frontend requestID=427e7b23-27b2-4b11-8466-df80a88cac33 requestSource=CSI workflow="controller=publish"
time="2024-05-27T14:12:00Z" level=error msg="GRPC error: rpc error: code = Unknown desc = error publishing ontap-san-economy driver: error adding IQN iqn.1994-05.com.redhat:427913a2110 to igroup : error adding IQN iqn.1994-05.com.redhat:427913a2110 to igroup : API status: failed, Reason: Invalid value specified for \"initiator-group-name\" element within \"igroup-add\": \"\"., Code: 13115" logLayer=csi_frontend requestID=427e7b23-27b2-4b11-8466-df80a88cac33 requestSource=CSI
time="2024-05-27T14:12:01Z" level=error msg="error publishing ontap-san-economy driver: error adding IQN iqn.1994-05.com.redhat:565b28b44ce3 to igroup : error adding IQN iqn.1994-05.com.redhat:565b28b44ce3 to igroup : API status: failed, Reason: Invalid value specified for \"initiator-group-name\" element within \"igroup-add\": \"\"., Code: 13115" Method=ControllerPublishVolume Type=CSI_Controller logLayer=csi_frontend requestID=d7ce1f4c-98ce-46c7-a2b1-afbffa9cd3fe requestSource=CSI workflow="controller=publish"
time="2024-05-27T14:12:01Z" level=error msg="GRPC error: rpc error: code = Unknown desc = error publishing ontap-san-economy driver: error adding IQN iqn.1994-05.com.redhat:565b28b44ce3 to igroup : error adding IQN iqn.1994-05.com.redhat:565b28b44ce3 to igroup : API status: failed, Reason: Invalid value specified for \"initiator-group-name\" element within \"igroup-add\": \"\"., Code: 13115" logLayer=csi_frontend requestID=d7ce1f4c-98ce-46c7-a2b1-afbffa9cd3fe requestSource=CSI
time="2024-05-27T14:12:02Z" level=error msg="error publishing ontap-san-economy driver: error adding IQN iqn.1994-05.com.redhat:427913a2110 to igroup : error adding IQN iqn.1994-05.com.redhat:427913a2110 to igroup : API status: failed, Reason: Invalid value specified for \"initiator-group-name\" element within \"igroup-add\": \"\"., Code: 13115" Method=ControllerPublishVolume Type=CSI_Controller logLayer=csi_frontend requestID=432abe78-9e6c-4a42-9bcc-19df67b6b8ea requestSource=CSI workflow="controller=publish"
time="2024-05-27T14:12:02Z" level=error msg="GRPC error: rpc error: code = Unknown desc = error publishing ontap-san-economy driver: error adding IQN iqn.1994-05.com.redhat:427913a2110 to igroup : error adding IQN iqn.1994-05.com.redhat:427913a2110 to igroup : API status: failed, Reason: Invalid value specified for \"initiator-group-name\" element within \"igroup-add\": \"\"., Code: 13115" logLayer=csi_frontend requestID=432abe78-9e6c-4a42-9bcc-19df67b6b8ea requestSource=CSI
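To see at a glance which initiators are affected, the controller log can be reduced to a count of failed publishes per IQN. This is only a sketch that relies on the message format shown above; `failing_iqns` is a hypothetical helper name. Each failure is logged twice (once with `Method=ControllerPublishVolume`, once as `GRPC error`), so only the first kind is counted:

```bash
# Count failed ControllerPublishVolume calls per initiator IQN.
# Reads Trident controller log lines on stdin.
failing_iqns() {
  grep 'Method=ControllerPublishVolume' \
    | sed -n 's/.*error adding IQN \([^ ]*\) to igroup.*/\1/p' \
    | sort | uniq -c | sort -rn
}

# Possible usage (container name taken from the "Defaulted container" message above):
#   oc logs trident-controller-58d89fb6bb-vxzgp -c trident-main | failing_iqns
```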
rbo commented 5 months ago

Node inf8 has two iscsi sessions and four iscsi devices:

Details
```bash
oc debug node/inf8
Starting pod/inf8-debug-9lcdb ...
To use host binaries, run `chroot /host`
Pod IP: 10.32.96.8
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-5.1# iscsi
sh: iscsi: command not found
sh-5.1# iscsi
iscsi-iname  iscsiadm  iscsid  iscsistart  iscsiuio
sh-5.1# iscsiadm
Try `iscsiadm --help' for more information.
sh-5.1# iscsiadm --help
iscsiadm -m discoverydb [-hV] [-d debug_level] [-P printlevel] [-t type -p ip:port -I ifaceN ... [-Dl]] | [[-p ip:port -t type] [-o operation] [-n name] [-v value] [-lD]]
iscsiadm -m discovery [-hV] [-d debug_level] [-P printlevel] [-t type -p ip:port -I ifaceN ... [-l]] | [[-p ip:port] [-l | -D]] [-W]
iscsiadm -m node [-hV] [-d debug_level] [-P printlevel] [-L all,manual,automatic,onboot] [-W] [-U all,manual,automatic,onboot] [-S] [[-T targetname -p ip:port -I ifaceN] [-l | -u | -R | -s]] [[-o operation ] [-n name] [-v value]]
iscsiadm -m session [-hV] [-d debug_level] [-P printlevel] [-r sessionid | sysfsdir [-R | -u | -s] [-o operation] [-n name] [-v value]]
iscsiadm -m iface [-hV] [-d debug_level] [-P printlevel] [-I ifacename | -H hostno|MAC] [[-o operation ] [-n name] [-v value]] [-C ping [-a ip] [-b packetsize] [-c count] [-i interval]]
iscsiadm -m fw [-d debug_level] [-l] [-W]
iscsiadm -m host [-P printlevel] [-H hostno|MAC] [[-C chap [-x chap_tbl_idx]] | [-C flashnode [-A portal_type] [-x flashnode_idx]] | [-C stats]] [[-o operation] [-n name] [-v value]]
iscsiadm -k priority
sh-5.1# iscsiadm ^C
sh-5.1# iscsiadm -m session -P3
iSCSI Transport Class version 2.0-870
version 6.2.1.4
Target: iqn.1992-08.com.netapp:sn.c75b807ffc3c11ec829000a0987cd31a:vs.16 (non-flash)
    Current Portal: 10.32.97.31:3260,1029
    Persistent Portal: 10.32.97.31:3260,1029
        **********
        Interface:
        **********
        Iface Name: default
        Iface Transport: tcp
        Iface Initiatorname: iqn.1994-05.com.redhat:427913a2110
        Iface IPaddress: 10.32.96.8
        Iface HWaddress: default
        Iface Netdev: default
        SID: 1
        iSCSI Connection State: LOGGED IN
        iSCSI Session State: LOGGED_IN
        Internal iscsid Session State: NO CHANGE
        *********
        Timeouts:
        *********
        Recovery Timeout: 5
        Target Reset Timeout: 30
        LUN Reset Timeout: 30
        Abort Timeout: 15
        *****
        CHAP:
        *****
        username:
        password: ********
        username_in:
        password_in: ********
        ************************
        Negotiated iSCSI params:
        ************************
        HeaderDigest: None
        DataDigest: None
        MaxRecvDataSegmentLength: 262144
        MaxXmitDataSegmentLength: 65536
        FirstBurstLength: 65536
        MaxBurstLength: 1048576
        ImmediateData: Yes
        InitialR2T: Yes
        MaxOutstandingR2T: 1
        ************************
        Attached SCSI devices:
        ************************
        Host Number: 8  State: running
        scsi8 Channel 00 Id 0 Lun: 0
            Attached scsi disk sde      State: running
        scsi8 Channel 00 Id 0 Lun: 15
            Attached scsi disk sdg      State: running
    Current Portal: 10.32.97.32:3260,1030
    Persistent Portal: 10.32.97.32:3260,1030
        **********
        Interface:
        **********
        Iface Name: default
        Iface Transport: tcp
        Iface Initiatorname: iqn.1994-05.com.redhat:427913a2110
        Iface IPaddress: 10.32.96.8
        Iface HWaddress: default
        Iface Netdev: default
        SID: 2
        iSCSI Connection State: LOGGED IN
        iSCSI Session State: LOGGED_IN
        Internal iscsid Session State: NO CHANGE
        *********
        Timeouts:
        *********
        Recovery Timeout: 5
        Target Reset Timeout: 30
        LUN Reset Timeout: 30
        Abort Timeout: 15
        *****
        CHAP:
        *****
        username:
        password: ********
        username_in:
        password_in: ********
        ************************
        Negotiated iSCSI params:
        ************************
        HeaderDigest: None
        DataDigest: None
        MaxRecvDataSegmentLength: 262144
        MaxXmitDataSegmentLength: 65536
        FirstBurstLength: 65536
        MaxBurstLength: 1048576
        ImmediateData: Yes
        InitialR2T: Yes
        MaxOutstandingR2T: 1
        ************************
        Attached SCSI devices:
        ************************
        Host Number: 9  State: running
        scsi9 Channel 00 Id 0 Lun: 0
            Attached scsi disk sdf      State: running
        scsi9 Channel 00 Id 0 Lun: 15
            Attached scsi disk sdh      State: running
sh-5.1# lsblk
NAME    MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINTS
loop0     7:0    0 445.2G  0 loop
loop1     7:1    0 445.2G  0 loop
loop2     7:2    0   120G  0 loop
loop3     7:3    0    64G  0 loop
loop4     7:4    0    64G  0 loop
sda       8:0    0 445.2G  0 disk
sdb       8:16   0 222.6G  0 disk
|-sdb1
|         8:17   0     1M  0 part
|-sdb2
|         8:18   0   127M  0 part
|-sdb3
|         8:19   0   384M  0 part  /boot
`-sdb4
          8:20   0 222.1G  0 part  /var/lib/kubelet/pods/4d3283fc-9121-4963-9663-6585ac389c9f/volume-subpaths/ocs-deviceset-local-odf-ssd-0-data-3h744f-bridge/osd/6
                                   /var/lib/kubelet/pods/b5b902f9-58b4-404a-9b20-52381ca63e4d/volume-subpaths/ocs-deviceset-local-odf-ssd-0-data-0xbb26-bridge/osd/6
                                   /var/lib/kubelet/pods/4d3283fc-9121-4963-9663-6585ac389c9f/volume-subpaths/ocs-deviceset-local-odf-ssd-0-data-3h744f-bridge/chown-container-data-dir/6
                                   /var/lib/kubelet/pods/b5b902f9-58b4-404a-9b20-52381ca63e4d/volume-subpaths/ocs-deviceset-local-odf-ssd-0-data-0xbb26-bridge/chown-container-data-dir/6
                                   /var/lib/kubelet/pods/b5b902f9-58b4-404a-9b20-52381ca63e4d/volume-subpaths/ocs-deviceset-local-odf-ssd-0-data-0xbb26-bridge/expand-bluefs/0
                                   /var/lib/kubelet/pods/4d3283fc-9121-4963-9663-6585ac389c9f/volume-subpaths/ocs-deviceset-local-odf-ssd-0-data-3h744f-bridge/expand-bluefs/0
                                   /var/lib/kubelet/pods/b5b902f9-58b4-404a-9b20-52381ca63e4d/volume-subpaths/ocs-deviceset-local-odf-ssd-0-data-0xbb26-bridge/activate/0
                                   /var/lib/kubelet/pods/4d3283fc-9121-4963-9663-6585ac389c9f/volume-subpaths/ocs-deviceset-local-odf-ssd-0-data-3h744f-bridge/activate/0
                                   /var/lib/kubelet/pods/b5b902f9-58b4-404a-9b20-52381ca63e4d/volume-subpaths/ocs-deviceset-local-odf-ssd-0-data-0xbb26-bridge/blkdevmapper/0
                                   /var/lib/kubelet/pods/4d3283fc-9121-4963-9663-6585ac389c9f/volume-subpaths/ocs-deviceset-local-odf-ssd-0-data-3h744f-bridge/blkdevmapper/0
                                   /var
                                   /sysroot/ostree/deploy/rhcos/var
                                   /sysroot
                                   /usr
                                   /etc
                                   /
sdc       8:32   0 445.2G  0 disk
|-for--etcd-thin--pool--1_tmeta
|       253:0    0   204M  0 lvm
| `-for--etcd-thin--pool--1-tpool
|       253:2    0 400.3G  0 lvm
|   |-for--etcd-thin--pool--1
|   |   253:3    0 400.3G  1 lvm
|   |-for--etcd-b236f90e--b161--4c08--ba62--b12a35995eaf
|   |   253:4    0     4G  0 lvm   /var/lib/kubelet/pods/394c01a4-9b62-4d19-98d9-e62ddd0809e3/volumes/kubernetes.io~csi/pvc-0f377532-f2b2-4650-ba2f-9e10e43e56ae/mount
|   |-for--etcd-7165d559--2ab9--4467--a4d2--c11cf5b0c3ec
|   |   253:5    0     8G  0 lvm   /var/lib/kubelet/pods/358ee59a-632d-4560-a918-d94320f280ae/volumes/kubernetes.io~csi/pvc-11f3e576-8808-4817-a267-fbfd9997028f/mount
|   `-for--etcd-81a8d4cb--3e60--4e78--bc32--79ce6057d859
|       253:6    0     8G  0 lvm   /var/lib/kubelet/pods/57f08e8b-55b3-4f8f-812f-6d166dac29c6/volumes/kubernetes.io~csi/pvc-acd4b9d5-f30b-4873-9124-0b1a3c617f6e/mount
`-for--etcd-thin--pool--1_tdata
        253:1    0 400.3G  0 lvm
  `-for--etcd-thin--pool--1-tpool
        253:2    0 400.3G  0 lvm
    |-for--etcd-thin--pool--1
    |   253:3    0 400.3G  1 lvm
    |-for--etcd-b236f90e--b161--4c08--ba62--b12a35995eaf
    |   253:4    0     4G  0 lvm   /var/lib/kubelet/pods/394c01a4-9b62-4d19-98d9-e62ddd0809e3/volumes/kubernetes.io~csi/pvc-0f377532-f2b2-4650-ba2f-9e10e43e56ae/mount
    |-for--etcd-7165d559--2ab9--4467--a4d2--c11cf5b0c3ec
    |   253:5    0     8G  0 lvm   /var/lib/kubelet/pods/358ee59a-632d-4560-a918-d94320f280ae/volumes/kubernetes.io~csi/pvc-11f3e576-8808-4817-a267-fbfd9997028f/mount
    `-for--etcd-81a8d4cb--3e60--4e78--bc32--79ce6057d859
        253:6    0     8G  0 lvm   /var/lib/kubelet/pods/57f08e8b-55b3-4f8f-812f-6d166dac29c6/volumes/kubernetes.io~csi/pvc-acd4b9d5-f30b-4873-9124-0b1a3c617f6e/mount
sdd       8:48   0 445.2G  0 disk
sde       8:64   0    64G  0 disk
`-3600a09803830326d51244a37592f5275
        253:7    0    64G  0 mpath
  |-3600a09803830326d51244a37592f5275p1
  |     253:8    0     1G  0 part
  `-3600a09803830326d51244a37592f5275p2
        253:9    0    63G  0 part
sdf       8:80   0    64G  0 disk
`-3600a09803830326d51244a37592f5275
        253:7    0    64G  0 mpath
  |-3600a09803830326d51244a37592f5275p1
  |     253:8    0     1G  0 part
  `-3600a09803830326d51244a37592f5275p2
        253:9    0    63G  0 part
sdg       8:96   0    64G  0 disk
`-3600a09803830326d51244a37592f5457
        253:10   0    64G  0 mpath
  |-3600a09803830326d51244a37592f5457p1
  |     253:11   0     1G  0 part
  `-3600a09803830326d51244a37592f5457p2
        253:12   0    63G  0 part
sdh       8:112  0    64G  0 disk
`-3600a09803830326d51244a37592f5457
        253:10   0    64G  0 mpath
  |-3600a09803830326d51244a37592f5457p1
  |     253:11   0     1G  0 part
  `-3600a09803830326d51244a37592f5457p2
        253:12   0    63G  0 part
sr0      11:0    1  1024M  0 rom
nbd0     43:0    0     0B  0 disk
nbd1     43:32   0     0B  0 disk
nbd2     43:64   0     0B  0 disk
nbd3     43:96   0     0B  0 disk
nbd4     43:128  0     0B  0 disk
nbd5     43:160  0     0B  0 disk
nbd6     43:192  0     0B  0 disk
nbd7     43:224  0     0B  0 disk
rbd0    252:0    0    10G  0 disk  /var/lib/kubelet/pods/b81c0e78-9399-4257-a8f2-e91b1fa5de98/volume-subpaths/pvc-8c9c5d0b-68bd-45a9-aefe-60cde1535b05/alertmanager/3
                                   /var/lib/kubelet/pods/b81c0e78-9399-4257-a8f2-e91b1fa5de98/volumes/kubernetes.io~csi/pvc-8c9c5d0b-68bd-45a9-aefe-60cde1535b05/mount
                                   /var/lib/kubelet/plugins/kubernetes.io/csi/openshift-storage.rbd.csi.ceph.com/f1c4fbd4ee5c6514bbbc2fc2bbff53d31c4175c5bbd6659e8a0e4e2acda2212d/globalmount/0001-0011-openshift-storage-0000000000000001-6362c252-0652-405a-a82a-39b2526e7f61
rbd1    252:16   0   120G  0 disk  /var/lib/kubelet/pods/7555b8c3-f829-437f-bd73-f7c9c2532f67/volume-subpaths/pvc-d7902d3a-881b-4b26-b418-df52c7466029/prometheus/3
                                   /var/lib/kubelet/pods/7555b8c3-f829-437f-bd73-f7c9c2532f67/volumes/kubernetes.io~csi/pvc-d7902d3a-881b-4b26-b418-df52c7466029/mount
                                   /var/lib/kubelet/plugins/kubernetes.io/csi/openshift-storage.rbd.csi.ceph.com/acabdd14e6ea5f9ca33c5c3a479020c1d26706e07480224ba3b5370db3aac951/globalmount/0001-0011-openshift-storage-0000000000000001-19f2c4a2-c693-4d3b-b7f5-22cda8e1732e
rbd2    252:32   0   120G  0 disk
|-rbd2p1
|       252:33   0     1M  0 part
|-rbd2p2
|       252:34   0   127M  0 part
|-rbd2p3
|       252:35   0   384M  0 part
`-rbd2p4
        252:36   0 119.5G  0 part
nbd8     43:256  0     0B  0 disk
nbd9     43:288  0     0B  0 disk
nbd10    43:320  0     0B  0 disk
nbd11    43:352  0     0B  0 disk
nbd12    43:384  0     0B  0 disk
nbd13    43:416  0     0B  0 disk
nbd14    43:448  0     0B  0 disk
nbd15    43:480  0     0B  0 disk
```

Let's drain the node and reboot.

rbo commented 5 months ago

That did not solve it...

rbo commented 5 months ago
$ oc describe  pod virt-launcher-control-plane-3-qvmp4
...
Events:
  Type     Reason                  Age                   From                     Message
  ----     ------                  ----                  ----                     -------
  Normal   Scheduled               6m8s                  default-scheduler        Successfully assigned demo-cluster-disco/virt-launcher-control-plane-3-qvmp4 to inf8
  Normal   SuccessfulAttachVolume  6m8s                  attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-3cfd18ff-3d27-48d5-831b-7c0e33e6c6fb"
  Warning  FailedAttachVolume      113s (x10 over 6m7s)  attachdetach-controller  AttachVolume.Attach failed for volume "pvc-d3e86f80-77a3-4a4a-a5ed-e1cee486f959" : rpc error: code = Unknown desc = error publishing ontap-san-economy driver: error adding IQN iqn.1994-05.com.redhat:427913a2110 to igroup : error adding IQN iqn.1994-05.com.redhat:427913a2110 to igroup : API status: failed, Reason: Invalid value specified for "initiator-group-name" element within "igroup-add": ""., Code: 13115

$ oc get pv pvc-d3e86f80-77a3-4a4a-a5ed-e1cee486f959 -o yaml | grep internal
      internalName: isar_pvc_d3e86f80_77a3_4a4a_a5ed_e1cee486f959

$ ssh admin@netapp-mgmt.coe.muc.redhat.com
...
fas2552::> lun show -lun isar_pvc_d3e86f80_77a3_4a4a_a5ed_e1cee486f959 -m 
Vserver    Path                                      Igroup   LUN ID  Protocol
---------- ----------------------------------------  -------  ------  --------
svm_trident 
           /vol/trident_lun_pool_isar_ZTIFFWGFPK/isar_pvc_d3e86f80_77a3_4a4a_a5ed_e1cee486f959  
                                                     trident-cd7b267e-7ff1-42ff-a2b5-617216ba06ea  
                                                                  13  iscsi

fas2552::> i group
    ic                  if_addr_filter_info ifconfig
    ifgrp               ifstat              igroup
    ipsec               iscsi               

fas2552::> igroup show -vserver svm_trident -igroup trident-cd7b267e-7ff1-42ff-a2b5-617216ba06ea       
          Vserver Name: svm_trident
           Igroup Name: trident-cd7b267e-7ff1-42ff-a2b5-617216ba06ea
              Protocol: iscsi
               OS Type: linux
Portset Binding Igroup: -
           Igroup UUID: 52c58441-88a7-11ee-ba94-00a0987cd31a
                  ALUA: true
            Initiators: iqn.1994-05.com.redhat:427913a2110 (logged in)
                        iqn.1994-05.com.redhat:43966bee5f92 (logged in)
                        iqn.1994-05.com.redhat:461a7e93bdc (logged in)
                        iqn.1994-05.com.redhat:565b28b44ce3 (logged in)
                        iqn.1994-05.com.redhat:70c628ecb0e8 (logged in)
                        iqn.1994-05.com.redhat:7ca623189291 (logged in)
                        iqn.1994-05.com.redhat:9d79fb87e6 (logged in)
                        iqn.1994-05.com.redhat:a3f4e45cf2ae (logged in)
                        iqn.1994-05.com.redhat:c253c1388f7f (not logged in)
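The igroup lists nine initiators while the cluster has eight nodes; the extra entry (`...:c253c1388f7f`, not logged in) does not match any current node initiatorname. A small sketch for spotting such leftovers, assuming the node IQNs and the igroup IQNs have each been saved one per line to a file; the function and file names are hypothetical:

```bash
# Print IQNs that are in the igroup file ($2) but not in the node file ($1).
# grep flags: -F fixed strings, -x whole-line match, -v invert, -f patterns from file.
stale_initiators() {
  grep -vxFf "$1" "$2" || true   # grep exits 1 when nothing is stale; treat that as success
}

# Possible usage (igroup name and "ssh netapp" alias as used elsewhere in this issue):
#   ssh netapp igroup show -vserver svm_trident \
#       -igroup trident-cd7b267e-7ff1-42ff-a2b5-617216ba06ea \
#     | grep -o 'iqn\.[^ ]*' > igroup-iqns.txt
#   stale_initiators node-iqns.txt igroup-iqns.txt
```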
rbo commented 5 months ago

Not all iSCSI LUNs are affected.

For example, pvc-9509623f-eaa0-449a-8b21-cd69205b25bb, which belongs to ushift08: migration works fine:

$ oc get vm,vmi,pods -o wide | grep ushift08
virtualmachine.kubevirt.io/ushift08          4d10h   Running   True
virtualmachineinstance.kubevirt.io/ushift08        6h12m   Running   10.32.99.8     inf7       True    True              
pod/virt-launcher-ushift08-88jmv       1/1     Running     0          3m34s   10.128.16.82   inf7    <none>           1/1
pod/virt-launcher-ushift08-d5qzz       0/1     Completed   0          5h3m    10.129.9.94    ucs56   <none>           1/1
$ oc get -o yaml pv pvc-9509623f-eaa0-449a-8b21-cd69205b25bb | grep internal
      internalName: isar_pvc_9509623f_eaa0_449a_8b21_cd69205b25bb

fas2552::> lun show -lun isar_pvc_9509623f_eaa0_449a_8b21_cd69205b25bb -m
Vserver    Path                                      Igroup   LUN ID  Protocol
---------- ----------------------------------------  -------  ------  --------
svm_trident 
           /vol/trident_lun_pool_isar_ZTIFFWGFPK/isar_pvc_9509623f_eaa0_449a_8b21_cd69205b25bb  
                                                     inf7-0f228f1d-6034-47c8-b456-0e13c65e964c  
                                                                   0  iscsi

fas2552::> igroup show -vserver svm_trident -igroup inf7-0f228f1d-6034-47c8-b456-0e13c65e964c
          Vserver Name: svm_trident
           Igroup Name: inf7-0f228f1d-6034-47c8-b456-0e13c65e964c
              Protocol: iscsi
               OS Type: linux
Portset Binding Igroup: -
           Igroup UUID: 5464fa14-1c5e-11ef-b20d-00a0987cd31a
                  ALUA: true
            Initiators: iqn.1994-05.com.redhat:565b28b44ce3 (logged in)

fas2552::> 

There were some igroup changes in Trident (changelog excerpts):

v24.02.0: iSCSI self-healing will now initiate SCSI scans by exact LUN ID if deprecated igroups are in use (Issue #883).

v23.04.0: All ONTAP-SAN-* volumes will now use per-node igroups. LUNs will only be mapped to igroups while actively published to those nodes to improve our security posture. Existing volumes will be opportunistically switched to the new igroup scheme when Trident determines it is safe to do so without impacting active workloads (Issue #758).

Details: https://github.com/NetApp/trident/issues/883
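Judging from the two igroups shown in this issue, the shared igroup from before the per-node scheme is named `trident-<uuid>`, while per-node igroups are named `<node>-<uuid>`. Assuming that naming holds (it is inferred from this thread, not taken from the Trident documentation), LUNs still mapped to a shared igroup can be picked out of `lun mapping show` output roughly like this; `legacy_mappings` is a hypothetical helper name:

```bash
# Print "LUN-path igroup" for every mapping whose igroup name starts with "trident-".
# Reads whitespace-separated mapping rows on stdin
# (columns: vserver, path, igroup, LUN ID, protocol).
legacy_mappings() {
  awk '$3 ~ /^trident-/ { print $2, $3 }'
}

# Possible usage:
#   ssh netapp lun mapping show | legacy_mappings
```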

rbo commented 5 months ago

How many PVCs are mapped to the igroup trident-cd7b267e-7ff1-42ff-a2b5-617216ba06ea?

$ ssh netapp lun mapping show | grep trident-cd7b267e-7ff1-42ff-a2b5-617216ba06ea
svm_trident /vol/trident_lun_pool_isar_JMKGHLWUOX/isar_pvc_63805303_c5fa_47a8_afc7_515f01e994c0  trident-cd7b267e-7ff1-42ff-a2b5-617216ba06ea  4  iscsi
svm_trident /vol/trident_lun_pool_isar_JMKGHLWUOX/isar_pvc_8c114dd6_26c8_4145_bddd_f0917943254b  trident-cd7b267e-7ff1-42ff-a2b5-617216ba06ea  12  iscsi
svm_trident /vol/trident_lun_pool_isar_JMKGHLWUOX/isar_pvc_e9466bba_f5ce_44df_bd92_73cb149784ad  trident-cd7b267e-7ff1-42ff-a2b5-617216ba06ea  9  iscsi
svm_trident /vol/trident_lun_pool_isar_ZTIFFWGFPK/isar_pvc_23a91c90_f935_4879_b3f7_ec9f9cb6ef75  trident-cd7b267e-7ff1-42ff-a2b5-617216ba06ea  10  iscsi
svm_trident /vol/trident_lun_pool_isar_ZTIFFWGFPK/isar_pvc_899cca95_ce47_4b84_a9a9_eb0b9965f068  trident-cd7b267e-7ff1-42ff-a2b5-617216ba06ea  5  iscsi
svm_trident /vol/trident_lun_pool_isar_ZTIFFWGFPK/isar_pvc_c7537b8a_efac_4d22_8161_f18b4a13287e  trident-cd7b267e-7ff1-42ff-a2b5-617216ba06ea  11  iscsi

$ for pvc in $(ssh netapp lun mapping show | grep trident-cd7b267e-7ff1-42ff-a2b5-617216ba06ea | cut -d'/' -f 4 | cut -f1 -d' ' | sed 's/isar_//' | sed 's/_/-/g'  ) ; do oc get pvc -A | grep $pvc; done
rhbk-operator                           pq-for-rhbk-2-wal                                       Bound    pvc-63805303-c5fa-47a8-afc7-515f01e994c0   3Gi            RWO            coe-netapp-san                               107d
demo-cluster-disco                      worker-1-root                                           Bound    pvc-8c114dd6-26c8-4145-bddd-f0917943254b   120Gi          RWX            coe-netapp-san                               38d
demo-cluster-disco                      control-plane-1-root                                    Bound    pvc-e9466bba-f5ce-44df-bd92-73cb149784ad   120Gi          RWX            coe-netapp-san                               38d
demo-cluster-disco                      control-plane-2-root                                    Bound    pvc-23a91c90-f935-4879-b3f7-ec9f9cb6ef75   120Gi          RWX            coe-netapp-san                               38d
rhbk-operator                           pq-for-rhbk-2                                           Bound    pvc-899cca95-ce47-4b84-a9a9-eb0b9965f068   3Gi            RWO            coe-netapp-san                               107d
demo-cluster-disco                      worker-2-root                                           Bound    pvc-c7537b8a-efac-4d22-8161-f18b4a13287e   120Gi          RWX            coe-netapp-san                               38d
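The `cut`/`sed` chain in the loop above converts an ONTAP LUN name into the Kubernetes PV name. The same mapping, in both directions, as two small helpers. This is a sketch: the `isar_` prefix is taken from the LUN names in this issue, and the function names are made up:

```bash
# Storage prefix of this backend, as seen in the LUN names in this issue.
PREFIX="isar_"

# isar_pvc_d3e86f80_77a3_... -> pvc-d3e86f80-77a3-...
lun_to_pv() {
  printf '%s\n' "${1#"$PREFIX"}" | sed 's/_/-/g'
}

# pvc-d3e86f80-77a3-... -> isar_pvc_d3e86f80_77a3_...
pv_to_lun() {
  printf '%s%s\n' "$PREFIX" "$(printf '%s\n' "$1" | sed 's/-/_/g')"
}
```

`pv_to_lun` should give the same value as the `internalName` field that `oc get pv ... -o yaml | grep internal` shows, at least for the volumes listed in this issue.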

Let's check the Postgres stateful set of our Keycloak (rhbk):

$ oc get pods -n rhbk-operator -l cnpg.io/cluster=pq-for-rhbk
NAME            READY   STATUS                 RESTARTS   AGE
pq-for-rhbk-1   1/1     Running                0          18h
pq-for-rhbk-2   0/1     CreateContainerError   0          38d
pq-for-rhbk-3   1/1     Running                0          18h

=> The Trident upgrade was 3 weeks ago. The two running pods (18h) were started after it; pq-for-rhbk-2 (38d) predates it.

$ oc get pvc -n rhbk-operator
NAME                STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS     AGE
pq-for-rhbk-1       Bound    pvc-aed4c39f-ceb3-4a80-861e-6ba667284eca   3Gi        RWO            coe-netapp-san   107d
pq-for-rhbk-1-wal   Bound    pvc-4fd41748-ccd5-42ad-9f2d-11c3eb7c5401   3Gi        RWO            coe-netapp-san   107d
pq-for-rhbk-2       Bound    pvc-899cca95-ce47-4b84-a9a9-eb0b9965f068   3Gi        RWO            coe-netapp-san   107d
pq-for-rhbk-2-wal   Bound    pvc-63805303-c5fa-47a8-afc7-515f01e994c0   3Gi        RWO            coe-netapp-san   107d
pq-for-rhbk-3       Bound    pvc-b97e4c95-f56b-49f1-94f3-bc95827bdee0   3Gi        RWO            coe-netapp-san   107d
pq-for-rhbk-3-wal   Bound    pvc-3b538c3c-7799-4ad8-9078-53b464a42a76   3Gi        RWO            coe-netapp-san   107d

rhbk PVC -> igroup mapping:

$ for lun in $(oc get pvc -o custom-columns="PV:.spec.volumeName" --no-headers | xargs oc get pv -o yaml | grep internal | cut -f2 -d':' | tr -d ' ' ) ; do echo "Check lun: $lun"; ssh netapp lun show -lun $lun -m ; done;
Check lun: isar_pvc_aed4c39f_ceb3_4a80_861e_6ba667284eca

Last login time: 5/28/2024 09:42:48
Vserver    Path                                      Igroup   LUN ID  Protocol
---------- ----------------------------------------  -------  ------  --------
svm_trident /vol/trident_lun_pool_isar_JMKGHLWUOX/isar_pvc_aed4c39f_ceb3_4a80_861e_6ba667284eca  inf4-0f228f1d-6034-47c8-b456-0e13c65e964c  0  iscsi

Check lun: isar_pvc_4fd41748_ccd5_42ad_9f2d_11c3eb7c5401

Last login time: 5/28/2024 09:43:13
Vserver    Path                                      Igroup   LUN ID  Protocol
---------- ----------------------------------------  -------  ------  --------
svm_trident /vol/trident_lun_pool_isar_ZTIFFWGFPK/isar_pvc_4fd41748_ccd5_42ad_9f2d_11c3eb7c5401  inf4-0f228f1d-6034-47c8-b456-0e13c65e964c  1  iscsi

Check lun: isar_pvc_899cca95_ce47_4b84_a9a9_eb0b9965f068

Last login time: 5/28/2024 09:43:14
Vserver    Path                                      Igroup   LUN ID  Protocol
---------- ----------------------------------------  -------  ------  --------
svm_trident /vol/trident_lun_pool_isar_ZTIFFWGFPK/isar_pvc_899cca95_ce47_4b84_a9a9_eb0b9965f068  trident-cd7b267e-7ff1-42ff-a2b5-617216ba06ea  5  iscsi

Check lun: isar_pvc_63805303_c5fa_47a8_afc7_515f01e994c0

Last login time: 5/28/2024 09:43:14
Vserver    Path                                      Igroup   LUN ID  Protocol
---------- ----------------------------------------  -------  ------  --------
svm_trident /vol/trident_lun_pool_isar_JMKGHLWUOX/isar_pvc_63805303_c5fa_47a8_afc7_515f01e994c0  trident-cd7b267e-7ff1-42ff-a2b5-617216ba06ea  4  iscsi

Check lun: isar_pvc_b97e4c95_f56b_49f1_94f3_bc95827bdee0

Last login time: 5/28/2024 09:43:15
Vserver    Path                                      Igroup   LUN ID  Protocol
---------- ----------------------------------------  -------  ------  --------
svm_trident /vol/trident_lun_pool_isar_ZTIFFWGFPK/isar_pvc_b97e4c95_f56b_49f1_94f3_bc95827bdee0  inf6-0f228f1d-6034-47c8-b456-0e13c65e964c  0  iscsi

Check lun: isar_pvc_3b538c3c_7799_4ad8_9078_53b464a42a76

Last login time: 5/28/2024 09:43:16
Vserver    Path                                      Igroup   LUN ID  Protocol
---------- ----------------------------------------  -------  ------  --------
svm_trident /vol/trident_lun_pool_isar_ZTIFFWGFPK/isar_pvc_3b538c3c_7799_4ad8_9078_53b464a42a76  inf6-0f228f1d-6034-47c8-b456-0e13c65e964c  1  iscsi
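The per-LUN output above is easier to compare once it is condensed to one line per igroup. A sketch that counts LUNs per igroup from the same `lun show -m` output; it assumes each mapping is printed on a single row as in this comment, and `igroup_counts` is a hypothetical name:

```bash
# Count mapped LUNs per igroup.
# Reads "lun show -m" / "lun mapping show" output on stdin; only rows whose second
# column is a /vol/... path are counted, so headers and login banners are skipped.
igroup_counts() {
  awk '$2 ~ /^\/vol\// { n[$3]++ } END { for (g in n) print n[g], g }' | sort -k2
}

# Possible usage: pipe the output of the loop above into igroup_counts.
```

For the six rhbk LUNs this would show two LUNs each on the inf4 and inf6 per-node igroups and two on the shared `trident-cd7b267e-...` igroup, which are the two volumes of pq-for-rhbk-2.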
rbo commented 5 months ago

Postgres recovered, thanks to the PQ Operator. The other workloads were terminated.

Wait for final cleanup:

$ for pvc in $(ssh netapp lun mapping show | grep trident-cd7b267e-7ff1-42ff-a2b5-617216ba06ea | cut -d'/' -f 4 | cut -f1 -d' ' | sed 's/isar_//' | sed 's/_/-/g'  ) ; do oc get pvc -A | grep $pvc; done
# No PV/PVC anymore.

$ ssh netapp lun mapping show | grep trident-cd7b267e-7ff1-42ff-a2b5-617216ba06ea
svm_trident /vol/trident_lun_pool_isar_JMKGHLWUOX/isar_pvc_63805303_c5fa_47a8_afc7_515f01e994c0  trident-cd7b267e-7ff1-42ff-a2b5-617216ba06ea  4  iscsi
svm_trident /vol/trident_lun_pool_isar_JMKGHLWUOX/isar_pvc_e9466bba_f5ce_44df_bd92_73cb149784ad  trident-cd7b267e-7ff1-42ff-a2b5-617216ba06ea  9  iscsi
svm_trident /vol/trident_lun_pool_isar_ZTIFFWGFPK/isar_pvc_899cca95_ce47_4b84_a9a9_eb0b9965f068  trident-cd7b267e-7ff1-42ff-a2b5-617216ba06ea  5  iscsi
svm_trident /vol/trident_lun_pool_isar_ZTIFFWGFPK/isar_pvc_c7537b8a_efac_4d22_8161_f18b4a13287e  trident-cd7b267e-7ff1-42ff-a2b5-617216ba06ea  11  iscsi
rbo commented 1 month ago

This has been solved for a long time. We rebuilt the Postgres cluster; one copy was already on a new share.