Closed rbo closed 1 month ago
Node | initiatorname |
---|---|
node/inf4 | iqn.1994-05.com.redhat:461a7e93bdc |
node/inf44 | iqn.1994-05.com.redhat:7ca623189291 |
node/inf5 | iqn.1994-05.com.redhat:70c628ecb0e8 |
node/inf6 | iqn.1994-05.com.redhat:a3f4e45cf2ae |
node/inf7 | iqn.1994-05.com.redhat:565b28b44ce3 |
node/inf8 | iqn.1994-05.com.redhat:427913a2110 |
node/ucs56 | iqn.1994-05.com.redhat:43966bee5f92 |
node/ucs57 | iqn.1994-05.com.redhat:9d79fb87e6 |
oc logs trident-controller-58d89fb6bb-vxzgp | tail
Defaulted container "trident-main" out of: trident-main, trident-autosupport, csi-provisioner, csi-attacher, csi-resizer, csi-snapshotter
time="2024-05-27T14:11:58Z" level=info msg="Publishing volume to node." logLayer=core node=inf7 requestID=d7ce1f4c-98ce-46c7-a2b1-afbffa9cd3fe requestSource=CSI volume=pvc-d3e86f80-77a3-4a4a-a5ed-e1cee486f959 workflow="controller=publish"
time="2024-05-27T14:11:58Z" level=info msg="Publishing volume to node." logLayer=core node=inf8 requestID=432abe78-9e6c-4a42-9bcc-19df67b6b8ea requestSource=CSI volume=pvc-23a91c90-f935-4879-b3f7-ec9f9cb6ef75 workflow="controller=publish"
time="2024-05-27T14:12:00Z" level=error msg="error publishing ontap-san-economy driver: error adding IQN iqn.1994-05.com.redhat:9d79fb87e6 to igroup : error adding IQN iqn.1994-05.com.redhat:9d79fb87e6 to igroup : API status: failed, Reason: Invalid value specified for \"initiator-group-name\" element within \"igroup-add\": \"\"., Code: 13115" Method=ControllerPublishVolume Type=CSI_Controller logLayer=csi_frontend requestID=70cd29c3-47ff-4ae9-920f-c72bb4c62d6b requestSource=CSI workflow="controller=publish"
time="2024-05-27T14:12:00Z" level=error msg="GRPC error: rpc error: code = Unknown desc = error publishing ontap-san-economy driver: error adding IQN iqn.1994-05.com.redhat:9d79fb87e6 to igroup : error adding IQN iqn.1994-05.com.redhat:9d79fb87e6 to igroup : API status: failed, Reason: Invalid value specified for \"initiator-group-name\" element within \"igroup-add\": \"\"., Code: 13115" logLayer=csi_frontend requestID=70cd29c3-47ff-4ae9-920f-c72bb4c62d6b requestSource=CSI
time="2024-05-27T14:12:00Z" level=error msg="error publishing ontap-san-economy driver: error adding IQN iqn.1994-05.com.redhat:427913a2110 to igroup : error adding IQN iqn.1994-05.com.redhat:427913a2110 to igroup : API status: failed, Reason: Invalid value specified for \"initiator-group-name\" element within \"igroup-add\": \"\"., Code: 13115" Method=ControllerPublishVolume Type=CSI_Controller logLayer=csi_frontend requestID=427e7b23-27b2-4b11-8466-df80a88cac33 requestSource=CSI workflow="controller=publish"
time="2024-05-27T14:12:00Z" level=error msg="GRPC error: rpc error: code = Unknown desc = error publishing ontap-san-economy driver: error adding IQN iqn.1994-05.com.redhat:427913a2110 to igroup : error adding IQN iqn.1994-05.com.redhat:427913a2110 to igroup : API status: failed, Reason: Invalid value specified for \"initiator-group-name\" element within \"igroup-add\": \"\"., Code: 13115" logLayer=csi_frontend requestID=427e7b23-27b2-4b11-8466-df80a88cac33 requestSource=CSI
time="2024-05-27T14:12:01Z" level=error msg="error publishing ontap-san-economy driver: error adding IQN iqn.1994-05.com.redhat:565b28b44ce3 to igroup : error adding IQN iqn.1994-05.com.redhat:565b28b44ce3 to igroup : API status: failed, Reason: Invalid value specified for \"initiator-group-name\" element within \"igroup-add\": \"\"., Code: 13115" Method=ControllerPublishVolume Type=CSI_Controller logLayer=csi_frontend requestID=d7ce1f4c-98ce-46c7-a2b1-afbffa9cd3fe requestSource=CSI workflow="controller=publish"
time="2024-05-27T14:12:01Z" level=error msg="GRPC error: rpc error: code = Unknown desc = error publishing ontap-san-economy driver: error adding IQN iqn.1994-05.com.redhat:565b28b44ce3 to igroup : error adding IQN iqn.1994-05.com.redhat:565b28b44ce3 to igroup : API status: failed, Reason: Invalid value specified for \"initiator-group-name\" element within \"igroup-add\": \"\"., Code: 13115" logLayer=csi_frontend requestID=d7ce1f4c-98ce-46c7-a2b1-afbffa9cd3fe requestSource=CSI
time="2024-05-27T14:12:02Z" level=error msg="error publishing ontap-san-economy driver: error adding IQN iqn.1994-05.com.redhat:427913a2110 to igroup : error adding IQN iqn.1994-05.com.redhat:427913a2110 to igroup : API status: failed, Reason: Invalid value specified for \"initiator-group-name\" element within \"igroup-add\": \"\"., Code: 13115" Method=ControllerPublishVolume Type=CSI_Controller logLayer=csi_frontend requestID=432abe78-9e6c-4a42-9bcc-19df67b6b8ea requestSource=CSI workflow="controller=publish"
time="2024-05-27T14:12:02Z" level=error msg="GRPC error: rpc error: code = Unknown desc = error publishing ontap-san-economy driver: error adding IQN iqn.1994-05.com.redhat:427913a2110 to igroup : error adding IQN iqn.1994-05.com.redhat:427913a2110 to igroup : API status: failed, Reason: Invalid value specified for \"initiator-group-name\" element within \"igroup-add\": \"\"., Code: 13115" logLayer=csi_frontend requestID=432abe78-9e6c-4a42-9bcc-19df67b6b8ea requestSource=CSI
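In these messages the igroup name between "to igroup" and the colon is empty, which matches ONTAP rejecting `initiator-group-name` with the value `""`. To see at a glance which initiators are affected, the log can be reduced to the distinct IQNs. The following is only a sketch: `failed_iqns` is a hypothetical helper name, and it assumes the message wording stays exactly as shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read Trident controller log lines on stdin and print
# every distinct IQN found in an "error adding IQN <iqn> to igroup" message.
failed_iqns() {
  # grep -o isolates each matching phrase; the IQN is its 4th word.
  grep -o 'error adding IQN [^ ]* to igroup' | awk '{print $4}' | sort -u
}

# Usage (pod and container names taken from the output above):
#   oc logs trident-controller-58d89fb6bb-vxzgp -c trident-main | failed_iqns
```

The resulting IQNs can then be matched against the node table at the top to see which nodes fail to publish.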
Node inf8 has two iscsi sessions and four iscsi devices:
Let's drain the node and reboot.
That did not solve it...
$ oc describe pod virt-launcher-control-plane-3-qvmp4
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 6m8s default-scheduler Successfully assigned demo-cluster-disco/virt-launcher-control-plane-3-qvmp4 to inf8
Normal SuccessfulAttachVolume 6m8s attachdetach-controller AttachVolume.Attach succeeded for volume "pvc-3cfd18ff-3d27-48d5-831b-7c0e33e6c6fb"
Warning FailedAttachVolume 113s (x10 over 6m7s) attachdetach-controller AttachVolume.Attach failed for volume "pvc-d3e86f80-77a3-4a4a-a5ed-e1cee486f959" : rpc error: code = Unknown desc = error publishing ontap-san-economy driver: error adding IQN iqn.1994-05.com.redhat:427913a2110 to igroup : error adding IQN iqn.1994-05.com.redhat:427913a2110 to igroup : API status: failed, Reason: Invalid value specified for "initiator-group-name" element within "igroup-add": ""., Code: 13115
$ oc get pv pvc-d3e86f80-77a3-4a4a-a5ed-e1cee486f959 -o yaml | grep internal
internalName: isar_pvc_d3e86f80_77a3_4a4a_a5ed_e1cee486f959
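The `internalName` seen here looks like the PV name with dashes replaced by underscores and an `isar_` prefix. A small sketch of that mapping in both directions, assuming the prefix is always `isar_` in this environment (the function names are made up for illustration):

```shell
#!/usr/bin/env bash
# Assumption: the LUN name prefix is "isar_", as in the internalName above.
PREFIX="isar_"

# PV name -> LUN name on the NetApp (dashes become underscores, prefix added).
pv_to_lun() {
  printf '%s%s\n' "$PREFIX" "$(printf '%s' "$1" | tr '-' '_')"
}

# LUN name -> PV name (prefix stripped, underscores become dashes).
lun_to_pv() {
  printf '%s\n' "${1#"$PREFIX"}" | tr '_' '-'
}

# Usage:
#   pv_to_lun pvc-d3e86f80-77a3-4a4a-a5ed-e1cee486f959
#   lun_to_pv isar_pvc_d3e86f80_77a3_4a4a_a5ed_e1cee486f959
```

Reading `internalName` from the PV, as done above, remains the authoritative way to get the LUN name; the conversion is only a shortcut for scripting.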
$ ssh admin@netapp-mgmt.coe.muc.redhat.com
...
fas2552::> lun show -lun isar_pvc_d3e86f80_77a3_4a4a_a5ed_e1cee486f959 -m
Vserver Path Igroup LUN ID Protocol
---------- ---------------------------------------- ------- ------ --------
svm_trident
/vol/trident_lun_pool_isar_ZTIFFWGFPK/isar_pvc_d3e86f80_77a3_4a4a_a5ed_e1cee486f959
trident-cd7b267e-7ff1-42ff-a2b5-617216ba06ea
13 iscsi
fas2552::> i group
ic if_addr_filter_info ifconfig
ifgrp ifstat igroup
ipsec iscsi
fas2552::> igroup show -vserver svm_trident -igroup trident-cd7b267e-7ff1-42ff-a2b5-617216ba06ea
Vserver Name: svm_trident
Igroup Name: trident-cd7b267e-7ff1-42ff-a2b5-617216ba06ea
Protocol: iscsi
OS Type: linux
Portset Binding Igroup: -
Igroup UUID: 52c58441-88a7-11ee-ba94-00a0987cd31a
ALUA: true
Initiators: iqn.1994-05.com.redhat:427913a2110 (logged in)
iqn.1994-05.com.redhat:43966bee5f92 (logged in)
iqn.1994-05.com.redhat:461a7e93bdc (logged in)
iqn.1994-05.com.redhat:565b28b44ce3 (logged in)
iqn.1994-05.com.redhat:70c628ecb0e8 (logged in)
iqn.1994-05.com.redhat:7ca623189291 (logged in)
iqn.1994-05.com.redhat:9d79fb87e6 (logged in)
iqn.1994-05.com.redhat:a3f4e45cf2ae (logged in)
iqn.1994-05.com.redhat:c253c1388f7f (not logged in)
Not all iSCSI LUNs are affected.
For example, pvc-9509623f-eaa0-449a-8b21-cd69205b25bb, which belongs to ushift08, migrates fine:
$ oc get vm,vmi,pods -o wide | grep ushift08
virtualmachine.kubevirt.io/ushift08 4d10h Running True
virtualmachineinstance.kubevirt.io/ushift08 6h12m Running 10.32.99.8 inf7 True True
pod/virt-launcher-ushift08-88jmv 1/1 Running 0 3m34s 10.128.16.82 inf7 <none> 1/1
pod/virt-launcher-ushift08-d5qzz 0/1 Completed 0 5h3m 10.129.9.94 ucs56 <none> 1/1
$ oc get -o yaml pv pvc-9509623f-eaa0-449a-8b21-cd69205b25bb | grep internal
internalName: isar_pvc_9509623f_eaa0_449a_8b21_cd69205b25bb
fas2552::> lun show -lun isar_pvc_9509623f_eaa0_449a_8b21_cd69205b25bb -m
Vserver Path Igroup LUN ID Protocol
---------- ---------------------------------------- ------- ------ --------
svm_trident
/vol/trident_lun_pool_isar_ZTIFFWGFPK/isar_pvc_9509623f_eaa0_449a_8b21_cd69205b25bb
inf7-0f228f1d-6034-47c8-b456-0e13c65e964c
0 iscsi
fas2552::> igroup show -vserver svm_trident -igroup inf7-0f228f1d-6034-47c8-b456-0e13c65e964c
Vserver Name: svm_trident
Igroup Name: inf7-0f228f1d-6034-47c8-b456-0e13c65e964c
Protocol: iscsi
OS Type: linux
Portset Binding Igroup: -
Igroup UUID: 5464fa14-1c5e-11ef-b20d-00a0987cd31a
ALUA: true
Initiators: iqn.1994-05.com.redhat:565b28b44ce3 (logged in)
fas2552::>
There were some igroup changes in recent Trident releases:
v24.02.0: iSCSI self-healing will now initiate SCSI scans by exact LUN ID if deprecated igroups are in use (Issue #883).
v23.04.0: All ONTAP-SAN-* volumes will now use per-node igroups. LUNs will only be mapped to igroups while actively published to those nodes to improve our security posture. Existing volumes will be opportunistically switched to the new igroup scheme when Trident determines it is safe to do so without impacting active workloads (Issue #758).
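Going by the two igroup names seen so far, the failing LUN sits in an igroup named `trident-<uuid>` that contains all node IQNs, while the working LUN sits in `inf7-<uuid>` with a single IQN. A rough classifier for scripting, based only on these naming patterns; treating `trident-*` as the older shared igroup is an assumption drawn from the release note, not something confirmed here:

```shell
#!/usr/bin/env bash
# Heuristic based on igroup names observed in this issue (assumption):
#   trident-<uuid>  -> shared igroup holding all nodes
#   <node>-<uuid>   -> per-node igroup
igroup_kind() {
  case "$1" in
    trident-*) echo "shared" ;;
    *-????????-????-????-????-????????????) echo "per-node" ;;
    *) echo "unknown" ;;
  esac
}

# Usage:
#   igroup_kind trident-cd7b267e-7ff1-42ff-a2b5-617216ba06ea
#   igroup_kind inf7-0f228f1d-6034-47c8-b456-0e13c65e964c
```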
How many PVCs are mapped to the igroup trident-cd7b267e-7ff1-42ff-a2b5-617216ba06ea:
$ ssh netapp lun mapping show | grep trident-cd7b267e-7ff1-42ff-a2b5-617216ba06ea
svm_trident /vol/trident_lun_pool_isar_JMKGHLWUOX/isar_pvc_63805303_c5fa_47a8_afc7_515f01e994c0 trident-cd7b267e-7ff1-42ff-a2b5-617216ba06ea 4 iscsi
svm_trident /vol/trident_lun_pool_isar_JMKGHLWUOX/isar_pvc_8c114dd6_26c8_4145_bddd_f0917943254b trident-cd7b267e-7ff1-42ff-a2b5-617216ba06ea 12 iscsi
svm_trident /vol/trident_lun_pool_isar_JMKGHLWUOX/isar_pvc_e9466bba_f5ce_44df_bd92_73cb149784ad trident-cd7b267e-7ff1-42ff-a2b5-617216ba06ea 9 iscsi
svm_trident /vol/trident_lun_pool_isar_ZTIFFWGFPK/isar_pvc_23a91c90_f935_4879_b3f7_ec9f9cb6ef75 trident-cd7b267e-7ff1-42ff-a2b5-617216ba06ea 10 iscsi
svm_trident /vol/trident_lun_pool_isar_ZTIFFWGFPK/isar_pvc_899cca95_ce47_4b84_a9a9_eb0b9965f068 trident-cd7b267e-7ff1-42ff-a2b5-617216ba06ea 5 iscsi
svm_trident /vol/trident_lun_pool_isar_ZTIFFWGFPK/isar_pvc_c7537b8a_efac_4d22_8161_f18b4a13287e trident-cd7b267e-7ff1-42ff-a2b5-617216ba06ea 11 iscsi
$ for pvc in $(ssh netapp lun mapping show | grep trident-cd7b267e-7ff1-42ff-a2b5-617216ba06ea | cut -d'/' -f 4 | cut -f1 -d' ' | sed 's/isar_//' | sed 's/_/-/g' ) ; do oc get pvc -A | grep $pvc; done
rhbk-operator pq-for-rhbk-2-wal Bound pvc-63805303-c5fa-47a8-afc7-515f01e994c0 3Gi RWO coe-netapp-san 107d
demo-cluster-disco worker-1-root Bound pvc-8c114dd6-26c8-4145-bddd-f0917943254b 120Gi RWX coe-netapp-san 38d
demo-cluster-disco control-plane-1-root Bound pvc-e9466bba-f5ce-44df-bd92-73cb149784ad 120Gi RWX coe-netapp-san 38d
demo-cluster-disco control-plane-2-root Bound pvc-23a91c90-f935-4879-b3f7-ec9f9cb6ef75 120Gi RWX coe-netapp-san 38d
rhbk-operator pq-for-rhbk-2 Bound pvc-899cca95-ce47-4b84-a9a9-eb0b9965f068 3Gi RWO coe-netapp-san 107d
demo-cluster-disco worker-2-root Bound pvc-c7537b8a-efac-4d22-8161-f18b4a13287e 120Gi RWX coe-netapp-san 38d
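The one-liner above relies on `grep` and `cut` positions. A slightly more defensive sketch matches the Igroup column exactly with `awk`; it assumes the five-column `lun mapping show` layout shown above and the `isar_` prefix, and `pvs_in_igroup` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Read `lun mapping show` output on stdin and print the PV name of every LUN
# whose Igroup column (3rd field) equals the igroup given as $1.
pvs_in_igroup() {
  awk -v ig="$1" '$3 == ig { n = split($2, p, "/"); print p[n] }' \
    | sed -e 's/^isar_//' -e 's/_/-/g'
}

# Usage (fetch the PVC list once instead of once per LUN):
#   pvcs=$(oc get pvc -A)
#   ssh netapp lun mapping show \
#     | pvs_in_igroup trident-cd7b267e-7ff1-42ff-a2b5-617216ba06ea \
#     | while read -r pv; do printf '%s\n' "$pvcs" | grep -F -- "$pv"; done
```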
Let's check the Postgres stateful set of our Keycloak/RHBK:
$ oc get pods -n rhbk-operator -l cnpg.io/cluster=pq-for-rhbk
NAME READY STATUS RESTARTS AGE
pq-for-rhbk-1 1/1 Running 0 18h
pq-for-rhbk-2 0/1 CreateContainerError 0 38d
pq-for-rhbk-3 1/1 Running 0 18h
=> The Trident upgrade was 3 weeks ago; the pods are not that old.
$ oc get pvc -n rhbk-operator
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
pq-for-rhbk-1 Bound pvc-aed4c39f-ceb3-4a80-861e-6ba667284eca 3Gi RWO coe-netapp-san 107d
pq-for-rhbk-1-wal Bound pvc-4fd41748-ccd5-42ad-9f2d-11c3eb7c5401 3Gi RWO coe-netapp-san 107d
pq-for-rhbk-2 Bound pvc-899cca95-ce47-4b84-a9a9-eb0b9965f068 3Gi RWO coe-netapp-san 107d
pq-for-rhbk-2-wal Bound pvc-63805303-c5fa-47a8-afc7-515f01e994c0 3Gi RWO coe-netapp-san 107d
pq-for-rhbk-3 Bound pvc-b97e4c95-f56b-49f1-94f3-bc95827bdee0 3Gi RWO coe-netapp-san 107d
pq-for-rhbk-3-wal Bound pvc-3b538c3c-7799-4ad8-9078-53b464a42a76 3Gi RWO coe-netapp-san 107d
rhbk PVC -> igroup mapping:
$ for lun in $(oc get pvc -o custom-columns="PV:.spec.volumeName" --no-headers | xargs oc get pv -o yaml | grep internal | cut -f2 -d':' | tr -d ' ' ) ; do echo "Check lun: $lun"; ssh netapp lun show -lun $lun -m ; done;
Check lun: isar_pvc_aed4c39f_ceb3_4a80_861e_6ba667284eca
Last login time: 5/28/2024 09:42:48
Vserver Path Igroup LUN ID Protocol
---------- ---------------------------------------- ------- ------ --------
svm_trident /vol/trident_lun_pool_isar_JMKGHLWUOX/isar_pvc_aed4c39f_ceb3_4a80_861e_6ba667284eca inf4-0f228f1d-6034-47c8-b456-0e13c65e964c 0 iscsi
Check lun: isar_pvc_4fd41748_ccd5_42ad_9f2d_11c3eb7c5401
Last login time: 5/28/2024 09:43:13
Vserver Path Igroup LUN ID Protocol
---------- ---------------------------------------- ------- ------ --------
svm_trident /vol/trident_lun_pool_isar_ZTIFFWGFPK/isar_pvc_4fd41748_ccd5_42ad_9f2d_11c3eb7c5401 inf4-0f228f1d-6034-47c8-b456-0e13c65e964c 1 iscsi
Check lun: isar_pvc_899cca95_ce47_4b84_a9a9_eb0b9965f068
Last login time: 5/28/2024 09:43:14
Vserver Path Igroup LUN ID Protocol
---------- ---------------------------------------- ------- ------ --------
svm_trident /vol/trident_lun_pool_isar_ZTIFFWGFPK/isar_pvc_899cca95_ce47_4b84_a9a9_eb0b9965f068 trident-cd7b267e-7ff1-42ff-a2b5-617216ba06ea 5 iscsi
Check lun: isar_pvc_63805303_c5fa_47a8_afc7_515f01e994c0
Last login time: 5/28/2024 09:43:14
Vserver Path Igroup LUN ID Protocol
---------- ---------------------------------------- ------- ------ --------
svm_trident /vol/trident_lun_pool_isar_JMKGHLWUOX/isar_pvc_63805303_c5fa_47a8_afc7_515f01e994c0 trident-cd7b267e-7ff1-42ff-a2b5-617216ba06ea 4 iscsi
Check lun: isar_pvc_b97e4c95_f56b_49f1_94f3_bc95827bdee0
Last login time: 5/28/2024 09:43:15
Vserver Path Igroup LUN ID Protocol
---------- ---------------------------------------- ------- ------ --------
svm_trident /vol/trident_lun_pool_isar_ZTIFFWGFPK/isar_pvc_b97e4c95_f56b_49f1_94f3_bc95827bdee0 inf6-0f228f1d-6034-47c8-b456-0e13c65e964c 0 iscsi
Check lun: isar_pvc_3b538c3c_7799_4ad8_9078_53b464a42a76
Last login time: 5/28/2024 09:43:16
Vserver Path Igroup LUN ID Protocol
---------- ---------------------------------------- ------- ------ --------
svm_trident /vol/trident_lun_pool_isar_ZTIFFWGFPK/isar_pvc_3b538c3c_7799_4ad8_9078_53b464a42a76 inf6-0f228f1d-6034-47c8-b456-0e13c65e964c 1 iscsi
Postgres recovered, thanks to the PQ Operator. The other workloads were terminated.
Wait for final cleanup:
$ for pvc in $(ssh netapp lun mapping show | grep trident-cd7b267e-7ff1-42ff-a2b5-617216ba06ea | cut -d'/' -f 4 | cut -f1 -d' ' | sed 's/isar_//' | sed 's/_/-/g' ) ; do oc get pvc -A | grep $pvc; done
# No PV/PVC anymore.
$ ssh netapp lun mapping show | grep trident-cd7b267e-7ff1-42ff-a2b5-617216ba06ea
svm_trident /vol/trident_lun_pool_isar_JMKGHLWUOX/isar_pvc_63805303_c5fa_47a8_afc7_515f01e994c0 trident-cd7b267e-7ff1-42ff-a2b5-617216ba06ea 4 iscsi
svm_trident /vol/trident_lun_pool_isar_JMKGHLWUOX/isar_pvc_e9466bba_f5ce_44df_bd92_73cb149784ad trident-cd7b267e-7ff1-42ff-a2b5-617216ba06ea 9 iscsi
svm_trident /vol/trident_lun_pool_isar_ZTIFFWGFPK/isar_pvc_899cca95_ce47_4b84_a9a9_eb0b9965f068 trident-cd7b267e-7ff1-42ff-a2b5-617216ba06ea 5 iscsi
svm_trident /vol/trident_lun_pool_isar_ZTIFFWGFPK/isar_pvc_c7537b8a_efac_4d22_8161_f18b4a13287e trident-cd7b267e-7ff1-42ff-a2b5-617216ba06ea 11 iscsi
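Compared with the earlier listing, the LUNs for pvc-8c114dd6 and pvc-23a91c90 no longer appear in this igroup, while four mappings remain even though no PVC matches them. To track such changes without comparing by eye, two saved listings can be diffed. This is a sketch only: `unmapped_since` is a hypothetical helper, and it assumes each listing was saved to a file first.

```shell
#!/usr/bin/env bash
# Print LUN paths that are present in the first saved `lun mapping show`
# listing ($1) but missing from the second ($2).
unmapped_since() {
  before=$(mktemp)
  after=$(mktemp)
  # Column 2 is the LUN path; comm needs both inputs sorted the same way.
  awk '{print $2}' "$1" | LC_ALL=C sort > "$before"
  awk '{print $2}' "$2" | LC_ALL=C sort > "$after"
  LC_ALL=C comm -23 "$before" "$after"
  rm -f "$before" "$after"
}

# Usage:
#   ssh netapp lun mapping show | grep trident-cd7b267e-7ff1-42ff-a2b5-617216ba06ea > before.txt
#   ... later ...
#   ssh netapp lun mapping show | grep trident-cd7b267e-7ff1-42ff-a2b5-617216ba06ea > after.txt
#   unmapped_since before.txt after.txt
```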
This has been solved for a long time. We rebuilt the Postgres cluster; one copy was already on a new share.
After #172, pods are stuck in ContainerCreating: