xcp-ng / xcp

Entry point for issues and wiki. Also contains some scripts and sources.
https://xcp-ng.org

iSCSI connect after reboot fails (Unsupported SCSI Opcode) #128

Open NiTRoeSE opened 5 years ago

NiTRoeSE commented 5 years ago

Hi, I have a problem and can find neither a cause nor a solution, so I need help. I'll try to keep this as short as possible.

Environment:

Problem:

I have 3 LUNs connected to the XenServers, each LUN over a separate target. Everything runs fine until I reboot one of the XenServers. After the reboot, the rebooted XenServer cannot reconnect to any of the iSCSI targets except the target that provides the first LUN. If I detach and forget the iSCSI SR and connect it again, everything runs fine until the next XenServer reboot, but I lose all the data inside that SR, so that is not a solution.

What I have tried to find a solution:

Error message in Xen Orchestra:

SR_BACKEND_FAILURE_47(, The SR is not available [opterr=Error reporting error, unknown key Device not appeared yet], )

Error message on the storage server:

 kernel: [11219.445255] rx_data returned 0, expecting 48.
 kernel: [11219.446656] iSCSI Login negotiation failed.
 kernel: [11219.522700] rx_data returned 0, expecting 48.
 kernel: [11219.524082] iSCSI Login negotiation failed.
 kernel: [11219.642772] iSCSI/iqn.2018-12.li.pls.dcx3:dcx3init: Unsupported SCSI Opcode 0xa3, sending CHECK_CONDITION.
 kernel: [11219.644031] rx_data returned 0, expecting 48.
 kernel: [11219.645430] iSCSI Login negotiation failed.
 kernel: [11219.783404] iSCSI/iqn.2018-12.li.pls.dcx3:dcx3init: Unsupported SCSI Opcode 0xa3, sending CHECK_CONDITION.
 kernel: [11219.784879] rx_data returned 0, expecting 48.
 kernel: [11219.786736] iSCSI Login negotiation failed.
 kernel: [11219.921800] iSCSI/iqn.2018-12.li.pls.dcx3:dcx3init: Unsupported SCSI Opcode 0xa3, sending CHECK_CONDITION.

Xen-Center NG

[Screenshots, 2019-01-28 13:58:08 and 13:59:20]

Active Storage-Server iSCSI Configuration: (Targetcli)

[Screenshot, 2019-01-28 13:47:55]

Storage-Cluster - Pacemaker Configuration...

primitive p_drbd_r1_LUN-1 ocf:linbit:drbd \
        params drbd_resource=r1_LUN-1 \
        op start timeout=240 interval=0 \
        op promote timeout=90 interval=0 \
        op demote timeout=90 interval=0 \
        op stop timeout=100 interval=0 \
        op monitor interval=20 role=Master \
        op monitor interval=23 role=Slave
primitive p_drbd_r2_LUN-2 ocf:linbit:drbd \
        params drbd_resource=r2_LUN-2 \
        op start timeout=240 interval=0 \
        op promote timeout=90 interval=0 \
        op demote timeout=90 interval=0 \
        op stop timeout=100 interval=0 \
        op monitor interval=20 role=Master \
        op monitor interval=23 role=Slave
primitive p_drbd_r3_LUN-3 ocf:linbit:drbd \
        params drbd_resource=r3_LUN-3 \
        op start timeout=240 interval=0 \
        op promote timeout=90 interval=0 \
        op demote timeout=90 interval=0 \
        op stop timeout=100 interval=0 \
        op monitor interval=20 role=Master \
        op monitor interval=23 role=Slave
primitive p_drbd_r4_LUN-4 ocf:linbit:drbd \
        params drbd_resource=r4_LUN-4 \
        op start timeout=240 interval=0 \
        op promote timeout=90 interval=0 \
        op demote timeout=90 interval=0 \
        op stop timeout=100 interval=0 \
        op monitor interval=20 role=Master \
        op monitor interval=23 role=Slave
primitive p_drbd_r5_LUN-5 ocf:linbit:drbd \
        params drbd_resource=r5_LUN-5 \
        op start timeout=240 interval=0 \
        op promote timeout=90 interval=0 \
        op demote timeout=90 interval=0 \
        op stop timeout=100 interval=0 \
        op monitor interval=20 role=Master \
        op monitor interval=23 role=Slave
primitive p_drbd_r6_LUN-6 ocf:linbit:drbd \
        params drbd_resource=r6_LUN-6 \
        op start timeout=240 interval=0 \
        op promote timeout=90 interval=0 \
        op demote timeout=90 interval=0 \
        op stop timeout=100 interval=0 \
        op monitor interval=20 role=Master \
        op monitor interval=23 role=Slave
primitive p_fence_dcxs1 stonith:fence_ipmilan \
        params pcmk_host_list=DCXS1 ipaddr=10.1.148.190 action=off login=ADMIN passwd=Poe8mOut delay=15 \
        op monitor interval=60s
primitive p_fence_dcxs2 stonith:fence_ipmilan \
        params pcmk_host_list=DCXS2 ipaddr=10.1.148.189 action=off login=ADMIN passwd=Poe8mOut delay=15 \
        op monitor interval=60s
primitive p_iscsi_lun_drbd0_r1_LUN-1 iSCSILogicalUnit \
        params target_iqn="iqn.2019-01.li.pls:dcxsdrbd" implementation=lio-t lun=0 path="/dev/drbd0" \
        op start timeout=20 interval=0 \
        op stop timeout=20 interval=0 \
        op monitor interval=20 timeout=40
primitive p_iscsi_lun_drbd1_r2_LUN-2 iSCSILogicalUnit \
        params target_iqn="iqn.2019-01.li.pls:dcxsdrbd1" implementation=lio-t lun=1 path="/dev/drbd1" \
        op start timeout=20 interval=0 \
        op stop timeout=20 interval=0 \
        op monitor interval=20 timeout=40
primitive p_iscsi_lun_drbd2_r3_LUN-3 iSCSILogicalUnit \
        params target_iqn="iqn.2019-01.li.pls:dcxsdrbd2" implementation=lio-t lun=2 path="/dev/drbd2" \
        op start timeout=20 interval=0 \
        op stop timeout=20 interval=0 \
        op monitor interval=20 timeout=40
primitive p_iscsi_lun_drbd3_r4_LUN-4 iSCSILogicalUnit \
        params target_iqn="iqn.2019-01.li.pls:dcxsdrbd3" implementation=lio-t lun=3 path="/dev/drbd3" \
        op start timeout=20 interval=0 \
        op stop timeout=20 interval=0 \
        op monitor interval=20 timeout=40
primitive p_iscsi_lun_drbd4_r5_LUN-5 iSCSILogicalUnit \
        params target_iqn="iqn.2019-01.li.pls:dcxsdrbd4" implementation=lio-t lun=4 path="/dev/drbd4" \
        op start timeout=20 interval=0 \
        op stop timeout=20 interval=0 \
        op monitor interval=20 timeout=40
primitive p_iscsi_lun_drbd5_r6_LUN-6 iSCSILogicalUnit \
        params target_iqn="iqn.2019-01.li.pls:dcxsdrbd5" implementation=lio-t lun=5 path="/dev/drbd5" \
        op start timeout=20 interval=0 \
        op stop timeout=20 interval=0 \
        op monitor interval=20 timeout=40
primitive p_iscsi_multipath_ip0 IPaddr2 \
        params ip=172.18.1.10 cidr_netmask=24 nic=enp101s0f0 \
        op start timeout=20 interval=0 \
        op stop timeout=20 interval=0 \
        op monitor interval=10s
primitive p_iscsi_multipath_ip1 IPaddr2 \
        params ip=172.18.2.10 cidr_netmask=24 nic=enp101s0f1 \
        op start timeout=20 interval=0 \
        op stop timeout=20 interval=0 \
        op monitor interval=10s
primitive p_iscsi_portblock_ip0_off portblock \
        params ip=172.18.1.10 portno="3260,3261,3262,3263,3264,3265" protocol=tcp action=unblock \
        op start timeout=20 interval=0 \
        op stop timeout=20 interval=0 \
        op monitor timeout=20 interval=10
primitive p_iscsi_portblock_ip0_on portblock \
        params ip=172.18.1.10 portno="3260,3261,3262,3263,3264,3265" protocol=tcp action=block \
        op start timeout=20 interval=0 \
        op stop timeout=20 interval=0 \
        op monitor timeout=20 interval=10
primitive p_iscsi_portblock_ip1_off portblock \
        params ip=172.18.2.10 portno="3260,3261,3262,3263,3264,3265" protocol=tcp action=unblock \
        op start timeout=20 interval=0 \
        op stop timeout=20 interval=0 \
        op monitor timeout=20 interval=10
primitive p_iscsi_portblock_ip1_on portblock \
        params ip=172.18.2.10 portno="3260,3261,3262,3263,3264,3265" protocol=tcp action=block \
        op start timeout=20 interval=0 \
        op stop timeout=20 interval=0 \
        op monitor timeout=20 interval=10
primitive p_iscsi_target_drbd iSCSITarget \
        params iqn="iqn.2019-01.li.pls:dcxsdrbd" implementation=lio-t portals="172.18.1.10:3260 172.18.2.10:3260" \
        op start timeout=20 interval=0 \
        op stop timeout=20 interval=0 \
        op monitor interval=20 timeout=40
primitive p_iscsi_target_drbd1 iSCSITarget \
        params iqn="iqn.2019-01.li.pls:dcxsdrbd1" implementation=lio-t portals="172.18.1.10:3261 172.18.2.10:3261" \
        op start timeout=20 interval=0 \
        op stop timeout=20 interval=0 \
        op monitor interval=20 timeout=40
primitive p_iscsi_target_drbd2 iSCSITarget \
        params iqn="iqn.2019-01.li.pls:dcxsdrbd2" implementation=lio-t portals="172.18.1.10:3262 172.18.2.10:3262" \
        op start timeout=20 interval=0 \
        op stop timeout=20 interval=0 \
        op monitor interval=20 timeout=40
primitive p_iscsi_target_drbd3 iSCSITarget \
        params iqn="iqn.2019-01.li.pls:dcxsdrbd3" implementation=lio-t portals="172.18.1.10:3263 172.18.2.10:3263" \
        op start timeout=20 interval=0 \
        op stop timeout=20 interval=0 \
        op monitor interval=20 timeout=40
primitive p_iscsi_target_drbd4 iSCSITarget \
        params iqn="iqn.2019-01.li.pls:dcxsdrbd4" implementation=lio-t portals="172.18.1.10:3264 172.18.2.10:3264" \
        op start timeout=20 interval=0 \
        op stop timeout=20 interval=0 \
        op monitor interval=20 timeout=40
primitive p_iscsi_target_drbd5 iSCSITarget \
        params iqn="iqn.2019-01.li.pls:dcxsdrbd5" implementation=lio-t portals="172.18.1.10:3265 172.18.2.10:3265" \
        op start timeout=20 interval=0 \
        op stop timeout=20 interval=0 \
        op monitor interval=20 timeout=40
group g_iscsi_drbd p_iscsi_portblock_ip0_on p_iscsi_portblock_ip1_on p_iscsi_multipath_ip0 p_iscsi_multipath_ip1 p_iscsi_target_drbd p_iscsi_target_drbd1 p_iscsi_target_drbd2 p_iscsi_target_drbd3 p_iscsi_target_drbd4 p_iscsi_target_drbd5 p_iscsi_lun_drbd0_r1_LUN-1 p_iscsi_lun_drbd1_r2_LUN-2 p_iscsi_lun_drbd2_r3_LUN-3 p_iscsi_lun_drbd3_r4_LUN-4 p_iscsi_lun_drbd4_r5_LUN-5 p_iscsi_lun_drbd5_r6_LUN-6 p_iscsi_portblock_ip0_off p_iscsi_portblock_ip1_off
ms ms_drbd_r1_LUN-1 p_drbd_r1_LUN-1 \
        meta master-max=1 master-node-max=1 notify=true clone-max=2 clone-node-max=1
ms ms_drbd_r2_LUN-2 p_drbd_r2_LUN-2 \
        meta master-max=1 master-node-max=1 notify=true clone-max=2 clone-node-max=1
ms ms_drbd_r3_LUN-3 p_drbd_r3_LUN-3 \
        meta master-max=1 master-node-max=1 notify=true clone-max=2 clone-node-max=1
ms ms_drbd_r4_LUN-4 p_drbd_r4_LUN-4 \
        meta master-max=1 master-node-max=1 notify=true clone-max=2 clone-node-max=1
ms ms_drbd_r5_LUN-5 p_drbd_r5_LUN-5 \
        meta master-max=1 master-node-max=1 notify=true clone-max=2 clone-node-max=1
ms ms_drbd_r6_LUN-6 p_drbd_r6_LUN-6 \
        meta master-max=1 master-node-max=1 notify=true clone-max=2 clone-node-max=1
colocation cl_iscsi_drbd_with_ms_drbd_r1_LUN-1 inf: g_iscsi_drbd:Started ms_drbd_r1_LUN-1
colocation cl_iscsi_drbd_with_ms_drbd_r2_LUN-2 inf: g_iscsi_drbd:Started ms_drbd_r2_LUN-2
colocation cl_iscsi_drbd_with_ms_drbd_r3_LUN-3 inf: g_iscsi_drbd:Started ms_drbd_r3_LUN-3
colocation cl_iscsi_drbd_with_ms_drbd_r4_LUN-4 inf: g_iscsi_drbd:Started ms_drbd_r4_LUN-4
colocation cl_iscsi_drbd_with_ms_drbd_r5_LUN-5 inf: g_iscsi_drbd:Started ms_drbd_r5_LUN-5
colocation cl_iscsi_drbd_with_ms_drbd_r6_LUN-6 inf: g_iscsi_drbd:Started ms_drbd_r6_LUN-6
location l_fence_dcxs1 p_fence_dcxs1 -inf: DCXS1
location l_fence_dcxs2 p_fence_dcxs2 -inf: DCXS2
order o_ms_drbd_luns_BEFORE_g_iscsi_drbd inf: ms_drbd_r6_LUN-6:promote g_iscsi_drbd:start
order o_ms_drbd_r1_LUN-1_BEFORE_ms_drbd_r2_LUN-2 inf: ms_drbd_r1_LUN-1:promote ms_drbd_r2_LUN-2:start
order o_ms_drbd_r2_LUN-2_BEFORE_ms_drbd_r3_LUN-3 inf: ms_drbd_r2_LUN-2:promote ms_drbd_r3_LUN-3:start
order o_ms_drbd_r3_LUN-3_BEFORE_ms_drbd_r4_LUN-4 inf: ms_drbd_r3_LUN-3:promote ms_drbd_r4_LUN-4:start
order o_ms_drbd_r4_LUN-4_BEFORE_ms_drbd_r5_LUN-5 inf: ms_drbd_r4_LUN-4:promote ms_drbd_r5_LUN-5:start
order o_ms_drbd_r5_LUN-5_BEFORE_ms_drbd_r6_LUN-6 inf: ms_drbd_r5_LUN-5:promote ms_drbd_r6_LUN-6:start

I don't know whether this is a bug in XenServer or something else. I hope someone can help or give a hint toward the cause or a solution. To me it looks as if XenServer fails to log in to more than one target or LUN, but I have no restrictions on the targets and have not set any authentication.

Thanks in advance!

NiTRoeSE commented 5 years ago

In the end it seems this is not a bug in Xen, but maybe an improvement is possible to help people in the future?

I posted a solution: https://xcp-ng.org/forum/topic/891/iscsi-connect-after-reboot-fails-permanently-unsupported-scsi-opcode/3

sammcj commented 5 years ago

Interesting, I experienced this too. I tried a bunch of things and ended up reinstalling XCP-ng, which fixed it. I suspected a problem on the XCP-ng side, either SR->devicemapper or devicemapper->iSCSI initiator.

olivierlambert commented 5 years ago

@sammcj the solution is simpler:

The real problem was that in a storage-cluster environment, every time the active node changes or Pacemaker restarts the resources, the iSCSI serial number (SN) of each LUN is newly generated and differs from the previous one, but Xen needs a persistent identifier. So the fix was to find a way to have an identical SN for each LUN on every cluster node.
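For readers with a similar setup, here is a minimal sketch of how a fixed serial number might be pinned in the Pacemaker configuration from the original post. It assumes the `scsi_sn` parameter of the `iSCSILogicalUnit` resource agent is what the linked forum post uses (this thread does not confirm that), and the serial value shown is a made-up placeholder, not one taken from the reporter's cluster.

```
# Sketch only: scsi_sn pins the LUN's SCSI serial number so it does not
# change when the resource is restarted or moved to the other node.
# "a1b2c3d4" is a hypothetical value; choose a distinct one for each LUN.
primitive p_iscsi_lun_drbd0_r1_LUN-1 iSCSILogicalUnit \
        params target_iqn="iqn.2019-01.li.pls:dcxsdrbd" implementation=lio-t lun=0 path="/dev/drbd0" scsi_sn="a1b2c3d4" \
        op start timeout=20 interval=0 \
        op stop timeout=20 interval=0 \
        op monitor interval=20 timeout=40
```

Since the cluster configuration is shared between the nodes, setting the value once per LUN resource should be enough for both nodes to present the same serial. An SR that was created against the old, random serial may still need to be re-attached once.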