xcp-ng / xcp

Entry point for issues and wiki. Also contains some scripts and sources.
https://xcp-ng.org

HBA SR not available when Multipathing enabled in XCP 8.0 #284

Open kentnz opened 5 years ago

kentnz commented 5 years ago

Issue: When Multipathing is enabled in XCP 8.0 for an SR connected via two-channel Fibre Channel to an HP 3PAR SAN storage array, XCP fails to attach the SR and reports the error 'Invalid Option'.

Steps to Reproduce:

1) Install XCP-NG 8.0.
2) Post installation, run 'yum update' to bring the host up to date.
3) In XenCenter:
   i. Add an HBA Storage Repository (I'm connected to an HP 3PAR 8 via Fibre Channel, 2 channels).
   ii. Ensure the Storage Repository is listed and available.
4) Enter Maintenance Mode, turn on 'Multipathing' and save.
   RESULT: XCP fails to reconnect to the Storage Repository (SR) and reports the error 'Invalid Option'. Selecting 'Repair' on the SR reports the same error.
5) Enter Maintenance Mode again, turn 'OFF' Multipathing and save.
   RESULT: After selecting 'Repair', the SR is reconnected.

Repeating the enable/disable Multipathing steps produces the same outcome: whenever Multipathing is enabled, XCP fails to attach our 3PAR SR. Note: there are no issues connecting to the local SR or the NFS share I also have mounted (these don't use Multipathing).
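For anyone reproducing this from dom0 rather than XenCenter, the Multipathing toggle maps onto an `xe` command sequence. The sketch below is based on the multipathing procedure documented for earlier XenServer releases and may differ in 8.0; the UUIDs are placeholders, and the script only prints the commands (a dry run) rather than executing them.

```shell
# Dry-run sketch of the CLI equivalent of the XenCenter Multipathing toggle.
# Based on the procedure documented for earlier XenServer releases (may
# differ in 8.0). UUIDs are placeholders; nothing is executed here.
HOST_UUID="<host-uuid>"   # from: xe host-list
PBD_UUID="<pbd-uuid>"     # from: xe pbd-list sr-uuid=<sr-uuid>

CMDS="xe pbd-unplug uuid=$PBD_UUID
xe host-param-set other-config:multipathing=true uuid=$HOST_UUID
xe host-param-set other-config:multipathhandle=dmp uuid=$HOST_UUID
xe pbd-plug uuid=$PBD_UUID"

printf '%s\n' "$CMDS"
```

Reversing the toggle would set `other-config:multipathing=false` between the same unplug/plug pair.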

Expected Result: After enabling Multipathing, the SR should reconnect.

Actual Result: The SR fails to reconnect and reports an error: 'Invalid Option'

[15112] ***** generic exception: sr_attach: EXCEPTION <class 'SR.SROSError'>, Error reporting error, unknown key DMP failed to activate mapper path

[15112] raise xs_errors.XenError('DMP failed to activate mapper path')
[15112] ***** LVHD over FC: EXCEPTION <class 'SR.SROSError'>, Error reporting error, unknown key DMP failed to activate mapper path

More Information: This is the output from /var/log/SMlog when enabling Multipathing.

[15112] Setting LVM_DEVICE to /dev/disk/by-scsid/360002ac000000000000000020001d3ea
[15112] Setting LVM_DEVICE to /dev/disk/by-scsid/360002ac000000000000000020001d3ea
[15112] lock: opening lock file /var/lock/sm/9c1eeb54-af8e-8538-c90f-d6ec12404f21/sr
[15112] LVMCache created for VG_XenStorage-9c1eeb54-af8e-8538-c90f-d6ec12404f21
[15112] ['/sbin/vgs', '--readonly', 'VG_XenStorage-9c1eeb54-af8e-8538-c90f-d6ec12404f21']
[15112] pread SUCCESS
[15112] lock: acquired /var/lock/sm/9c1eeb54-af8e-8538-c90f-d6ec12404f21/sr
[15112] LVMCache: will initialize now
[15112] LVMCache: refreshing
[15112] ['/sbin/lvs', '--noheadings', '--units', 'b', '-o', '+lv_tags', '/dev/VG_XenStorage-9c1eeb54-af8e-8538-c90f-d6ec12404f21']
[15112] pread SUCCESS
[15112] lock: released /var/lock/sm/9c1eeb54-af8e-8538-c90f-d6ec12404f21/sr
[15112] lock: acquired /var/lock/sm/9c1eeb54-af8e-8538-c90f-d6ec12404f21/sr
[15112] sr_attach {'sr_uuid': '9c1eeb54-af8e-8538-c90f-d6ec12404f21', 'subtask_of': 'DummyRef:|7fd8329d-7974-4bf1-b5fb-3037d14a1c8a|SR.attach', 'args': [], 'host_ref': 'OpaqueRef:1ddd077a-19fa-4112-824f-b661e81eb024', 'session_ref': 'OpaqueRef:428df3a5-59c6-432c-b0c9-18f93c6e51e7', 'device_config': {'device': '/dev/disk/mpInuse/360002ac000000000000000020001d3ea', 'SCSIid': '360002ac000000000000000020001d3ea', 'SRmaster': 'true'}, 'command': 'sr_attach', 'sr_ref': 'OpaqueRef:75ecd97c-c3cb-4630-9c36-5e3b8d4684ae'}
[15112] === SR 9c1eeb54-af8e-8538-c90f-d6ec12404f21: abort ===
[15112] lock: opening lock file /var/lock/sm/9c1eeb54-af8e-8538-c90f-d6ec12404f21/running
[15112] lock: opening lock file /var/lock/sm/9c1eeb54-af8e-8538-c90f-d6ec12404f21/gc_active
[15112] lock: tried lock /var/lock/sm/9c1eeb54-af8e-8538-c90f-d6ec12404f21/running, acquired: True (exists: True)
[15112] abort: releasing the process lock
[15112] lock: released /var/lock/sm/9c1eeb54-af8e-8538-c90f-d6ec12404f21/running
[15112] lock: opening lock file /var/lock/sm/9c1eeb54-af8e-8538-c90f-d6ec12404f21/running
[15112] lock: opening lock file /var/lock/sm/9c1eeb54-af8e-8538-c90f-d6ec12404f21/sr
[15112] lock: acquired /var/lock/sm/9c1eeb54-af8e-8538-c90f-d6ec12404f21/running
[15112] lock: acquired /var/lock/sm/9c1eeb54-af8e-8538-c90f-d6ec12404f21/sr
[15112] RESET for SR 9c1eeb54-af8e-8538-c90f-d6ec12404f21 (master: True)
[15112] lock: released /var/lock/sm/9c1eeb54-af8e-8538-c90f-d6ec12404f21/sr
[15112] lock: released /var/lock/sm/9c1eeb54-af8e-8538-c90f-d6ec12404f21/running
[15112] lock: closed /var/lock/sm/9c1eeb54-af8e-8538-c90f-d6ec12404f21/running
[15112] lock: closed /var/lock/sm/9c1eeb54-af8e-8538-c90f-d6ec12404f21/sr
[15112] set_dirty 'OpaqueRef:75ecd97c-c3cb-4630-9c36-5e3b8d4684ae' failed (flag already set?)
[15112] MPATH: multipath activate called
[15112] mpath cmd: help
[15112] mpath output: multipathd> help
[15112] multipath-tools v0.4.9 (05/33, 2016)
[15112] CLI commands reference:
[15112] list|show paths
[15112] list|show paths format $format
[15112] list|show paths raw format $format
[15112] list|show status
[15112] list|show daemon
[15112] list|show maps|multipaths
[15112] list|show maps|multipaths status
[15112] list|show maps|multipaths stats
[15112] list|show maps|multipaths format $format
[15112] list|show maps|multipaths raw format $format
[15112] list|show maps|multipaths topology
[15112] list|show maps|multipaths json
[15112] list|show topology
[15112] list|show map|multipath $map topology
[15112] list|show map|multipath $map json
[15112] list|show config
[15112] list|show blacklist
[15112] list|show devices
[15112] list|show wildcards
[15112] reset maps|multipaths stats
[15112] reset map|multipath $map stats
[15112] add path $path
[15112] remove|del path $path
[15112] add map|multipath $map
[15112] remove|del map|multipath $map
[15112] switch|switchgroup map|multipath $map group $group
[15112] reconfigure
[15112] suspend map|multipath $map
[15112] resume map|multipath $map
[15112] resize map|multipath $map
[15112] reset map|multipath $map
[15112] reload map|multipath $map
[15112] disablequeueing map|multipath $map
[15112] restorequeueing map|multipath $map
[15112] disablequeueing maps|multipaths
[15112] restorequeueing maps|multipaths
[15112] reinstate path $path
[15112] fail path $path
[15112] quit|exit
[15112] shutdown
[15112] map|multipath $map getprstatus
[15112] map|multipath $map setprstatus
[15112] map|multipath $map unsetprstatus
[15112] map|multipath $map getprkey
[15112] map|multipath $map setprkey key $key
[15112] map|multipath $map unsetprkey
[15112] forcequeueing daemon
[15112] restorequeueing daemon
[15112] multipathd>
[15112] MPATH: dm-multipath activated.
[15112] Refreshing LUN 360002ac000000000000000020001d3ea
[15112] ['/usr/sbin/multipath', '-r', '360002ac000000000000000020001d3ea']
[15112] pread SUCCESS
[15112] lock: released /var/lock/sm/9c1eeb54-af8e-8538-c90f-d6ec12404f21/sr
[15112] ***** generic exception: sr_attach: EXCEPTION <class 'SR.SROSError'>, Error reporting error, unknown key DMP failed to activate mapper path
[15112] File "/opt/xensource/sm/SRCommand.py", line 110, in run
[15112] return self._run_locked(sr)
[15112] File "/opt/xensource/sm/SRCommand.py", line 159, in _run_locked
[15112] rv = self._run(sr, target)
[15112] File "/opt/xensource/sm/SRCommand.py", line 349, in _run
[15112] return sr.attach(self.params['sr_uuid'])
[15112] File "/opt/xensource/sm/LVMoHBASR", line 122, in attach
[15112] self.mpathmodule.refresh(self.SCSIid,0)
[15112] File "/opt/xensource/sm/mpath_dmp.py", line 177, in refresh
[15112] _refresh_DMP(sid,npaths)
[15112] File "/opt/xensource/sm/mpath_dmp.py", line 242, in _refresh_DMP
[15112] raise xs_errors.XenError('DMP failed to activate mapper path')
[15112]
[15112] ***** LVHD over FC: EXCEPTION <class 'SR.SROSError'>, Error reporting error, unknown key DMP failed to activate mapper path
[15112] File "/opt/xensource/sm/SRCommand.py", line 372, in run
[15112] ret = cmd.run(sr)
[15112] File "/opt/xensource/sm/SRCommand.py", line 110, in run
[15112] return self._run_locked(sr)
[15112] File "/opt/xensource/sm/SRCommand.py", line 159, in _run_locked
[15112] rv = self._run(sr, target)
[15112] File "/opt/xensource/sm/SRCommand.py", line 349, in _run
[15112] return sr.attach(self.params['sr_uuid'])
[15112] File "/opt/xensource/sm/LVMoHBASR", line 122, in attach
[15112] self.mpathmodule.refresh(self.SCSIid,0)
[15112] File "/opt/xensource/sm/mpath_dmp.py", line 177, in refresh
[15112] _refresh_DMP(sid,npaths)
[15112] File "/opt/xensource/sm/mpath_dmp.py", line 242, in _refresh_DMP
[15112] raise xs_errors.XenError('DMP failed to activate mapper path')
[15112]
[15112] lock: closed /var/lock/sm/9c1eeb54-af8e-8538-c90f-d6ec12404f21/sr
[15112] lock: closed /var/lock/sm/9c1eeb54-af8e-8538-c90f-d6ec12404f21/gc_active
[15112] lock: closed /var/lock/sm/9c1eeb54-af8e-8538-c90f-d6ec12404f21/running
[15264] sr_update {'sr_uuid': '5082faa8-9df4-1184-8add-012bd023829a', 'subtask_of': 'DummyRef:|bba377ce-62f2-46f3-bedd-af64858128bc|SR.stat', 'args': [], 'host_ref': 'OpaqueRef:1ddd077a-19fa-4112-824f-b661e81eb024', 'session_ref': 'OpaqueRef:e4a45f24-a395-4c29-9328-59d05d4f6eee', 'device_config': {'nfsversion': '4', 'type': 'nfs_iso', 'SRmaster': 'true', 'location': 'seagate:/shares/XenServer/'}, 'command': 'sr_update', 'sr_ref': 'OpaqueRef:a7aacc3b-8e4d-49e0-a65f-81217011917b'}
[15444] sr_update {'sr_uuid': '5082faa8-9df4-1184-8add-012bd023829a', 'subtask_of': 'DummyRef:|8b0266ac-53a0-405b-8331-1921ca5d30f7|SR.stat', 'args': [], 'host_ref': 'OpaqueRef:1ddd077a-19fa-4112-824f-b661e81eb024', 'session_ref': 'OpaqueRef:027e4a72-f9fc-4e5c-9e10-96753733ea45', 'device_config': {'nfsversion': '4', 'type': 'nfs_iso', 'SRmaster': 'true', 'location': 'seagate:/shares/XenServer/'}, 'command': 'sr_update', 'sr_ref': 'OpaqueRef:a7aacc3b-8e4d-49e0-a65f-81217011917b'}
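One detail worth noting in the traceback: the "Error reporting error, unknown key ..." wrapper suggests that `xs_errors.XenError` was handed a string that is not a registered error key, so whatever message was intended never reached the admin. The sketch below is a hypothetical stand-in for that lookup (the table and function names are invented; the real `xs_errors` reads its definitions from a file in dom0), just to illustrate how an unregistered key produces this double-error shape.

```python
# Hypothetical stand-in for the error-key lookup done by xs_errors.XenError.
# ERROR_TABLE and resolve_error are invented names; the real SM code reads
# its error definitions from a file shipped in dom0.
ERROR_TABLE = {
    "MultipathdCommsFailure": "Failed to communicate with the multipath daemon",
    "SRInUse": "The SR device is currently in use",
}

def resolve_error(key: str) -> str:
    """Return the admin-facing message for a raised error key."""
    if key in ERROR_TABLE:
        return ERROR_TABLE[key]
    # An unregistered key yields only this generic wrapper, which is
    # exactly the shape of the SMlog lines above.
    return "Error reporting error, unknown key " + key

print(resolve_error("DMP failed to activate mapper path"))
# prints: Error reporting error, unknown key DMP failed to activate mapper path
```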
nagilum99 commented 5 years ago

As proper multipathing requires configuration info, it's probably mandatory to put the right settings into /etc/multipath.conf. I had to add the proper device entries for our MSA 2040 SAN/FC on XenServer 7.1, and I doubt 3PAR has found its way into the defaults since then.

I contacted some people at HP(E), and after a while I got the proper settings back and gave feedback to Citrix for upstream purposes.

kentnz commented 5 years ago

My setup was working fine with prior versions of XenServer (up to 7.2) and XCP (7.6). It's only after updating to 8.0 that it fails. Something changed in the 8.0 update that breaks enabling Multipathing.

So either something has been removed, or something has changed in the attach process when Multipathing is enabled.

Why does the log show the help output for multipathd?

[15112] MPATH: multipath activate called
[15112] mpath cmd: help
[15112] mpath output: multipathd> help

Regarding the /etc/multipath.conf file, the only difference I can see is in the 'blacklist'.

XenServer 7.2

    blacklist {
        devnode "^nvme.*"
    }

XCP 8.0

    blacklist {
        devnode "^nvme."
        devnode "scini"
        devnode "^rbd[0-9]"
        devnode "^nbd[0-9]"
    }
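The devnode patterns are ordinary regular expressions matched against device node names, so the extra 8.0 entries only widen the blacklist to ScaleIO (`scini`), Ceph RBD, and NBD devices; none of them should catch an FC LUN. A quick sketch, using `grep -E` as a stand-in for the matching multipath applies:

```shell
# Check which device names the 8.0 blacklist patterns would catch.
# grep -E stands in for the regex matching multipath applies to devnodes.
matches() { printf '%s\n' "$2" | grep -qE "$1" && echo yes || echo no; }

matches '^nvme.'    nvme0n1   # yes: NVMe devices stay blacklisted
matches '^rbd[0-9]' rbd0      # yes: Ceph RBD nodes are excluded
matches 'scini'     scinia    # yes: unanchored, matches anywhere in the name
matches '^nvme.'    sda       # no: plain SCSI/FC disks are not caught
```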

I've run: multipathd -k"show conf" and I can see a whole group of references to "HP" under 'blacklist_exceptions'.

The other thing is that there are some new 'defaults' in the list (comparing 7.2 versus 8.0; I don't have a 7.6 server any more):

    prkeys_file /etc/multipath/prkeys
    detect_path_checker no
    skip_kpartx no
    remove_retries 0
    disable_changed_wwids no
    unpriv_sgio no
    ghost_delay no

thanks Kent.

nagilum99 commented 5 years ago

I guess it's your job to test other/old settings in the config file and/or contact HP. A 3PAR is a rather expensive thing and probably no one here has one to test with. I know they change the defaults from time to time, but I can't help you with that. As said, I reached an HP support guy and they gave me their settings from the lab. Also, one HP engineer was watching the XenServer bug tracker and contributing.

kentnz commented 5 years ago

Okay, where is the XenServer bug tracker so I can post this issue there and see?

To me, this appears to be a regression in XenServer/XCP 8.0 with Multipathing enabled, as it was working fine in 7.6. Also, the error in the XCP admin console reports 'Invalid Parameter', which seems to tie in with the log file dumping the help output for 'mpath output: multipathd> help'.

Thanks Kent.

kentnz commented 5 years ago

Found the XenServer bug tracker (hopefully) and have submitted this issue there as well. https://bugs.xenserver.org/browse/XSO-965

I'll try and find a time when I can bring one of my servers down to experiment with the options and see if one makes a difference.

Kent.

olivierlambert commented 5 years ago

@kentnz you should really change the title to "Citrix Hypervisor 8.0" instead of "XCP 8.0" otherwise don't expect Citrix people to answer :wink:

kentnz commented 5 years ago

Thanks Olivier, have done so. Anything else you can suggest to get this resolved? Thanks, Kent.

olivierlambert commented 5 years ago

I'd like to help, but I don't have similar hardware nor experience with HBA :disappointed:

olivierlambert commented 5 years ago

However, I would post on the XCP-ng forum; that's where the community is. Maybe someone there is able to assist! :smile:

kentnz commented 5 years ago

I posted on the XCP-ng forum originally a couple of days ago - but so far no replies (other than my own progress one). https://xcp-ng.org/forum/topic/2006/hba-sr-not-available-when-multipathing-enabled-in-xcp-8-0

Is it worth installing Citrix Hypervisor 8.0 on one of my servers (not XCP) and verifying that this is an issue with the Citrix release?

I was planning on updating my five data-centre servers from XenServer 7.2 (the version before they introduced the three-host pool limit) to XCP 8.0, but I need to get this sorted first.

thanks Kent.

stormi commented 4 years ago

Is it worth installing Citrix Hypervisor 8.0 on one of my servers (not XCP) and verifying that this is an issue with the Citrix release?

This is always useful information to have indeed, so that we know if the problem comes from us or from Citrix. I thought you already had since you reported the bug to Citrix.

kentnz commented 4 years ago

Hi @stormi, No - with the help of @olivierlambert and @r1 I was able to work around this issue by copying the settings from our 7.2 server into the multipath.conf file.

I understood (maybe assumed incorrectly) from their feedback that this wasn't an area they had changed and it was their suggestion to post it as a bug to Citrix.

But, I'll make some time in the next few days to install the Citrix version of 8.0 and verify.

Kent.

stormi commented 4 years ago

This isn't an area we have changed indeed, but you never know for sure until it's been tested :)

EthanBannister commented 4 years ago

We are also running a 3PAR within our org, and I hit this problem too after upgrading to 8.0.

I had to manually add the following to my multipath.conf device list:

    device {
            vendor "3PARdata"
            product "VV"
            path_grouping_policy "multibus"
            path_checker "directio"
            features "0"
            hardware_handler "0"
            prio "const"
            rr_weight "uniform"
    }

I added this to all the hosts within the pool and now everything works as expected.

Hopefully this helps.
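Before pushing an edited multipath.conf to every host in the pool, a cheap sanity check can catch unbalanced braces in the new stanza. A minimal sketch (it works on a scratch copy rather than the live /etc/multipath.conf, and `grep -c` counts lines containing each brace, which is enough for a stanza formatted one directive per line):

```shell
# Sanity-check a multipath.conf fragment for balanced braces before
# distributing it to the pool. Works on a scratch copy, not the live file.
cat > /tmp/3par-stanza.conf <<'EOF'
device {
        vendor "3PARdata"
        product "VV"
        path_grouping_policy "multibus"
        path_checker "directio"
        features "0"
        hardware_handler "0"
        prio "const"
        rr_weight "uniform"
}
EOF

opens=$(grep -c '{' /tmp/3par-stanza.conf)
closes=$(grep -c '}' /tmp/3par-stanza.conf)
if [ "$opens" -eq "$closes" ]; then
    echo "braces balanced ($opens pairs)"
else
    echo "unbalanced braces: $opens '{' vs $closes '}'" >&2
fi
```

On a host with multipath-tools installed, `multipathd show config` should then show the 3PARdata entry merged in, confirming the daemon accepted it.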

stormi commented 4 years ago

There have been insightful (or so it seems) comments from Citrix on https://bugs.xenserver.org/browse/XSO-965

stormi commented 4 years ago

Could someone gather all data about this issue and make a summary as a comment here? Then we could probably document all this on the wiki.