rposudnevskiy / RBDSR

RBDSR - XenServer/XCP-ng Storage Manager plugin for CEPH
GNU Lesser General Public License v2.1
58 stars 23 forks source link

Unable to perform SXM in Xen7 to/from RBDSR #20

Closed nickmdg closed 7 years ago

nickmdg commented 7 years ago

Hi,

We've been working through testing the RBDSR code for deployment into our environment over the past few weeks, so far everything is looking good except the following bug that we've found.

When attempting to move a VM's to or from an LVM repository to an RBD repository, I've uploaded a section of the servers SMLog, also below is what I believe to be the relevant section.

Further information on the environment XenServer 7 with XS70E17 installed Ceph Jewel RBDSR UUID = c3278df0-c7f5-4658-aa27-b61bccd7662a Local SR UUID = f5aa74b2-3a98-860c-5b7a-7301ec8ba1b4

If there is any additional information that might help with this please let me know.

Nov  8 12:14:48 xenserver-testlab2 SM: [7028] Activate lock succeeded
Nov  8 12:14:48 xenserver-testlab2 SM: [7028] RBDSR.handles type rbd
Nov  8 12:14:48 xenserver-testlab2 SM: [7028] RBDSR.load using cephx id xenserver
Nov  8 12:14:48 xenserver-testlab2 SM: [7028] ['ceph', 'df', '--format', 'json', '--name', 'client.xenserver']
Nov  8 12:14:48 xenserver-testlab2 SM: [7028]   pread SUCCESS
Nov  8 12:14:48 xenserver-testlab2 SM: [7028] ['rbd', 'ls', '-l', '--format', 'json', '--pool', 'RBD_XenStorage-c3278df0-c7f5-4658-aa27-b61bccd7662a', '--name', 'client.xenserver']
Nov  8 12:14:48 xenserver-testlab2 SM: [7028]   pread SUCCESS
Nov  8 12:14:48 xenserver-testlab2 SM: [7028] RBDVDI.attach for c40b64c6-32b2-4299-9e08-a31024b2ea3d
Nov  8 12:14:48 xenserver-testlab2 SM: [7028] Calling _map_VHD
Nov  8 12:14:48 xenserver-testlab2 SM: [7028] Calling ceph_plugin
Nov  8 12:14:48 xenserver-testlab2 SM: [7028] Calling rbd/nbd map on host OpaqueRef:1941be1c-04a5-70fc-d7d6-bc2580811da8
Nov  8 12:14:49 xenserver-testlab2 SM: [7189] ['rbd-nbd', '--nbds_max', '64', 'map', 'RBD_XenStorage-c3278df0-c7f5-4658-aa27-b61bccd7662a/VHD-c40b64c6-32b2-4299-9e08-a31024b2ea3d', '--name', 'client.xenserver']
Nov  8 12:14:49 xenserver-testlab2 SM: [7189] FAILED in util.pread: (rc 1) stdout: '', stderr: 'rbd-nbd: failed to map, status: (2) No such file or directory
Nov  8 12:14:49 xenserver-testlab2 SM: [7189] 2016-11-08 12:14:49.147847 7f298627ce00 -1 asok(0x7f29910e1770) AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to '/var/run/ceph/ceph-client.xenserver.asok': (17) File exists
Nov  8 12:14:49 xenserver-testlab2 SM: [7189] '
Nov  8 12:14:49 xenserver-testlab2 SM: [7028] Exception in activate/attach
Nov  8 12:14:49 xenserver-testlab2 SM: [7028] Removed host key host_OpaqueRef:1941be1c-04a5-70fc-d7d6-bc2580811da8 for c40b64c6-32b2-4299-9e08-a31024b2ea3d
Nov  8 12:14:49 xenserver-testlab2 SM: [7028] ***** BLKTAP2:<function _activate_locked at 0x211c6e0>: EXCEPTION <class 'XenAPI.Failure'>, ['XENAPI_PLUGIN_FAILURE', 'map', 'CommandException', 'Operation not permitted']
Nov  8 12:14:49 xenserver-testlab2 SM: [7028]   File "/opt/xensource/sm/blktap2.py", line 86, in wrapper
Nov  8 12:14:49 xenserver-testlab2 SM: [7028]     ret = op(self, *args)
Nov  8 12:14:49 xenserver-testlab2 SM: [7028]   File "/opt/xensource/sm/blktap2.py", line 1593, in _activate_locked
Nov  8 12:14:49 xenserver-testlab2 SM: [7028]     self._attach(sr_uuid, vdi_uuid)
Nov  8 12:14:49 xenserver-testlab2 SM: [7028]   File "/opt/xensource/sm/blktap2.py", line 1658, in _attach
Nov  8 12:14:49 xenserver-testlab2 SM: [7028]     attach_info = xmlrpclib.loads(self.target.attach(sr_uuid, vdi_uuid))[0][0]
Nov  8 12:14:49 xenserver-testlab2 SM: [7028]   File "/opt/xensource/sm/blktap2.py", line 1115, in attach
Nov  8 12:14:49 xenserver-testlab2 SM: [7028]     return self.vdi.attach(sr_uuid, vdi_uuid)
Nov  8 12:14:49 xenserver-testlab2 SM: [7028]   File "/opt/xensource/sm/RBDSR.py", line 401, in attach
Nov  8 12:14:49 xenserver-testlab2 SM: [7028]     self._map_VHD(vdi_uuid)
Nov  8 12:14:49 xenserver-testlab2 SM: [7028]   File "/opt/xensource/sm/cephutils.py", line 372, in _call_plugin
Nov  8 12:14:49 xenserver-testlab2 SM: [7028]     if not self.session.xenapi.host.call_plugin(host_ref, "ceph_plugin", op, args):
Nov  8 12:14:49 xenserver-testlab2 SM: [7028]   File "/usr/lib/python2.7/site-packages/XenAPI.py", line 248, in __call__
Nov  8 12:14:49 xenserver-testlab2 SM: [7028]     return self.__send(self.__name, args)
Nov  8 12:14:49 xenserver-testlab2 SM: [7028]   File "/usr/lib/python2.7/site-packages/XenAPI.py", line 150, in xenapi_request
Nov  8 12:14:49 xenserver-testlab2 SM: [7028]     result = _parse_result(getattr(self, methodname)(*full_params))
Nov  8 12:14:49 xenserver-testlab2 SM: [7028]   File "/usr/lib/python2.7/site-packages/XenAPI.py", line 222, in _parse_result
Nov  8 12:14:49 xenserver-testlab2 SM: [7028]     raise Failure(result['ErrorDescription'])
Nov  8 12:14:49 xenserver-testlab2 SM: [7028]
Nov  8 12:14:49 xenserver-testlab2 SM: [7028] Raising exception [46, The VDI is not available [opterr=['XENAPI_PLUGIN_FAILURE', 'map', 'CommandException', 'Operation not permitted']]]
Nov  8 12:14:49 xenserver-testlab2 SM: [7028] lock: released /var/lock/sm/c40b64c6-32b2-4299-9e08-a31024b2ea3d/vdi
Nov  8 12:14:49 xenserver-testlab2 SM: [7028] ***** generic exception: vdi_activate: EXCEPTION <class 'SR.SROSError'>, The VDI is not available [opterr=['XENAPI_PLUGIN_FAILURE', 'map', 'CommandException', 'Operation not permitted']]
Nov  8 12:14:49 xenserver-testlab2 SM: [7028]   File "/opt/xensource/sm/SRCommand.py", line 110, in run
Nov  8 12:14:49 xenserver-testlab2 SM: [7028]     return self._run_locked(sr)
Nov  8 12:14:49 xenserver-testlab2 SM: [7028]   File "/opt/xensource/sm/SRCommand.py", line 159, in _run_locked
Nov  8 12:14:49 xenserver-testlab2 SM: [7028]     rv = self._run(sr, target)
Nov  8 12:14:49 xenserver-testlab2 SM: [7028]   File "/opt/xensource/sm/SRCommand.py", line 264, in _run
Nov  8 12:14:49 xenserver-testlab2 SM: [7028]     writable, caching_params)
Nov  8 12:14:49 xenserver-testlab2 SM: [7028]   File "/opt/xensource/sm/blktap2.py", line 1560, in activate
Nov  8 12:14:49 xenserver-testlab2 SM: [7028]     if self._activate_locked(sr_uuid, vdi_uuid, options):
Nov  8 12:14:49 xenserver-testlab2 SM: [7028]   File "/opt/xensource/sm/blktap2.py", line 94, in wrapper
Nov  8 12:14:49 xenserver-testlab2 SM: [7028]     raise xs_errors.XenError(excType, opterr=msg)
Nov  8 12:14:49 xenserver-testlab2 SM: [7028]   File "/opt/xensource/sm/xs_errors.py", line 52, in __init__
Nov  8 12:14:49 xenserver-testlab2 SM: [7028]     raise SR.SROSError(errorcode, errormessage)
Nov  8 12:14:49 xenserver-testlab2 SM: [7028]
Nov  8 12:14:49 xenserver-testlab2 SM: [7028] ***** RBD: EXCEPTION <class 'SR.SROSError'>, The VDI is not available [opterr=['XENAPI_PLUGIN_FAILURE', 'map', 'CommandException', 'Operation not permitted']]
Nov  8 12:14:49 xenserver-testlab2 SM: [7028]   File "/opt/xensource/sm/SRCommand.py", line 352, in run
Nov  8 12:14:49 xenserver-testlab2 SM: [7028]     ret = cmd.run(sr)
Nov  8 12:14:49 xenserver-testlab2 SM: [7028]   File "/opt/xensource/sm/SRCommand.py", line 110, in run
Nov  8 12:14:49 xenserver-testlab2 SM: [7028]     return self._run_locked(sr)
Nov  8 12:14:49 xenserver-testlab2 SM: [7028]   File "/opt/xensource/sm/SRCommand.py", line 159, in _run_locked
Nov  8 12:14:49 xenserver-testlab2 SM: [7028]     rv = self._run(sr, target)
Nov  8 12:14:49 xenserver-testlab2 SM: [7028]   File "/opt/xensource/sm/SRCommand.py", line 264, in _run
Nov  8 12:14:49 xenserver-testlab2 SM: [7028]     writable, caching_params)
Nov  8 12:14:49 xenserver-testlab2 SM: [7028]   File "/opt/xensource/sm/blktap2.py", line 1560, in activate
Nov  8 12:14:49 xenserver-testlab2 SM: [7028]     if self._activate_locked(sr_uuid, vdi_uuid, options):
Nov  8 12:14:49 xenserver-testlab2 SM: [7028]   File "/opt/xensource/sm/blktap2.py", line 94, in wrapper
Nov  8 12:14:49 xenserver-testlab2 SM: [7028]     raise xs_errors.XenError(excType, opterr=msg)
Nov  8 12:14:49 xenserver-testlab2 SM: [7028]   File "/opt/xensource/sm/xs_errors.py", line 52, in __init__
Nov  8 12:14:49 xenserver-testlab2 SM: [7028]     raise SR.SROSError(errorcode, errormessage)
Nov  8 12:14:49 xenserver-testlab2 SM: [7028]
Nov  8 12:14:49 xenserver-testlab2 SM: [7028] lock: closed /var/lock/sm/c40b64c6-32b2-4299-9e08-a31024b2ea3d/vdi
Nov  8 12:14:49 xenserver-testlab2 SM: [7224] lock: opening lock file /var/lock/sm/f5aa74b2-3a98-860c-5b7a-7301ec8ba1b4/sr
Nov  8 12:14:49 xenserver-testlab2 SM: [7224] LVMCache created for VG_XenStorage-f5aa74b2-3a98-860c-5b7a-7301ec8ba1b4
Nov  8 12:14:49 xenserver-testlab2 SM: [7224] ['/sbin/vgs', 'VG_XenStorage-f5aa74b2-3a98-860c-5b7a-7301ec8ba1b4']
Nov  8 12:14:53 xenserver-testlab2 SM: [7224]   pread SUCCESS
Nov  8 12:14:53 xenserver-testlab2 SM: [7224] Entering _checkMetadataVolume
Nov  8 12:14:53 xenserver-testlab2 SM: [7224] LVMCache: will initialize now
Nov  8 12:14:53 xenserver-testlab2 SM: [7224] LVMCache: refreshing
Nov  8 12:14:53 xenserver-testlab2 SM: [7224] ['/sbin/lvs', '--noheadings', '--units', 'b', '-o', '+lv_tags', '/dev/VG_XenStorage-f5aa74b2-3a98-860c-5b7a-7301ec8ba1b4']
Nov  8 12:14:56 xenserver-testlab2 SM: [7224]   pread SUCCESS
Nov  8 12:14:56 xenserver-testlab2 SM: [7224] vdi_deactivate {'sr_uuid': 'f5aa74b2-3a98-860c-5b7a-7301ec8ba1b4', 'subtask_of': 'DummyRef:|5ac4b40e-398b-90c9-f3fd-253c92e2875c|VDI.deactivate', 'vdi_ref': 'OpaqueRef:dc8fa82e-275b-03a6-a41f-297bc687cf2c', 'vdi_on_boot': 'persist', 'args': [], 'vdi_location': 'e711cd11-e3fe-456c-b42d-2e8c933354c6', 'host_ref': 'OpaqueRef:1941be1c-04a5-70fc-d7d6-bc2580811da8', 'session_ref': 'OpaqueRef:6d4ee445-d0d5-f456-32eb-7092a779ff78', 'device_config': {'device': '/dev/disk/by-id/ata-ST500DM002-1BD142_Z6EHZG38-part3', 'SRmaster': 'true'}, 'command': 'vdi_deactivate', 'vdi_allow_caching': 'false', 'sr_ref': 'OpaqueRef:5672509e-4c88-178d-d6a0-8f103b614caf', 'vdi_uuid': 'e711cd11-e3fe-456c-b42d-2e8c933354c6'}
Nov  8 12:14:56 xenserver-testlab2 SM: [7224] lock: opening lock file /var/lock/sm/e711cd11-e3fe-456c-b42d-2e8c933354c6/vdi
Nov  8 12:14:56 xenserver-testlab2 SM: [7224] blktap2.deactivate
Nov  8 12:14:56 xenserver-testlab2 SM: [7224] lock: acquired /var/lock/sm/e711cd11-e3fe-456c-b42d-2e8c933354c6/vdi
Nov  8 12:14:56 xenserver-testlab2 SM: [7224] ['/usr/sbin/tap-ctl', 'close', '-p', '6939', '-m', '21']
Nov  8 12:14:56 xenserver-testlab2 SM: [7224]  = 0
Nov  8 12:14:56 xenserver-testlab2 SM: [7224] Attempt to deregister tapdisk with RRDD.
Nov  8 12:14:56 xenserver-testlab2 SM: [7224] ERROR: Failed to deregister tapdisk with RRDD due to UnixStreamHTTP instance has no attribute 'getresponse'
Nov  8 12:14:56 xenserver-testlab2 SM: [7224] ['/usr/sbin/tap-ctl', 'detach', '-p', '6939', '-m', '21']
Nov  8 12:14:56 xenserver-testlab2 SM: [7224]  = 0
Nov  8 12:14:56 xenserver-testlab2 SM: [7224] ['/usr/sbin/tap-ctl', 'free', '-m', '21']
Nov  8 12:14:56 xenserver-testlab2 SM: [7224]  = 0

ceph-SXM-error.txt

rposudnevskiy commented 7 years ago

Hi, The problem is here

Nov  8 12:14:49 xenserver-testlab2 SM: [7189] ['rbd-nbd', '--nbds_max', '64', 'map', 'RBD_XenStorage-c3278df0-c7f5-4658-aa27-b61bccd7662a/VHD-c40b64c6-32b2-4299-9e08-a31024b2ea3d', '--name', 'client.xenserver']
Nov  8 12:14:49 xenserver-testlab2 SM: [7189] FAILED in util.pread: (rc 1) stdout: '', stderr: 'rbd-nbd: failed to map, status: (2) No such file or directory

rbd-nbd couldn't map the rbd image as nbd device.

could you please upload the output of this command (run it on xenserver) rbd ls -l --pool RBD_XenStorage-c3278df0-c7f5-4658-aa27-b61bccd7662a --name client.xenserver

Thank you

rposudnevskiy commented 7 years ago

Try to map image by hand rbd-nbd map RBD_XenStorage-c3278df0-c7f5-4658-aa27-b61bccd7662a/VHD-c40b64c6-32b2-4299-9e08-a31024b2ea3d --name client.xenserver What is it saying?

nickmdg commented 7 years ago

Thanks for having a look at this for me, I've uploaded two files, ceph-SXM-error-2.txt is a full output from the SMLog for this test. The second file ceph-SXM-error-2-cmd.txt contains the output of the two commands you've asked that I run for you.

From what I can see VHD-945a77c5-a1cf-42a6-a66a-50e06c621110 does not exist at all

ceph-SXM-error-2.txt ceph-SXM-error-2-cmd.txt

rposudnevskiy commented 7 years ago

Hi, The last commit should fix this issue. Please check.

rposudnevskiy commented 7 years ago

Still have problems with SXM on XenServer 7

rposudnevskiy commented 7 years ago

Hi, Fixed. Tested RBD->NFS and NFS->RBD migration