rposudnevskiy / RBDSR

RBDSR - XenServer/XCP-ng Storage Manager plugin for CEPH
GNU Lesser General Public License v2.1

Snapshots in NBD Mode crash VM+Host? #4

Closed mhoffmann75 closed 8 years ago

mhoffmann75 commented 8 years ago

We have evaluated RBDSR in combination with a Ceph Jewel cluster (Ubuntu 14.04) and XenServer 7. From our testing, each mode of RBDSR has its pros and cons. Our results:

kernel mode
+ really fast
- forces the Ceph cluster into compatibility mode
+ most features (VDI creation, snapshots, etc.) work

fuse mode
- quite slow (kernel mode seems about 5 times faster)
+ full Jewel compatibility
+ most features (VDI creation, snapshots, etc.) work

nbd mode
+ almost (?) as fast as kernel mode
+ full Jewel compatibility
- a snapshot of a running VM hangs both the VM and the XenServer host :-(

That's why nbd mode is the most interesting one for us. Other ideas are welcome :-)

However, the crash of the VM being snapshotted, and of the XenServer host it runs on, is a real showstopper. Any idea what causes it?

The SMLog doesn't seem to reveal any errors:

Aug 1 16:52:10 pns-xen06 SM: [16457] ['rbd', 'ls', '-l', '--format', 'json', '--pool', 'RBD_XenStorage-ff12160f-ff09-40bb-a874-1366ad907f44']
Aug 1 16:52:10 pns-xen06 SM: [16457] preit SUCCESS
Aug 1 16:52:10 pns-xen06 SM: [16457] ['ceph', 'df', '-f', 'json']
Aug 1 16:52:11 pns-xen06 SM: [16457] preit SUCCESS
Aug 1 16:52:11 pns-xen06 SM: [16457] vdi_snapshot {'sr_uuid': 'ff12160f-ff09-40bb-a874-1366ad907f44', 'subtask_of': 'DummyRef:|67c03c61-f2e5-f725-2758-a646f10845b3|VDI.snapshot', 'vdi_ref': 'OpaqueRef:f25db949-ec0e-e414-8f87-7fa393c228fe', 'vdi_on_boot': 'persist', 'args': [], 'vdi_location': 'bb086701-2e94-4dad-91b0-3e9e0bd56a5d', 'host_ref': 'OpaqueRef:4eac2447-bf4e-c909-560b-ff67a467dd29', 'session_ref': 'OpaqueRef:613194dd-7d4a-b036-2f9e-09fa8a86652f', 'device_config': {'SRmaster': 'true'}, 'command': 'vdi_snapshot', 'vdi_allow_caching': 'false', 'sr_ref': 'OpaqueRef:6d76056b-1874-c808-a9b9-ee9abaa31513', 'driver_params': {'epochhint': 'c187be94-3e5c-c945-db14-2234971c71ee'}, 'vdi_uuid': 'bb086701-2e94-4dad-91b0-3e9e0bd56a5d'}
Aug 1 16:52:11 pns-xen06 SM: [16457] RBDVDI.snapshot for bb086701-2e94-4dad-91b0-3e9e0bd56a5d
Aug 1 16:52:11 pns-xen06 SM: [16457] Pause request for bb086701-2e94-4dad-91b0-3e9e0bd56a5d
Aug 1 16:52:11 pns-xen06 SM: [16457] Calling tap-pause on host OpaqueRef:4eac2447-bf4e-c909-560b-ff67a467dd29
Aug 1 16:52:11 pns-xen06 SM: [16525] lock: opening lock file /var/lock/sm/bb086701-2e94-4dad-91b0-3e9e0bd56a5d/vdi
Aug 1 16:52:11 pns-xen06 SM: [16525] lock: acquired /var/lock/sm/bb086701-2e94-4dad-91b0-3e9e0bd56a5d/vdi
Aug 1 16:52:11 pns-xen06 SM: [16525] Pause for bb086701-2e94-4dad-91b0-3e9e0bd56a5d
Aug 1 16:52:11 pns-xen06 SM: [16525] Calling tap pause with minor 0
Aug 1 16:52:11 pns-xen06 SM: [16525] ['/usr/sbin/tap-ctl', 'pause', '-p', '15665', '-m', '0']
Aug 1 16:52:11 pns-xen06 SM: [16525] = 0
Aug 1 16:52:11 pns-xen06 SM: [16525] lock: released /var/lock/sm/bb086701-2e94-4dad-91b0-3e9e0bd56a5d/vdi
Aug 1 16:52:11 pns-xen06 SM: [16525] lock: closed /var/lock/sm/bb086701-2e94-4dad-91b0-3e9e0bd56a5d/vdi
Aug 1 16:52:11 pns-xen06 SM: [16457] ['uuidgen', '-r']
Aug 1 16:52:11 pns-xen06 SM: [16457] preit SUCCESS
Aug 1 16:52:11 pns-xen06 SM: [16457] ['rbd', 'image-meta', 'list', 'VHD-bb086701-2e94-4dad-91b0-3e9e0bd56a5d', '--pool', 'RBD_XenStorage-ff12160f-ff09-40bb-a874-1366ad907f44', '--format', 'son']
Aug 1 16:52:11 pns-xen06 SM: [16457] preit SUCCESS
Aug 1 16:52:11 pns-xen06 SM: [16457] ['rbd', 'snap', 'create', 'VHD-bb086701-2e94-4dad-91b0-3e9e0bd56a5d@SNAP-b2bb2d93-9201-43b2-844b-9e029461b853', '--pool', 'RBD_XenStorage-ff12160f-ff09-40bb-a874-1366ad907f44']
Aug 1 16:52:17 pns-xen06 SM: [16457] preit SUCCESS
Aug 1 16:52:17 pns-xen06 SM: [16457] ['rbd', 'snap', 'protect', 'VHD-bb086701-2e94-4dad-91b0-3e9e0bd56a5d@SNAP-b2bb2d93-9201-43b2-844b-9e029461b853', '--pool', 'RBD_XenStorage-ff12160f-ff09-40bb-a874-1366ad907f44']
Aug 1 16:52:22 pns-xen06 SM: [16457] preit SUCCESS
Aug 1 16:52:22 pns-xen06 SM: [16457] Unpause request for bb086701-2e94-4dad-91b0-3e9e0bd56a5d secondary=None
Aug 1 16:52:22 pns-xen06 SM: [16457] Calling tap-unpause on host OpaqueRef:4eac2447-bf4e-c909-560b-ff67a467dd29
Aug 1 16:52:22 pns-xen06 SM: [16657] lock: opening lock file /var/lock/sm/bb086701-2e94-4dad-91b0-3e9e0bd56a5d/vdi
Aug 1 16:52:22 pns-xen06 SM: [16657] lock: acquired /var/lock/sm/bb086701-2e94-4dad-91b0-3e9e0bd56a5d/vdi
Aug 1 16:52:22 pns-xen06 SM: [16657] Unpause for bb086701-2e94-4dad-91b0-3e9e0bd56a5d
Aug 1 16:52:22 pns-xen06 SM: [16657] Realpath: /dev/nbd/RBD_XenStorage-ff12160f-ff09-40bb-a874-1366ad907f44/VHD-bb086701-2e94-4dad-91b0-3e9e0bd56a5d
Aug 1 16:52:22 pns-xen06 SM: [16657] Calling tap unpause with minor 0
Aug 1 16:52:22 pns-xen06 SM: [16657] ['/usr/sbin/tap-ctl', 'unpause', '-p', '15665', '-m', '0', '-a', 'aio:/dev/nbd/RBD_XenStorage-ff12160f-ff09-40bb-a874-1366ad907f44/VHD-bb086701-2e94-4dad-91b0-3e9e0bd56a5d']
Aug 1 16:52:22 pns-xen06 SM: [16657] = 0
Aug 1 16:52:22 pns-xen06 SM: [16657] lock: released /var/lock/sm/bb086701-2e94-4dad-91b0-3e9e0bd56a5d/vdi
Aug 1 16:52:22 pns-xen06 SM: [16657] lock: closed /var/lock/sm/bb086701-2e94-4dad-91b0-3e9e0bd56a5d/vdi
Aug 1 16:52:22 pns-xen06 SM: [16675] ['rbd', 'ls', '-l', '--format', 'json', '--pool', 'RBD_XenStorage-ff12160f-ff09-40bb-a874-1366ad907f44']
Aug 1 16:52:22 pns-xen06 SM: [16675] preit SUCCESS
Aug 1 16:52:22 pns-xen06 SM: [16675] ['ceph', 'df', '-f', 'json']
Aug 1 16:52:23 pns-xen06 SM: [16675] preit SUCCESS
Aug 1 16:52:23 pns-xen06 SM: [16675] vdi_update {'sr_uuid': 'ff12160f-ff09-40bb-a874-1366ad907f44', 'subtask_of': 'DummyRef:|e25e4db4-b375-e312-35ce-85071213de7b|VDI.update', 'vdi_ref': 'OpaqueRef:b31bce15-406e-560d-a13a-6a153b7958bd', 'vdi_on_boot': 'persist', 'args': [], 'vdi_location': 'b2bb2d93-9201-43b2-844b-9e029461b853', 'host_ref': 'OpaqueRef:4eac2447-bf4e-c909-560b-ff67a467dd29', 'session_ref': 'OpaqueRef:f7f95e8b-a6cb-e051-7fff-0a07fe962cd0', 'device_config': {'SRmaster': 'true'}, 'command': 'vdi_update', 'vdi_allow_caching': 'false', 'sr_ref': 'OpaqueRef:6d76056b-1874-c808-a9b9-ee9abaa31513', 'vdi_uuid': 'b2bb2d93-9201-43b2-844b-9e029461b853'}
Aug 1 16:52:23 pns-xen06 SM: [16675] RBDSR.update for b2bb2d93-9201-43b2-844b-9e029461b853
Aug 1 16:52:23 pns-xen06 SM: [16675] ['rbd', 'image-meta', 'set', 'VHD-bb086701-2e94-4dad-91b0-3e9e0bd56a5d', 'VDI_LABEL', 'Ubi16 0', '--pool', 'RBD_XenStorage-ff12160f-ff09-40bb-a874-1366ad907f44']
Aug 1 16:52:23 pns-xen06 SM: [16675] preit SUCCESS
Aug 1 16:52:23 pns-xen06 SM: [16675] ['rbd', 'image-meta', 'set', 'VHD-bb086701-2e94-4dad-91b0-3e9e0bd56a5d', 'VDI_DESCRIPTION', 'Created by template provisioner', '--pool', 'RBD_XenStorage-ff12160f-ff09-40bb-a874-1366ad907f44']
Aug 1 16:52:23 pns-xen06 SM: [16675] preit SUCCESS
Aug 1 16:52:23 pns-xen06 SM: [16675] ['rbd', 'image-meta', 'set', 'VHD-bb086701-2e94-4dad-91b0-3e9e0bd56a5d', 'SNAP-b2bb2d93-9201-43b2-844b-9e029461b853', '20160801T14:52:10Z', '--pool', 'RBD_XenStorage-ff12160f-ff09-40bb-a874-1366ad907f44']
Aug 1 16:52:23 pns-xen06 SM: [16675] preit SUCCESS

It seems that the VM can no longer access its disk: the guest OS complains that the disk does not respond. The VM cannot be shut down, and even the host (XenServer 7) refuses to force-stop it. A hard reboot via the console is necessary.
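Scanning an SMlog excerpt like the one above by eye is error-prone. The following minimal Python sketch, which is not part of RBDSR, is one rough aid: it groups exception-related SMlog messages by SM process ID. It assumes the `Mon D HH:MM:SS host SM: [pid] message` layout visible in the excerpt. The names `SMLOG_LINE`, `ERROR_MARKER` and `errors_by_pid` are hypothetical, and so is the choice of "exception" as the marker.

```python
import re
from collections import defaultdict

# Layout seen in the excerpt: "Aug 1 16:52:10 pns-xen06 SM: [16457] message"
SMLOG_LINE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+SM:\s+\[(?P<pid>\d+)\]\s?(?P<msg>.*)$"
)

# Hypothetical marker: any message mentioning "exception" (case-insensitive).
ERROR_MARKER = "exception"


def errors_by_pid(lines):
    """Return {pid: [messages]} for SM entries that mention an exception."""
    found = defaultdict(list)
    for line in lines:
        match = SMLOG_LINE.match(line.strip())
        if match is None:
            continue  # not an SM entry, e.g. a line from another daemon
        msg = match.group("msg")
        if ERROR_MARKER in msg.lower():
            found[int(match.group("pid"))].append(msg)
    return dict(found)
```

On a XenServer host you would pass it an open handle on `/var/log/SMlog`. Run against the snapshot excerpt above, it returns an empty dict. That matches the observation that the log shows no errors, so the hang has to be diagnosed elsewhere.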

mhoffmann75 commented 8 years ago

Interestingly, after rebooting the XenServer host I still cannot start the VM, even after deleting the snapshot. This time, though, I get an error message about MAP_DUPLICATE_KEY:

Aug 1 17:24:45 pns-xen06 SM: [12820] ['rbd', 'ls', '-l', '--format', 'json', '--pool', 'RBD_XenStorage-ff12160f-ff09-40bb-a874-1366ad907f44']
Aug 1 17:24:45 pns-xen06 SM: [12820] preit SUCCESS
Aug 1 17:24:45 pns-xen06 SM: [12820] ['ceph', 'df', '-f', 'json']
Aug 1 17:24:46 pns-xen06 SM: [12820] preit SUCCESS
Aug 1 17:24:46 pns-xen06 SM: [12820] vdi_epoch_begin {'sr_uuid': 'ff12160f-ff09-40bb-a874-1366ad907f44', 'subtask_of': 'DummyRef:|eae0510e-1a25-03ad-69c2-076f8ef7899d|VDI.epoch_begin', 'vdi_ref': 'OpaqueRef:f25db949-ec0e-e414-8f87-7fa393c228fe', 'vdi_on_boot': 'persist', 'args': [], 'vdi_location': 'bb086701-2e94-4dad-91b0-3e9e0bd56a5d', 'host_ref': 'OpaqueRef:4eac2447-bf4e-c909-560b-ff67a467dd29', 'session_ref': 'OpaqueRef:412a6490-cf5a-ad5d-38b7-9ee293b4fb77', 'device_config': {'SRmaster': 'true'}, 'command': 'vdi_epoch_begin', 'vdi_allow_caching': 'false', 'sr_ref': 'OpaqueRef:6d76056b-1874-c808-a9b9-ee9abaa31513', 'vdi_uuid': 'bb086701-2e94-4dad-91b0-3e9e0bd56a5d'}
Aug 1 17:24:46 pns-xen06 SM: [12896] ['rbd', 'ls', '-l', '--format', 'json', '--pool', 'RBD_XenStorage-ff12160f-ff09-40bb-a874-1366ad907f44']
Aug 1 17:24:46 pns-xen06 SM: [12896] preit SUCCESS
Aug 1 17:24:46 pns-xen06 SM: [12896] ['ceph', 'df', '-f', 'json']
Aug 1 17:24:47 pns-xen06 SM: [12896] preit SUCCESS
Aug 1 17:24:47 pns-xen06 SM: [12896] vdi_attach {'sr_uuid': 'ff12160f-ff09-40bb-a874-1366ad907f44', 'subtask_of': 'DummyRef:|9553b5a5-5abf-4b71-cfac-92c7612b2cf4|VDI.attach', 'vdi_ref': 'OpaqueRef:f25db949-ec0e-e414-8f87-7fa393c228fe', 'vdi_on_boot': 'persist', 'args': ['true'], 'vdi_location': 'bb086701-2e94-4dad-91b0-3e9e0bd56a5d', 'host_ref': 'OpaqueRef:4eac2447-bf4e-c909-560b-ff67a467dd29', 'session_ref': 'OpaqueRef:4b84018a-2673-2b39-d437-10862c591127', 'device_config': {'SRmaster': 'true'}, 'command': 'vdi_attach', 'vdi_allow_caching': 'false', 'sr_ref': 'OpaqueRef:6d76056b-1874-c808-a9b9-ee9abaa31513', 'vdi_uuid': 'bb086701-2e94-4dad-91b0-3e9e0bd56a5d'}
Aug 1 17:24:47 pns-xen06 SM: [12896] lock: opening lock file /var/lock/sm/bb086701-2e94-4dad-91b0-3e9e0bd56a5d/vdi
Aug 1 17:24:47 pns-xen06 SM: [12896] result: {'o_direct_reason': 'LICENSE_RESTRICTION', 'params': '/dev/sm/backend/ff12160f-ff09-40bb-a874-1366ad907f44/bb086701-2e94-4dad-91b0-3e9e0bd56a5d', 'o_direct': True, 'xenstore_data': {'scsi/0x12/0x80': 'AIAAEmJiMDg2NzAxLTJlOTQtNGQgIA==', 'scsi/0x12/0x83': 'AIMAMQIBAC1YRU5TUkMgIGJiMDg2NzAxLTJlOTQtNGRhZC05MWIwLTNlOWUwYmQ1NmE1ZCA=', 'vdi-uuid': 'bb086701-2e94-4dad-91b0-3e9e0bd56a5d', 'mem-pool': 'ff12160f-ff09-40bb-a874-1366ad907f44'}}
Aug 1 17:24:47 pns-xen06 SM: [12896] lock: closed /var/lock/sm/bb086701-2e94-4dad-91b0-3e9e0bd56a5d/vdi
Aug 1 17:24:47 pns-xen06 SM: [12965] ['rbd', 'ls', '-l', '--format', 'json', '--pool', 'RBD_XenStorage-ff12160f-ff09-40bb-a874-1366ad907f44']
Aug 1 17:24:47 pns-xen06 SM: [12965] preit SUCCESS
Aug 1 17:24:47 pns-xen06 SM: [12965] ['ceph', 'df', '-f', 'json']
Aug 1 17:24:47 pns-xen06 SM: [12965] preit SUCCESS
Aug 1 17:24:47 pns-xen06 SM: [12965] vdi_activate {'sr_uuid': 'ff12160f-ff09-40bb-a874-1366ad907f44', 'subtask_of': 'DummyRef:|9afed3e7-c9b7-af20-f615-43912b792d1e|VDI.activate', 'vdi_ref': 'OpaqueRef:f25db949-ec0e-e414-8f87-7fa393c228fe', 'vdi_on_boot': 'persist', 'args': ['true'], 'vdi_location': 'bb086701-2e94-4dad-91b0-3e9e0bd56a5d', 'host_ref': 'OpaqueRef:4eac2447-bf4e-c909-560b-ff67a467dd29', 'session_ref': 'OpaqueRef:7fa61944-bf78-90c0-338d-c70f193984a9', 'device_config': {'SRmaster': 'true'}, 'command': 'vdi_activate', 'vdi_allow_caching': 'false', 'sr_ref': 'OpaqueRef:6d76056b-1874-c808-a9b9-ee9abaa31513', 'vdi_uuid': 'bb086701-2e94-4dad-91b0-3e9e0bd56a5d'}
Aug 1 17:24:47 pns-xen06 SM: [12965] lock: opening lock file /var/lock/sm/bb086701-2e94-4dad-91b0-3e9e0bd56a5d/vdi
Aug 1 17:24:47 pns-xen06 SM: [12965] blktap2.activate
Aug 1 17:24:47 pns-xen06 SM: [12965] lock: acquired /var/lock/sm/bb086701-2e94-4dad-91b0-3e9e0bd56a5d/vdi
Aug 1 17:24:47 pns-xen06 SM: [12965] Adding tag to: bb086701-2e94-4dad-91b0-3e9e0bd56a5d
Aug 1 17:24:47 pns-xen06 SM: [12965] Activate lock succeeded
Aug 1 17:24:47 pns-xen06 SM: [12965] ['rbd', 'ls', '-l', '--format', 'json', '--pool', 'RBD_XenStorage-ff12160f-ff09-40bb-a874-1366ad907f44']
Aug 1 17:24:48 pns-xen06 SM: [12965] preit SUCCESS
Aug 1 17:24:48 pns-xen06 SM: [12965] ['ceph', 'df', '-f', 'json']
Aug 1 17:24:48 pns-xen06 SM: [12965] preit SUCCESS
Aug 1 17:24:48 pns-xen06 SM: [12965] RBDVDI.attach for bb086701-2e94-4dad-91b0-3e9e0bd56a5d
Aug 1 17:24:48 pns-xen06 SM: [12965] Exception in activate/attach
Aug 1 17:24:48 pns-xen06 SM: [12965] Removed host key host_OpaqueRef:4eac2447-bf4e-c909-560b-ff67a467dd29 for bb086701-2e94-4dad-91b0-3e9e0bd56a5d
Aug 1 17:24:48 pns-xen06 SM: [12965] ***** BLKTAP2:<function _activate_locked at 0x1967578>: EXCEPTION <class 'XenAPI.Failure'>, ['MAP_DUPLICATE_KEY', 'VDI', 'sm_config', 'OpaqueRef:f25db949-ec0e-e414-8f87-7fa393c228fe', 'attached']
Aug 1 17:24:48 pns-xen06 SM: [12965] File "/opt/xensource/sm/blktap2.py", line 86, in wrapper
Aug 1 17:24:48 pns-xen06 SM: [12965] ret = op(self, *args)
Aug 1 17:24:48 pns-xen06 SM: [12965] File "/opt/xensource/sm/blktap2.py", line 1593, in _activate_locked
Aug 1 17:24:48 pns-xen06 SM: [12965] self._attach(sr_uuid, vdi_uuid)
Aug 1 17:24:48 pns-xen06 SM: [12965] File "/opt/xensource/sm/blktap2.py", line 1658, in _attach
Aug 1 17:24:48 pns-xen06 SM: [12965] attach_info = xmlrpclib.loads(self.target.attach(sr_uuid, vdi_uuid))[0][0]
Aug 1 17:24:48 pns-xen06 SM: [12965] File "/opt/xensource/sm/blktap2.py", line 1115, in attach
Aug 1 17:24:48 pns-xen06 SM: [12965] return self.vdi.attach(sr_uuid, vdi_uuid)
Aug 1 17:24:48 pns-xen06 SM: [12965] File "/opt/xensource/sm/RBDSR.py", line 329, in attach
Aug 1 17:24:48 pns-xen06 SM: [12965] self.session.xenapi.VDI.add_to_sm_config(vdi_ref, 'attached', 'true')
Aug 1 17:24:48 pns-xen06 SM: [12965] File "/usr/lib/python2.7/site-packages/XenAPI.py", line 248, in __call__
Aug 1 17:24:48 pns-xen06 SM: [12965] return self.__send(self.__name, args)
Aug 1 17:24:48 pns-xen06 SM: [12965] File "/usr/lib/python2.7/site-packages/XenAPI.py", line 150, in xenapi_request
Aug 1 17:24:48 pns-xen06 SM: [12965] result = _parse_result(getattr(self, methodname)(*full_params))
Aug 1 17:24:48 pns-xen06 SM: [12965] File "/usr/lib/python2.7/site-packages/XenAPI.py", line 222, in _parse_result
Aug 1 17:24:48 pns-xen06 SM: [12965] raise Failure(result['ErrorDescription'])
Aug 1 17:24:48 pns-xen06 SM: [12965]
Aug 1 17:24:48 pns-xen06 SM: [12965] Raising exception [46, The VDI is not available [opterr=['MAP_DUPLICATE_KEY', 'VDI', 'sm_config', 'OpaqueRef:f25db949-ec0e-e414-8f87-7fa393c228fe', 'attached']]]
Aug 1 17:24:48 pns-xen06 SM: [12965] lock: released /var/lock/sm/bb086701-2e94-4dad-91b0-3e9e0bd56a5d/vdi
Aug 1 17:24:48 pns-xen06 SM: [12965] ***** generic exception: vdi_activate: EXCEPTION <class 'SR.SROSError'>, The VDI is not available [opterr=['MAP_DUPLICATE_KEY', 'VDI', 'sm_config', 'OpaqueRef:f25db949-ec0e-e414-8f87-7fa393c228fe', 'attached']]
Aug 1 17:24:48 pns-xen06 SM: [12965] File "/opt/xensource/sm/SRCommand.py", line 110, in run
Aug 1 17:24:48 pns-xen06 SM: [12965] return self._run_locked(sr)
Aug 1 17:24:48 pns-xen06 SM: [12965] File "/opt/xensource/sm/SRCommand.py", line 159, in _run_locked
Aug 1 17:24:48 pns-xen06 SM: [12965] rv = self._run(sr, target)
Aug 1 17:24:48 pns-xen06 SM: [12965] File "/opt/xensource/sm/SRCommand.py", line 264, in _run
Aug 1 17:24:48 pns-xen06 SM: [12965] writable, caching_params)
Aug 1 17:24:48 pns-xen06 SM: [12965] File "/opt/xensource/sm/blktap2.py", line 1560, in activate
Aug 1 17:24:48 pns-xen06 SM: [12965] if self._activate_locked(sr_uuid, vdi_uuid, options):
Aug 1 17:24:48 pns-xen06 SM: [12965] File "/opt/xensource/sm/blktap2.py", line 94, in wrapper
Aug 1 17:24:48 pns-xen06 SM: [12965] raise xs_errors.XenError(excType, opterr=msg)
Aug 1 17:24:48 pns-xen06 SM: [12965] File "/opt/xensource/sm/xs_errors.py", line 52, in __init__
Aug 1 17:24:48 pns-xen06 SM: [12965] raise SR.SROSError(errorcode, errormessage)
Aug 1 17:24:48 pns-xen06 SM: [12965]
Aug 1 17:24:48 pns-xen06 SM: [12965] ***** RBD: EXCEPTION <class 'SR.SROSError'>, The VDI is not available [opterr=['MAP_DUPLICATE_KEY', 'VDI', 'sm_config', 'OpaqueRef:f25db949-ec0e-e414-8f87-7fa393c228fe', 'attached']]
Aug 1 17:24:48 pns-xen06 SM: [12965] File "/opt/xensource/sm/SRCommand.py", line 352, in run
Aug 1 17:24:48 pns-xen06 SM: [12965] ret = cmd.run(sr)
Aug 1 17:24:48 pns-xen06 SM: [12965] File "/opt/xensource/sm/SRCommand.py", line 110, in run
Aug 1 17:24:48 pns-xen06 SM: [12965] return self._run_locked(sr)
Aug 1 17:24:48 pns-xen06 SM: [12965] File "/opt/xensource/sm/SRCommand.py", line 159, in _run_locked
Aug 1 17:24:48 pns-xen06 SM: [12965] rv = self._run(sr, target)
Aug 1 17:24:48 pns-xen06 SM: [12965] File "/opt/xensource/sm/SRCommand.py", line 264, in _run
Aug 1 17:24:48 pns-xen06 SM: [12965] writable, caching_params)
Aug 1 17:24:48 pns-xen06 SM: [12965] File "/opt/xensource/sm/blktap2.py", line 1560, in activate
Aug 1 17:24:48 pns-xen06 SM: [12965] if self._activate_locked(sr_uuid, vdi_uuid, options):
Aug 1 17:24:48 pns-xen06 SM: [12965] File "/opt/xensource/sm/blktap2.py", line 94, in wrapper
Aug 1 17:24:48 pns-xen06 SM: [12965] raise xs_errors.XenError(excType, opterr=msg)
Aug 1 17:24:48 pns-xen06 SM: [12965] File "/opt/xensource/sm/xs_errors.py", line 52, in __init__
Aug 1 17:24:48 pns-xen06 SM: [12965] raise SR.SROSError(errorcode, errormessage)
Aug 1 17:24:48 pns-xen06 SM: [12965]
Aug 1 17:24:48 pns-xen06 SM: [12965] lock: closed /var/lock/sm/bb086701-2e94-4dad-91b0-3e9e0bd56a5d/vdi
Aug 1 17:24:48 pns-xen06 SM: [13120] ['rbd', 'ls', '-l', '--format', 'json', '--pool', 'RBD_XenStorage-ff12160f-ff09-40bb-a874-1366ad907f44']
Aug 1 17:24:48 pns-xen06 SM: [13120] preit SUCCESS
Aug 1 17:24:48 pns-xen06 SM: [13120] ['ceph', 'df', '-f', 'json']
Aug 1 17:24:49 pns-xen06 SM: [13120] preit SUCCESS
Aug 1 17:24:49 pns-xen06 SM: [13120] vdi_detach {'sr_uuid': 'ff12160f-ff09-40bb-a874-1366ad907f44', 'subtask_of': 'DummyRef:|ed5e66fb-4281-3ea8-83bd-2c3088b0dc31|VDI.detach', 'vdi_ref': 'OpaqueRef:f25db949-ec0e-e414-8f87-7fa393c228fe', 'vdi_on_boot': 'persist', 'args': [], 'vdi_location': 'bb086701-2e94-4dad-91b0-3e9e0bd56a5d', 'host_ref': 'OpaqueRef:4eac2447-bf4e-c909-560b-ff67a467dd29', 'session_ref': 'OpaqueRef:e8351bd0-d039-92cc-7cee-7b3dfb639242', 'device_config': {'SRmaster': 'true'}, 'command': 'vdi_detach', 'vdi_allow_caching': 'false', 'sr_ref': 'OpaqueRef:6d76056b-1874-c808-a9b9-ee9abaa31513', 'vdi_uuid': 'bb086701-2e94-4dad-91b0-3e9e0bd56a5d'}
Aug 1 17:24:49 pns-xen06 SM: [13120] lock: opening lock file /var/lock/sm/bb086701-2e94-4dad-91b0-3e9e0bd56a5d/vdi
Aug 1 17:24:49 pns-xen06 SM: [13120] lock: closed /var/lock/sm/bb086701-2e94-4dad-91b0-3e9e0bd56a5d/vdi

Maybe this helps in understanding what's going on?
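The traceback pinpoints the failing call. RBDSR.py line 329 runs `self.session.xenapi.VDI.add_to_sm_config(vdi_ref, 'attached', 'true')`, and XAPI rejects it with MAP_DUPLICATE_KEY. Presumably an `attached` entry was left behind when the host was hard-rebooted. Below is a minimal sketch of making that step tolerant of a stale key, assuming a XenAPI session object. `set_sm_config_flag` is a hypothetical helper, not the fix that was actually applied in RBDSR. It only uses the standard XenAPI `VDI.get_sm_config`, `VDI.remove_from_sm_config` and `VDI.add_to_sm_config` calls.

```python
def set_sm_config_flag(session, vdi_ref, key="attached", value="true"):
    """Hypothetical helper: set vdi.sm_config[key] = value even if the key exists.

    VDI.add_to_sm_config fails with MAP_DUPLICATE_KEY when the key is already
    present, which is exactly what the traceback above shows for 'attached'.
    """
    vdi = session.xenapi.VDI
    if key in vdi.get_sm_config(vdi_ref):
        # Drop the leftover entry (e.g. from a host that crashed while the VDI
        # was attached) so that the add below cannot hit MAP_DUPLICATE_KEY.
        vdi.remove_from_sm_config(vdi_ref, key)
    vdi.add_to_sm_config(vdi_ref, key, value)
```

The check and the removal are two separate XAPI calls, not one atomic operation. This only helps with leftovers from a crashed host; it does not make concurrent attaches safe.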

rposudnevskiy commented 8 years ago

Hi Martin,

When you get the MAP_DUPLICATE_KEY message, you can resolve the issue as follows:

1. Look up the UUID of the VDI:
   # xe vdi-list name-label=<name of VDI>
   The VDI name is usually something like "<name of VM> 0". You need the uuid from the output.
2. Tell XenServer to forget the VDI:
   # xe vdi-forget uuid=<uuid from the output of the command above>
3. In XenCenter, click Rescan on your SR.
4. After the rescan, reattach the VDI to your VM in XenCenter.
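The lookup, forget and rescan steps can also be scripted with the XenAPI Python bindings. This is a sketch under assumptions. `forget_vdi_and_rescan` is a hypothetical name, and the final step (reattaching the VDI to the VM) is left to XenCenter as described in the steps. The sketch uses the XenAPI calls `VDI.get_by_name_label`, `VDI.get_uuid`, `VDI.get_SR`, `VDI.forget` and `SR.scan`. It refuses to act unless exactly one VDI carries the given name, because a snapshot VDI can carry the same name label as its parent.

```python
def forget_vdi_and_rescan(session, name_label):
    """Hypothetical helper mirroring `xe vdi-list` + `xe vdi-forget` + SR rescan.

    Returns the UUID of the forgotten VDI.
    """
    api = session.xenapi
    refs = api.VDI.get_by_name_label(name_label)
    if len(refs) != 1:
        # Don't guess: snapshots and other VDIs may share the same name label.
        raise ValueError("expected exactly one VDI named %r, found %d"
                         % (name_label, len(refs)))
    vdi_ref = refs[0]
    uuid = api.VDI.get_uuid(vdi_ref)
    sr_ref = api.VDI.get_SR(vdi_ref)  # remember the SR before forgetting the VDI
    api.VDI.forget(vdi_ref)           # same effect as `xe vdi-forget uuid=...`
    api.SR.scan(sr_ref)               # same as "Rescan" on the SR in XenCenter
    return uuid
```

For the VM in the logs above, the call would be `forget_vdi_and_rescan(session, "Ubi16 0")`. `Ubi16 0` is the VDI label recorded by `rbd image-meta set`. Here `session` would come from `XenAPI.Session("https://<pool master>")` followed by `session.xenapi.login_with_password(user, password)`.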

As for the VM crash, I will investigate the problem and let you know.

rposudnevskiy commented 8 years ago

Hi Martin, the issue with the VM crash has been fixed. It just required unmapping the nbd device before the snapshot and remapping it afterwards.

Please check.
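The fix is described above in a single sentence. What follows is therefore a hypothetical illustration of the ordering only, not RBDSR's actual code. It assumes the following:

- The image is mapped with the `rbd-nbd` tool (`rbd-nbd unmap <device>`, `rbd-nbd map <pool>/<image>`).
- The tapdisk has already been paused (`tap-ctl pause`, as in the first SMlog).
- The caller unpauses it afterwards.

`snapshot_with_nbd_remap` and its parameters are made-up names. The `rbd snap create` and `rbd snap protect` calls mirror the ones in the log.

```python
import subprocess


def snapshot_with_nbd_remap(pool, image, snap, nbd_device, run=subprocess.check_output):
    """Hypothetical helper: snapshot an RBD image while it is unmapped from nbd.

    Assumes the tapdisk using nbd_device is already paused and that the caller
    unpauses it afterwards. Returns the device path printed by `rbd-nbd map`.
    """
    snap_spec = "%s@%s" % (image, snap)
    # 1. Unmap the nbd device before touching the image (the order the fix describes).
    run(["rbd-nbd", "unmap", nbd_device])
    try:
        # 2. Same rbd calls as in the SMlog above.
        run(["rbd", "snap", "create", snap_spec, "--pool", pool])
        run(["rbd", "snap", "protect", snap_spec, "--pool", pool])
    finally:
        # 3. Always remap, even if the snapshot failed, so the VM gets its disk back.
        out = run(["rbd-nbd", "map", "%s/%s" % (pool, image)])
    if isinstance(out, bytes):
        out = out.decode()
    return out.strip()
```

The `finally` block remaps the image even if the snapshot fails, so a paused VM is not left without its disk. Note that `rbd-nbd map` may return a different `/dev/nbdX` than before. Anything that points at the old device, such as the `/dev/nbd/<pool>/<image>` path that RBDSR passes to `tap-ctl unpause` in the log, would need to be refreshed before unpausing.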

mhoffmann75 commented 8 years ago

Thanks for the fix! I have installed it, and taking a snapshot no longer crashes the VM. Good. However, after creating a snapshot of a running VM and then deleting it, the VM can no longer be rebooted: it gets stuck again, eventually hanging the XenServer host as well. Scanning the SR takes forever :-(

It also seems to me that a newly created VM with a new VDI refuses to reboot and ends up with the MAP_DUPLICATE_KEY error, and that adding a second VDI to an existing VM doesn't work either. Both of these issues need more investigation, though, since they might be caused by a host that was already stuck. When I find more time, I will try to isolate them further ...

rposudnevskiy commented 8 years ago

Hi Martin, I have made a new fix. Please check.

mhoffmann75 commented 8 years ago

It seems to work now! Great work. One minor issue remains: when I add a second disk to a running VM, XenCenter tells me that I need to reboot. Rebooting, however, does not make the disk show up; only shutting the VM down and starting it again does. It seems to me that this issue only appears in RBDSR nbd mode, although I'm not 100% sure.

rposudnevskiy commented 8 years ago

Which OS is on the VM? I have added a second disk to a running VM with Win7. It just needed formatting so that the system could see it. I will check it with Linux.

rposudnevskiy commented 8 years ago

I tested it on a VM with CentOS 7. It works there too.

rposudnevskiy commented 8 years ago

Is xen-tools installed on the VM?

mhoffmann75 commented 8 years ago

You're right: no Xen Tools were installed, so my fault. A VM on local storage behaves just the same, so it's not an RBDSR problem :-) Sorry to bother you. Just for completeness: it was a default install of Ubuntu 16.04 Server without guest tools.

rposudnevskiy commented 8 years ago

Ok. Thanks for your tests. Can I close the issue? :-)

mhoffmann75 commented 8 years ago

Yes please :-)