rposudnevskiy / RBDSR

RBDSR - XenServer/XCP-ng Storage Manager plugin for CEPH
GNU Lesser General Public License v2.1

__srlock__ not populating with proper ID and exception "401: You must log in" #72

Open mdmeier opened 6 years ago

mdmeier commented 6 years ago

Hi Roman,

Long time no chat. I recently started upgrading CEPH from Kraken to Luminous and have come across a strange problem. When migrating a VDI from another SR to CEPH, I get the following every second in /var/log/SMlog:

Apr 20 01:51:11 cloud103-15 SM: [10286] rbdsr_lock.Lock._trylock
Apr 20 01:51:11 cloud103-15 SM: [10286] rbdsr_lock: Trying to lock 'srlock'
Apr 20 01:51:11 cloud103-15 SM: [10286] rbdsr_lock.Lock.held
Apr 20 01:51:11 cloud103-15 SM: [10286] rbdsr_lock.Lock._get_srlocker
Apr 20 01:51:11 cloud103-15 SM: [10286] ['rbd', '--format', 'json', '--name', 'client.admin', '--pool', 'RBD_XenStorage-2dd455e9-0de4-4ed8-af62-64e1a4ace678', 'lock', 'list', 'srlock']
Apr 20 01:51:11 cloud103-15 SM: [10286] pread SUCCESS
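To confirm how often the lock attempt repeats, the SMlog excerpt can be split into individual entries and the "Trying to lock" messages counted. The following is a minimal sketch that assumes the `Mon DD HH:MM:SS host SM: [pid]` prefix seen above; the function names `split_smlog` and `lock_attempts` are made up for illustration and are not part of RBDSR or the SM tooling.

```python
import re

# Matches the syslog-style prefix seen in the excerpt above, e.g.
# "Apr 20 01:51:11 cloud103-15 SM: [10286] "
ENTRY_PREFIX = re.compile(
    r"(?P<ts>[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) SM: \[(?P<pid>\d+)\] ?"
)


def split_smlog(text):
    """Split SMlog text into entries, whether or not it is run together."""
    matches = list(ENTRY_PREFIX.finditer(text))
    entries = []
    for i, m in enumerate(matches):
        # An entry's message runs until the next prefix (or end of text).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        entries.append({
            "ts": m.group("ts"),
            "host": m.group("host"),
            "pid": int(m.group("pid")),
            "msg": text[m.end():end].strip(),
        })
    return entries


def lock_attempts(entries):
    """Count 'Trying to lock' messages per lock name."""
    counts = {}
    for entry in entries:
        m = re.search(r"rbdsr_lock: Trying to lock '([^']*)'", entry["msg"])
        if m:
            counts[m.group(1)] = counts.get(m.group(1), 0) + 1
    return counts


sample = (
    "Apr 20 01:51:11 cloud103-15 SM: [10286] rbdsr_lock.Lock._trylock\n"
    "Apr 20 01:51:11 cloud103-15 SM: [10286] rbdsr_lock: Trying to lock 'srlock'\n"
    "Apr 20 01:51:11 cloud103-15 SM: [10286] pread SUCCESS\n"
)
entries = split_smlog(sample)
print(len(entries), lock_attempts(entries))
```

Running the same two functions over the whole of /var/log/SMlog would show whether a single PID keeps retrying the same lock name.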

Which is odd, because from what I can tell, srlock should be replaced with an actual VDI ID? When I check locks I see:

rbd --name client.admin --pool RBD_XenStorage-2dd455e9-0de4-4ed8-af62-64e1a4ace678 lock list srlock

There is 1 exclusive lock on this image.
Locker           ID     Address
client.467981955 locked 192.168.1xx.1xx:0/2059369044
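For checking many images at once, the plain-text `rbd lock list` output above can be parsed and filtered by address. This is a rough sketch that assumes the three-column `Locker ID Address` layout shown in the output; `parse_lock_list`, `held_elsewhere`, and the local address `192.168.0.15` are hypothetical names and values chosen for the example.

```python
def parse_lock_list(text):
    """Parse plain-text `rbd lock list` output into a list of dicts."""
    lockers = []
    header_seen = False
    for line in text.splitlines():
        fields = line.split()
        if not header_seen:
            # Skip everything up to and including the column header row.
            header_seen = fields[:3] == ["Locker", "ID", "Address"]
            continue
        if len(fields) >= 3:
            lockers.append({
                "locker": fields[0],
                # Lock IDs may contain spaces, so join the middle fields.
                "id": " ".join(fields[1:-1]),
                "address": fields[-1],
            })
    return lockers


def held_elsewhere(lockers, local_ips):
    """Return the lockers whose address is not one of this host's IPs."""
    return [
        entry for entry in lockers
        if entry["address"].rsplit(":", 1)[0] not in local_ips
    ]


sample = """There is 1 exclusive lock on this image.
Locker           ID     Address
client.467981955 locked 192.168.1xx.1xx:0/2059369044
"""
lockers = parse_lock_list(sample)
# 192.168.0.15 is a made-up address standing in for this host.
print(held_elsewhere(lockers, {"192.168.0.15"}))
```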

The locker is always another server. If I'm persistent enough I can "rbd lock remove" those locks so that this server gets the lock, but then I get:

Apr 20 01:53:54 cloud103-15 SM: [10286] rbdsr_lock: acquired 'client.467921638'
Apr 20 01:53:54 cloud103-15 SM: [10286] Exception in activate/attach
Apr 20 01:53:54 cloud103-15 SM: [10286] failed to remove tag: <Fault 401: 'You must log in'>
Apr 20 01:53:54 cloud103-15 SM: [10286] * BLKTAP2:<function _activate_locked at 0x14776e0>: > EXCEPTION <class 'xmlrpclib.Fault'>, <Fault 401: 'You must log in'>
Apr 20 01:53:54 cloud103-15 SM: [10286] File "/opt/xensource/sm/blktap2.py", line 87, in wrapper
Apr 20 01:53:54 cloud103-15 SM: [10286] ret = op(self, *args)
Apr 20 01:53:54 cloud103-15 SM: [10286] File "/opt/xensource/sm/blktap2.py", line 1602, in _activate_locked
Apr 20 01:53:54 cloud103-15 SM: [10286] self._remove_tag(vdi_uuid)
Apr 20 01:53:54 cloud103-15 SM: [10286] File "/opt/xensource/sm/blktap2.py", line 1452, in _remove_tag
Apr 20 01:53:54 cloud103-15 SM: [10286] vdi_ref = self._session.xenapi.VDI.get_by_uuid(vdi_uuid)
Apr 20 01:53:54 cloud103-15 SM: [10286] File "/usr/lib/python2.7/site-packages/XenAPI.py", line 254, in call
Apr 20 01:53:54 cloud103-15 SM: [10286] return self.send(self.name, args)
Apr 20 01:53:54 cloud103-15 SM: [10286] File "/usr/lib/python2.7/site-packages/XenAPI.py", line 157, in xenapi_request
Apr 20 01:53:54 cloud103-15 SM: [10286] raise xmlrpclib.Fault(401, 'You must log in')
Apr 20 01:53:54 cloud103-15 SM: [10286]
Apr 20 01:53:54 cloud103-15 SM: [10286] lock: released /var/lock/sm/302db214-90cf-4600-84ac-6bc9b053c61c/vdi
Apr 20 01:53:54 cloud103-15 SM: [10286] generic exception: vdi_activate: EXCEPTION <class 'xmlrpclib.Fault'>, <Fault 401: 'You must log in'>
Apr 20 01:53:54 cloud103-15 SM: [10286] File "/opt/xensource/sm/SRCommand.py", line 110, in run
Apr 20 01:53:54 cloud103-15 SM: [10286] return self._run_locked(sr)
Apr 20 01:53:54 cloud103-15 SM: [10286] File "/opt/xensource/sm/SRCommand.py", line 159, in _run_locked
Apr 20 01:53:54 cloud103-15 SM: [10286] rv = self._run(sr, target)
Apr 20 01:53:54 cloud103-15 SM: [10286] File "/opt/xensource/sm/SRCommand.py", line 264, in _run
Apr 20 01:53:54 cloud103-15 SM: [10286] writable, caching_params)
Apr 20 01:53:54 cloud103-15 SM: [10286] File "/opt/xensource/sm/blktap2.py", line 1541, in activate
Apr 20 01:53:54 cloud103-15 SM: [10286] if self._activate_locked(sr_uuid, vdi_uuid, options):
Apr 20 01:53:54 cloud103-15 SM: [10286] File "/opt/xensource/sm/blktap2.py", line 87, in wrapper
Apr 20 01:53:54 cloud103-15 SM: [10286] ret = op(self, *args)
Apr 20 01:53:54 cloud103-15 SM: [10286] File "/opt/xensource/sm/blktap2.py", line 1602, in _activate_locked
Apr 20 01:53:54 cloud103-15 SM: [10286] self._remove_tag(vdi_uuid)
Apr 20 01:53:54 cloud103-15 SM: [10286] File "/opt/xensource/sm/blktap2.py", line 1452, in _remove_tag
Apr 20 01:53:54 cloud103-15 SM: [10286] vdi_ref = self._session.xenapi.VDI.get_by_uuid(vdi_uuid)
Apr 20 01:53:54 cloud103-15 SM: [10286] File "/usr/lib/python2.7/site-packages/XenAPI.py", line 254, in call
Apr 20 01:53:54 cloud103-15 SM: [10286] return self.send(self.name, args)
Apr 20 01:53:54 cloud103-15 SM: [10286] File "/usr/lib/python2.7/site-packages/XenAPI.py", line 157, in xenapi_request
Apr 20 01:53:54 cloud103-15 SM: [10286] raise xmlrpclib.Fault(401, 'You must log in')
Apr 20 01:53:54 cloud103-15 SM: [10286]
Apr 20 01:53:54 cloud103-15 SM: [10286] **** RBD: EXCEPTION <class 'xmlrpclib.Fault'>, <Fault 401: 'You must log in'>
Apr 20 01:53:54 cloud103-15 SM: [10286] File "/opt/xensource/sm/SRCommand.py", line 353, in run
Apr 20 01:53:54 cloud103-15 SM: [10286] ret = cmd.run(sr)
Apr 20 01:53:54 cloud103-15 SM: [10286] File "/opt/xensource/sm/SRCommand.py", line 110, in run
Apr 20 01:53:54 cloud103-15 SM: [10286] return self._run_locked(sr)
Apr 20 01:53:54 cloud103-15 SM: [10286] File "/opt/xensource/sm/SRCommand.py", line 159, in _run_locked
Apr 20 01:53:54 cloud103-15 SM: [10286] rv = self._run(sr, target)
Apr 20 01:53:54 cloud103-15 SM: [10286] File "/opt/xensource/sm/SRCommand.py", line 264, in _run
Apr 20 01:53:54 cloud103-15 SM: [10286] writable, caching_params)
Apr 20 01:53:54 cloud103-15 SM: [10286] File "/opt/xensource/sm/blktap2.py", line 1541, in activate
Apr 20 01:53:54 cloud103-15 SM: [10286] if self._activate_locked(sr_uuid, vdi_uuid, options):
Apr 20 01:53:54 cloud103-15 SM: [10286] File "/opt/xensource/sm/blktap2.py", line 87, in wrapper
Apr 20 01:53:54 cloud103-15 SM: [10286] ret = op(self, *args)
Apr 20 01:53:54 cloud103-15 SM: [10286] File "/opt/xensource/sm/blktap2.py", line 1602, in _activate_locked
Apr 20 01:53:54 cloud103-15 SM: [10286] self._remove_tag(vdi_uuid)
Apr 20 01:53:54 cloud103-15 SM: [10286] File "/opt/xensource/sm/blktap2.py", line 1452, in _remove_tag
Apr 20 01:53:54 cloud103-15 SM: [10286] vdi_ref = self._session.xenapi.VDI.get_by_uuid(vdi_uuid)
Apr 20 01:53:54 cloud103-15 SM: [10286] File "/usr/lib/python2.7/site-packages/XenAPI.py", line 254, in call
Apr 20 01:53:54 cloud103-15 SM: [10286] return self.send(self.name, args)
Apr 20 01:53:54 cloud103-15 SM: [10286] File "/usr/lib/python2.7/site-packages/XenAPI.py", line 157, in xenapi_request
Apr 20 01:53:54 cloud103-15 SM: [10286] raise xmlrpclib.Fault(401, 'You must log in')
Apr 20 01:53:54 cloud103-15 SM: [10286]
Apr 20 01:53:54 cloud103-15 SM: [10286] lock: closed /var/lock/sm/302db214-90cf-4600-84ac-6bc9b053c61c/vdi
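The traceback ends with XenAPI.py raising Fault 401 from xenapi_request while `_remove_tag` calls `VDI.get_by_uuid`. One plausible reading, which is an assumption and not something the log proves, is that xapi rejected the session handle the SM process was using and the client had no stored credentials to log in again with. The toy model below only illustrates that failure mode; `ModelSession`, `backend`, and the handle strings are hypothetical, and this is not the code in XenAPI.py.

```python
class Fault(Exception):
    """Stand-in for xmlrpclib.Fault so the sketch has no dependencies."""

    def __init__(self, code, message):
        super().__init__(code, message)
        self.faultCode = code
        self.faultString = message


SESSION_INVALID = object()  # marker for a "session invalid" reply


class ModelSession:
    """Toy model of a session wrapper that retries after re-login."""

    def __init__(self, backend, handle=None):
        self.backend = backend      # callable(handle, method) -> result
        self.handle = handle        # session reference, possibly inherited
        self.credentials = None     # set only by login()

    def login(self, user, password):
        self.credentials = (user, password)
        self.handle = "session-for-" + user

    def request(self, method):
        for _ in range(3):
            result = self.backend(self.handle, method)
            if result is not SESSION_INVALID:
                return result
            if self.credentials is None:
                # No stored login to retry with: surface the error.
                raise Fault(401, "You must log in")
            self.login(*self.credentials)
        raise Fault(500, "Tried 3 times to get a valid session, but failed")


def backend(handle, method):
    """Fake server that only accepts handles created by login()."""
    if handle is None or not handle.startswith("session-for-"):
        return SESSION_INVALID
    return "OpaqueRef:result-of-" + method


# A session built from an inherited handle that the server no longer accepts.
inherited = ModelSession(backend, handle="OpaqueRef:stale")
try:
    inherited.request("VDI.get_by_uuid")
except Fault as fault:
    print(fault.faultCode, fault.faultString)
```

In this model a session that logged in itself recovers from a stale handle, while one that only inherited a handle cannot, which would match a long-running operation outliving the session it was handed.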

Any chance you can help with this? I'm unable to create new VDIs on my CEPH SR right now because of this.

mdmeier commented 6 years ago

The fundamental problem appears to be with mapping rbd?

[root@cloud103-15 ~]# rbd nbd map --device /dev/ndb1 --nbds_max 64 RBD_XenStorage-2dd455e9-0de4-4ed8-af62-64e1a4ace678/VHD-50d28620-24b7-45f9-99f4-7f5ee0bc739e --name client.admin
rbd-nbd: ignoring kernel module parameter options: nbd module already loaded
rbd-nbd: failed to open device: /dev/ndb1
rbd: rbd-nbd failed with error: /usr/bin/rbd-nbd: exit status: 1
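Since rbd-nbd reports "failed to open device", one thing that can be ruled out quickly is whether the path passed to `--device` exists and is a block device (NBD devices are normally named /dev/nbdN). The sketch below is only a pre-flight check under that assumption; `check_block_device` and `build_map_command` are hypothetical helpers, and the argument list simply mirrors the command shown above.

```python
import os
import stat


def check_block_device(path):
    """Return 'ok', 'missing', or 'not a block device' for a device path."""
    try:
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        return "missing"
    return "ok" if stat.S_ISBLK(mode) else "not a block device"


def build_map_command(device, pool, image, name="client.admin", nbds_max=64):
    """Build the argv for the `rbd nbd map` call shown above."""
    return [
        "rbd", "nbd", "map",
        "--device", device,
        "--nbds_max", str(nbds_max),
        "%s/%s" % (pool, image),
        "--name", name,
    ]


device = "/dev/ndb1"  # the path used in the command above
print(device, "->", check_block_device(device))
print(" ".join(build_map_command(
    device,
    "RBD_XenStorage-2dd455e9-0de4-4ed8-af62-64e1a4ace678",
    "VHD-50d28620-24b7-45f9-99f4-7f5ee0bc739e",
)))
```

If the check reports "missing", the mapping cannot succeed regardless of the Ceph side, so it is worth running before looking further at the cluster.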