Emmenemoi opened 6 years ago
After manually removing the lock, all seems fine. Maybe it was a consequence of a crash.
Unfortunately, not everything is solved:
Jul 2 13:00:38 13 SM: [18620] PhyLink(/dev/sm/phy/a8726545-cc41-4ff3-b603-fb0e1b83c4ba/1d6329d6-6498-489f-9964-e699789b724d) -> /run/sr-mount/a8726545-cc41-4ff3-b603-fb0e1b83c4ba/1d6329d6-6498-489f-9964-e699789b724d
Jul 2 13:00:38 13 SM: [18620] ['/usr/sbin/tap-ctl', 'allocate']
Jul 2 13:00:38 13 SM: [18620] = 0
Jul 2 13:00:38 13 SM: [18620] ['/usr/sbin/tap-ctl', 'spawn']
Jul 2 13:00:38 13 SM: [18620] = 0
Jul 2 13:00:38 13 SM: [18620] ['/usr/sbin/tap-ctl', 'attach', '-p', '19241', '-m', '0']
Jul 2 13:00:38 13 SM: [18620] = 0
Jul 2 13:00:38 13 SM: [18620] ['/usr/sbin/tap-ctl', 'open', '-p', '19241', '-m', '0', '-a', 'vhd:/run/sr-mount/a8726545-cc41-4ff3-b603-fb0e1b83c4ba/1d6329d6-6498-489f-9964-e699789b724d', '-R', '-t', '40']
Jul 2 13:00:38 13 SM: [18620] = 22
Jul 2 13:00:38 13 SM: [18620] ['/usr/sbin/tap-ctl', 'detach', '-p', '19241', '-m', '0']
Jul 2 13:00:38 13 SM: [18620] = 0
Jul 2 13:00:38 13 SM: [18620] ***** ['/usr/sbin/tap-ctl', 'open', '-p', '19241', '-m', '0', '-a', 'vhd:/run/sr-mount/a8726545-cc41-4ff3-b603-fb0e1b83c4ba/1d6329d6-6498-489f-9964-e699789b724d', '-R', '-t', '40'] failed: status=22, pid=19250, errmsg=Invalid argument: EXCEPTION <class 'blktap2.CommandFailure'>, ['/usr/sbin/tap-ctl', 'open', '-p', '19241', '-m', '0', '-a', 'vhd:/run/sr-mount/a8726545-cc41-4ff3-b603-fb0e1b83c4ba/1d6329d6-6498-489f-9964-e699789b724d', '-R', '-t', '40'] failed: status=22, pid=19250, errmsg=Invalid argument
Jul 2 13:00:38 13 SM: [18620] File "/opt/xensource/sm/blktap2.py", line 795, in launch_on_tap
Jul 2 13:00:38 13 SM: [18620] TapCtl.open(pid, minor, _type, path, options)
Jul 2 13:00:38 13 SM: [18620] File "/opt/xensource/sm/blktap2.py", line 402, in open
Jul 2 13:00:38 13 SM: [18620] cls._pread(args)
Jul 2 13:00:38 13 SM: [18620] File "/opt/xensource/sm/blktap2.py", line 289, in _pread
Jul 2 13:00:38 13 SM: [18620] tapctl._wait(quiet)
Jul 2 13:00:38 13 SM: [18620] File "/opt/xensource/sm/blktap2.py", line 278, in _wait
Jul 2 13:00:38 13 SM: [18620] raise self.CommandFailure(self.cmd, **info)
Jul 2 13:00:38 13 SM: [18620]
Jul 2 13:00:38 13 SM: [18620] ['/usr/sbin/tap-ctl', 'free', '-m', '0']
(platform: XCP-ng 7.4.0)
The lock is located at srlock. In the metadata there is also :managed = 1 if the volume should be available.
Here is an example of a volume:
{':is_a_snapshot': '0',
':managed': '1',
':name_description': 'test-001',
':name_label': 'test-001',
':read_only': '0',
':shareable': '0',
':sm_config': '{"vdi_type": "aio"}',
':type': 'user',
':uuid': '01c3acc6-f034-4f75-b733-eb5d59646b18',
':vdi_type': 'aio'}
Sorry, GitHub removed the formatting; the lock name is __srlock__.
The lock problem is solved.
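For reference, a minimal sketch of how such a stale lock can be removed by hand, assuming the srlock image and a pool named RBD_XenStorage-<SR-UUID> (placeholders; the lock id and locker come from the list output):

# Inspect the lock held on the srlock image (pool name is a placeholder).
rbd --pool RBD_XenStorage-<SR-UUID> lock list srlock
# Remove the stale lock using the id and locker printed by the list command.
rbd --pool RBD_XenStorage-<SR-UUID> lock remove srlock <lock-id> <locker>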
Example metadata for the VHD plugin mode of RBDSR:
{":is_a_snapshot": "0",
":managed": "1",
":name_description": "monitoring",
":name_label": "vm-monitoring",
":read_only": "0",
":shareable": "0",
":sm_config": "{\"vdi_type\": \"vhd\"}",
":type": "user",
":uuid": "59a9fefd-991b-4ddc-a1f1-cd36edf3587d",
":vdi_type": "vhd",
"VDI_DESCRIPTION": "monitoring", => v1.0 meta
"VDI_LABEL": "vm-monitoring"} => v1.0 meta
Still the same tap-ctl open error.
I think it's related to ['/usr/sbin/tap-ctl', 'open', '-p', '19241', '-m', '0', '-a', 'vhd:/run/sr-mount/a8726545-cc41-4ff3-b603-fb0e1b83c4ba/1d6329d6-6498-489f-9964-e699789b724d', '-R', '-t', '40'].
It throws exit code 22 ("Invalid argument"); try to debug this.
Sure, I already noticed that. Could someone send a working tap-ctl open command in rbd mode and in vhd mode? Maybe '-a', 'vhd:/run/sr-mount/a8726545-cc41-4ff3-b603-fb0e1b83c4ba/1d6329d6-6498-489f-9964-e699789b724d' is bad.
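For reference, the SMLog sequence above can be replayed by hand to isolate the bad argument; this is only a sketch, with pid and minor as placeholders taken from the allocate/spawn output:

# Replay of the sequence from the SMLog (all bracketed values are placeholders).
/usr/sbin/tap-ctl allocate                  # prints the allocated device node; its number is the minor
/usr/sbin/tap-ctl spawn                     # prints the tapdisk pid
/usr/sbin/tap-ctl attach -p <pid> -m <minor>
# The failing call; retrying with 'aio:' instead of 'vhd:' would test the
# type-mismatch theory discussed below.
/usr/sbin/tap-ctl open -p <pid> -m <minor> -a vhd:/run/sr-mount/<sr-uuid>/<vdi-uuid> -R -t 40
/usr/sbin/tap-ctl detach -p <pid> -m <minor>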
Could it be the metadata upgrade? Mine: ":sm_config": "{\"vdi_type\": \"vhd\"}", ":vdi_type": "vhd",
which in your case are: ':sm_config': '{"vdi_type": "aio"}', ':vdi_type': 'aio'.
Then this would fail to be detected in the tap-ctl wrapper script (l. 19) for the sed "s/vhd/aio/g" rewrite:
aiorbdsrs=`xe pbd-list device-config='image-type: aio' | grep sr-uuid | awk '{print $4} END {if (!NR) print "~~~"}'`
(The awk END clause prints the "~~~" sentinel when no SR matches, so the case pattern never degenerates into a match-everything glob.)
Is your metadata example for the rbd or the vhd type? Or could there be other metadata differences? From what I'm reading in the source, rbd plugin mode uses the aio VDI_TYPE and vhd plugin mode uses the vhd VDI_TYPE.
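A quick way to check which vdi_type a given VDI actually carries, assuming the stock xe CLI (the UUID is a placeholder):

# Print the sm-config map (contains vdi_type) for one VDI.
xe vdi-param-get uuid=<vdi-uuid> param-name=sm-config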
I can start the VM if I change the tap-ctl binary to the following wrapper:
#!/bin/bash
# Wrapper around the original tap-ctl: VDIs on RBD SRs are stored as aio
# but presented to XAPI as vhd, so rewrite the type in both directions.
rbdsrs=`xe sr-list type=rbd | grep uuid | awk '{print $5}'`
aiorbdsrs=`xe pbd-list device-config='image-type: aio' | grep sr-uuid | awk '{print $4} END {if (!NR) print "~~~"}'`

if [ "$1" == "list" ]; then
    # On "list", report aio taps on RBD SRs as vhd to the caller.
    /sbin/tap-ctl-orig "$@" | while read line; do
        case "$line" in
            *$aiorbdsrs*) echo "$line" | sed "s/aio/vhd/g";;
            *$rbdsrs*)    echo "$line" | sed "s/aio/vhd/g";;
            *)            echo "$line";;
        esac
    done
else
    # For any other verb, rewrite vhd arguments back to aio on RBD SRs.
    case "$*" in
        *$aiorbdsrs*) /sbin/tap-ctl-orig `echo "$*" | sed "s/vhd/aio/g"`;;
        *$rbdsrs*)    /sbin/tap-ctl-orig `echo "$*" | sed "s/vhd/aio/g"`;;
        *)            /sbin/tap-ctl-orig "$@";;
    esac
fi
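Presumably the wrapper is installed by moving the real binary aside so that it matches the /sbin/tap-ctl-orig path it calls; these exact steps are my assumption:

# Assumed installation: park the original binary, then substitute the wrapper.
mv /usr/sbin/tap-ctl /sbin/tap-ctl-orig
install -m 755 tap-ctl-wrapper.sh /usr/sbin/tap-ctl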
But I suspect it will fail later (clone / snapshots / other?).
Furthermore, it seems impossible to use kernel mode in v2.0...
@Emmenemoi why write a migration plugin when you can just remove the old one and install the new one? Furthermore, there is now a v3.0 of the plugin which uses QEMU instead of AIO, improving compatibility between Xen and Ceph.
The migration plugin is needed to migrate v1.0 metadata to v2.0 metadata (used during SR probe for importing existing VDIs). The v2.0 plugin is definitely not stable enough. I'll stop trying to solve its problems: I succeeded in installing the v1.0 plugin on a fresh XCP-ng 7.4 install. I'll document the procedure for the plugin update (I found a very easy way).
Yes, v3.0 exists, but it didn't exist when I started needing to migrate our infrastructure from a very old v1.0 plugin version (used on XS 7 in kernel mode, so "Ceph dumpling"!).
Agreed: v3.0 should be the next LTS target (I'll wait for XCP-ng 7.5 to be released and then write the necessary tools to migrate from v1.0 to v3.0).
I wrote RBDSR/utils/rbd_metadata_migration.py for migrating from the old plugin to v2.0. Unfortunately the imported VDIs are stuck at the SR lock: SMLog shows a loop of commands like ['rbd', '--format', 'json', '--name', 'client.admin', '--pool', 'RBD_XenStorage-', 'lock', 'list', 'srlock'].
As you can see, it only adds :name_label and :name_description from the former VDI_LABEL and VDI_DESCRIPTION metas.
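To verify what a migrated VDI actually carries, and assuming the plugin stores these keys as RBD image metadata (an assumption; pool and image names are placeholders):

# List all metadata keys stored on a VDI's image.
rbd --pool RBD_XenStorage-<SR-UUID> image-meta list <vdi-image>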
Is there any mandatory metadata required to be able to use RBD VDIs with the vhd driver?
If so, would you agree to using defaults as fallbacks during the SR's VDI scan/probe when they are not present? (If so, I'll test and push.)