rposudnevskiy / RBDSR

RBDSR - XenServer/XCP-ng Storage Manager plugin for CEPH
GNU Lesser General Public License v2.1

RBD migration from old plugin #76

Open Emmenemoi opened 6 years ago

Emmenemoi commented 6 years ago

I wrote RBDSR/utils/rbd_metadata_migration.py to migrate from the old plugin to v2.0. Unfortunately, the imported VDIs are stuck at the SR lock. SMLog shows this command in a loop: ['rbd', '--format', 'json', '--name', 'client.admin', '--pool', 'RBD_XenStorage-', 'lock', 'list', 'srlock']

As you can see, it only adds :name_label and :name_description from the former VDI_LABEL and VDI_DESCRIPTION metadata keys.

Is there any mandatory metadata needed to use rbd VDIs with the vhd driver?

If so, would you agree to use defaults as fallbacks during the SR's VDI scan/probe when a key is not present? (If so, I'll test and push.)
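A minimal sketch of the fallback idea, assuming the scan sees each image's metadata as a plain dict of strings. The helper name `with_fallbacks` and the `DEFAULTS` table are hypothetical and not part of RBDSR; the default values are copied from the example volume posted further down in this thread, and the key mapping is the one described above.

```python
# Hypothetical defaults, copied from the example volume posted later in this thread.
DEFAULTS = {
    ":is_a_snapshot": "0",
    ":managed": "1",
    ":read_only": "0",
    ":shareable": "0",
    ":type": "user",
}

# v1.0 key -> v2.0 key, as described in this issue.
V1_TO_V2 = {
    "VDI_LABEL": ":name_label",
    "VDI_DESCRIPTION": ":name_description",
}


def with_fallbacks(meta):
    """Return a copy of meta with missing v2.0 keys filled in."""
    result = dict(meta)
    # Carry over the v1.0 label/description when the v2.0 key is absent.
    for old_key, new_key in V1_TO_V2.items():
        if new_key not in result and old_key in result:
            result[new_key] = result[old_key]
    # Fill the remaining keys from the assumed defaults, never overwriting.
    for key, value in DEFAULTS.items():
        result.setdefault(key, value)
    return result


if __name__ == "__main__":
    print(with_fallbacks({"VDI_LABEL": "vm-monitoring", "VDI_DESCRIPTION": "monitoring"}))
```

Keys such as :uuid and :vdi_type are deliberately left out of the defaults, since nothing in the thread suggests a safe fallback value for them.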

Emmenemoi commented 6 years ago

After manually removing the lock, all seems fine. It was maybe the consequence of a "crash".

Unfortunately, not everything is solved:

Jul  2 13:00:38 13 SM: [18620] PhyLink(/dev/sm/phy/a8726545-cc41-4ff3-b603-fb0e1b83c4ba/1d6329d6-6498-489f-9964-e699789b724d) -> /run/sr-mount/a8726545-cc41-4ff3-b603-fb0e1b83c4ba/1d6329d6-6498-489f-9964-e699789b724d
Jul  2 13:00:38 13 SM: [18620] ['/usr/sbin/tap-ctl', 'allocate']
Jul  2 13:00:38 13 SM: [18620]  = 0
Jul  2 13:00:38 13 SM: [18620] ['/usr/sbin/tap-ctl', 'spawn']
Jul  2 13:00:38 13 SM: [18620]  = 0
Jul  2 13:00:38 13 SM: [18620] ['/usr/sbin/tap-ctl', 'attach', '-p', '19241', '-m', '0']
Jul  2 13:00:38 13 SM: [18620]  = 0
Jul  2 13:00:38 13 SM: [18620] ['/usr/sbin/tap-ctl', 'open', '-p', '19241', '-m', '0', '-a', 'vhd:/run/sr-mount/a8726545-cc41-4ff3-b603-fb0e1b83c4ba/1d6329d6-6498-489f-9964-e699789b724d', '-R', '-t', '40']
Jul  2 13:00:38 13 SM: [18620]  = 22
Jul  2 13:00:38 13 SM: [18620] ['/usr/sbin/tap-ctl', 'detach', '-p', '19241', '-m', '0']
Jul  2 13:00:38 13 SM: [18620]  = 0
Jul  2 13:00:38 13 SM: [18620] ***** ['/usr/sbin/tap-ctl', 'open', '-p', '19241', '-m', '0', '-a', 'vhd:/run/sr-mount/a8726545-cc41-4ff3-b603-fb0e1b83c4ba/1d6329d6-6498-489f-9964-e699789b724d', '-R', '-t', '40'] failed: status=22, pid=19250, errmsg=Invalid argument: EXCEPTION <class 'blktap2.CommandFailure'>, ['/usr/sbin/tap-ctl', 'open', '-p', '19241', '-m', '0', '-a', 'vhd:/run/sr-mount/a8726545-cc41-4ff3-b603-fb0e1b83c4ba/1d6329d6-6498-489f-9964-e699789b724d', '-R', '-t', '40'] failed: status=22, pid=19250, errmsg=Invalid argument
Jul  2 13:00:38 13 SM: [18620]   File "/opt/xensource/sm/blktap2.py", line 795, in launch_on_tap
Jul  2 13:00:38 13 SM: [18620]     TapCtl.open(pid, minor, _type, path, options)
Jul  2 13:00:38 13 SM: [18620]   File "/opt/xensource/sm/blktap2.py", line 402, in open
Jul  2 13:00:38 13 SM: [18620]     cls._pread(args)
Jul  2 13:00:38 13 SM: [18620]   File "/opt/xensource/sm/blktap2.py", line 289, in _pread
Jul  2 13:00:38 13 SM: [18620]     tapctl._wait(quiet)
Jul  2 13:00:38 13 SM: [18620]   File "/opt/xensource/sm/blktap2.py", line 278, in _wait
Jul  2 13:00:38 13 SM: [18620]     raise self.CommandFailure(self.cmd, **info)
Jul  2 13:00:38 13 SM: [18620]
Jul  2 13:00:38 13 SM: [18620] ['/usr/sbin/tap-ctl', 'free', '-m', '0']
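To narrow down a trace like this one, it can help to pair each logged command with the exit status that follows it. A minimal sketch, assuming the SMLog line shape shown above (a command list line followed by a ` = <status>` line from the same `[pid]`); the function name `failed_commands` is made up for illustration.

```python
import ast
import re

# "... SM: [18620] <payload>" -> pid and payload
LINE_RE = re.compile(r"SM: \[(\d+)\] ?(.*)$")
STATUS_RE = re.compile(r"= (-?\d+)")


def failed_commands(lines):
    """Yield (argv, status) for logged commands whose status is non-zero."""
    pending = {}  # pid -> argv of the most recent command line
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue
        pid, payload = match.group(1), match.group(2).strip()
        status = STATUS_RE.fullmatch(payload)
        if payload.startswith("['") and payload.endswith("']"):
            # The command is logged as a Python list literal.
            pending[pid] = ast.literal_eval(payload)
        elif status and pid in pending:
            argv = pending.pop(pid)
            if int(status.group(1)) != 0:
                yield argv, int(status.group(1))


if __name__ == "__main__":
    sample = [
        "Jul  2 13:00:38 13 SM: [18620] ['/usr/sbin/tap-ctl', 'spawn']",
        "Jul  2 13:00:38 13 SM: [18620]  = 0",
        "Jul  2 13:00:38 13 SM: [18620] ['/usr/sbin/tap-ctl', 'open', '-p', '19241']",
        "Jul  2 13:00:38 13 SM: [18620]  = 22",
    ]
    for argv, status in failed_commands(sample):
        print(status, " ".join(argv))
```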
Emmenemoi commented 6 years ago

(platform: XCP-ng 7.4.0)

blodone commented 6 years ago

The lock is located at srlock. In the metadata there is also a :managed = 1 key if the volume should be available.

Here is an example of a volume:

{':is_a_snapshot': '0',
 ':managed': '1',
 ':name_description': 'test-001',
 ':name_label': 'test-001',
 ':read_only': '0',
 ':shareable': '0',
 ':sm_config': '{"vdi_type": "aio"}',
 ':type': 'user',
 ':uuid': '01c3acc6-f034-4f75-b733-eb5d59646b18',
 ':vdi_type': 'aio'}
blodone commented 6 years ago

Sorry, GitHub stripped the underscores: the lock name is __srlock__

Emmenemoi commented 6 years ago

The lock problem is solved.

Example metadata for the VHD plugin of RBDSR:

{
 ":is_a_snapshot": "0",
 ":managed": "1",
 ":name_description": "monitoring",
 ":name_label": "vm-monitoring",
 ":read_only": "0",
 ":shareable": "0",
 ":sm_config": "{\"vdi_type\": \"vhd\"}",
 ":type": "user",
 ":uuid": "59a9fefd-991b-4ddc-a1f1-cd36edf3587d",
 ":vdi_type": "vhd",
 "VDI_DESCRIPTION": "monitoring",   => v1.0 meta
 "VDI_LABEL": "vm-monitoring"       => v1.0 meta
}

Still the same tap-ctl open error.

blodone commented 6 years ago

I think it's related to ['/usr/sbin/tap-ctl', 'open', '-p', '19241', '-m', '0', '-a', 'vhd:/run/sr-mount/a8726545-cc41-4ff3-b603-fb0e1b83c4ba/1d6329d6-6498-489f-9964-e699789b724d', '-R', '-t', '40']

It exits with code 22 ("Invalid argument"); try to debug this.
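Status 22 matches the Linux errno value EINVAL, which appears to be where the "Invalid argument" text in the log comes from. A quick way to check the mapping with Python's standard library (the message string assumes a Linux host such as the XCP-ng dom0):

```python
import errno
import os

status = 22  # exit status logged for 'tap-ctl open' above

print(errno.errorcode[status])  # symbolic name of the errno value
print(os.strerror(status))      # human-readable message for that errno
```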

Emmenemoi commented 6 years ago

Sure, I had already noticed that. Could someone send a working tap-ctl open command in rbd mode and in vhd mode? Maybe the '-a', 'vhd:/run/sr-mount/a8726545-cc41-4ff3-b603-fb0e1b83c4ba/1d6329d6-6498-489f-9964-e699789b724d' argument is bad.

Emmenemoi commented 6 years ago

Could it be the metadata upgrade? Mine: ":sm_config": "{\"vdi_type\": \"vhd\"}", ':vdi_type': 'vhd'

which in your case are: ':sm_config': '{"vdi_type": "aio"}', ':vdi_type': 'aio'

Then this would not be detected in the tap-ctl script (l. 19) for sed "s/vhd/aio/g": aiorbdsrs=`xe pbd-list device-config='image-type: aio' | grep sr-uuid | awk '{print $4} END {if (!NR) print "~~~"}'`

Is your metadata example for the rbd or the vhd type? Or could there be metadata differences? From what I'm reading in the source, rbd plugin mode uses the aio VDI_TYPE and vhd plugin mode uses the vhd VDI_TYPE.
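One way to check this is to compare the two places where the type is recorded. A minimal sketch, assuming the metadata is available as a dict of strings in the shape posted in this thread, with `:sm_config` holding a JSON-encoded object; `vdi_type_mismatch` is a hypothetical helper, not an RBDSR function.

```python
import json


def vdi_type_mismatch(meta):
    """Return (top_level, nested) if the two vdi_type values differ, else None."""
    top_level = meta.get(":vdi_type")
    # ':sm_config' is stored as a JSON string, so it has to be decoded first.
    nested = json.loads(meta.get(":sm_config", "{}")).get("vdi_type")
    if top_level != nested:
        return top_level, nested
    return None


if __name__ == "__main__":
    migrated = {":sm_config": '{"vdi_type": "vhd"}', ":vdi_type": "vhd"}
    print(vdi_type_mismatch(migrated))
```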

Emmenemoi commented 6 years ago

I can start the VM if I change the tap-ctl wrapper script to:

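# UUIDs of the SRs of type rbd, and of the SRs whose PBD uses image-type aio.
# The "~~~" fallback keeps the aio pattern non-empty (an empty pattern would match everything).
rbdsrs=$(xe sr-list type=rbd | grep uuid | awk '{print $5}')
aiorbdsrs=$(xe pbd-list device-config='image-type: aio' | grep sr-uuid | awk '{print $4} END {if (!NR) print "~~~"}')

if [ "$1" = "list" ]; then
    # Report aio taps on these SRs as vhd to the caller.
    /sbin/tap-ctl-orig "$@" | while read -r line;
    do
        case $line in
            *$aiorbdsrs*) echo "$line" | sed "s/aio/vhd/g";;
            *$rbdsrs*) echo "$line" | sed "s/aio/vhd/g";;
            *) echo "$line";;
        esac
    done
else
    # Rewrite vhd to aio in the arguments before calling the real tap-ctl.
    case "$*" in
        *$aiorbdsrs*) /sbin/tap-ctl-orig $(echo "$@" | sed "s/vhd/aio/g");;
        *$rbdsrs*) /sbin/tap-ctl-orig $(echo "$@" | sed "s/vhd/aio/g");;
        *) /sbin/tap-ctl-orig "$@";;
    esac
fi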

But I suspect it might fail later (clone, snapshots, other operations?).
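One reason for caution: `sed "s/vhd/aio/g"` rewrites every occurrence of `vhd` on the command line, not only the driver prefix. A more targeted rewrite would touch only the value that follows `-a`. The sketch below illustrates that idea in Python; `rewrite_driver` is a hypothetical helper and is not how the RBDSR wrapper is implemented.

```python
def rewrite_driver(argv, old="vhd", new="aio"):
    """Return a copy of argv where only the '-a <type>:<path>' prefix is changed."""
    result = list(argv)
    for i in range(len(result) - 1):
        # Only the argument right after '-a' carries the '<type>:' prefix.
        if result[i] == "-a" and result[i + 1].startswith(old + ":"):
            result[i + 1] = new + ":" + result[i + 1][len(old) + 1:]
    return result


if __name__ == "__main__":
    args = ["open", "-p", "19241", "-m", "0", "-a", "vhd:/run/sr-mount/sr/vdi", "-R", "-t", "40"]
    print(rewrite_driver(args))
```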

Emmenemoi commented 6 years ago

Furthermore, it seems impossible to use kernel mode in v2.0...

maxcuttins commented 6 years ago

@Emmenemoi why did you write a migration plugin when you can just remove the old one and install the new one? Furthermore, there is now a v3.0 of the plugin, which uses QEMU instead of AIO, improving compatibility between Xen and Ceph.

Emmenemoi commented 6 years ago

The migration plugin is needed to migrate v1.0 metadata to v2.0 metadata (used during SR probe for importing existing VDIs). The v2.0 plugin is definitely not stable enough. I'll stop trying to solve these problems: I managed to install the v1.0 plugin on a new XCP-ng 7.4 install. I'll document the procedure for the plugin update (I found a very easy way).

Yes, v3.0 exists, but it didn't exist when I first needed to migrate our infrastructure from a very old v1.0 plugin version (used on XS 7 in kernel mode, so "Ceph dumpling"!).

Agreed: v3.0 should be the next LTS target (I'll wait for XCP-ng 7.5 to be released and then write the necessary tools to migrate from v1.0 to v3.0).