olopez32 / ganeti

Automatically exported from code.google.com/p/ganeti

add QA for fix of: Ganeti fails to rollback on a failed disk conversion (plain -> DRBD) #229

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Hello,

I am running Ganeti 2.4.5 on Debian Squeeze.

The command
gnt-instance modify -t drbd -n <secondary node> <vm>
may fail to assemble a DRBD device (for instance, when DRBD device minor numbers 
are exhausted because the DRBD minor_count is too low).
In that case, Ganeti fails to roll back to the previous state and the VM is 
left broken.

Given a VM named “wonderfulvm” that runs fine on a plain LVM volume:

# lvs > lvs-before
# gnt-instance modify -t drbd -n charlie.dom0.nautile.net wonderfulvm
Fri Apr 13 15:39:00 2012 Converting template to drbd
Fri Apr 13 15:39:00 2012 Creating aditional volumes...
Fri Apr 13 15:39:05 2012 Renaming original volumes...
Fri Apr 13 15:39:07 2012 Initializing DRBD devices...
Failure: command execution error:
Can't create block device <DRBD8(hosts=alfa.dom0.nautile.net/32-charlie.dom0.nautile.net/32, port=11099, configured as 10.41.3.1:11099 10.41.3.3:11099, backend=<LogicalVolume(/dev/vgganeti/a42375e5-532d-4cb0-9252-a266bc67d3be.disk0_data, not visible, size=10240m)>, metadev=<LogicalVolume(/dev/vgganeti/a42375e5-532d-4cb0-9252-a266bc67d3be.disk0_meta, not visible, size=128m)>, visible as /dev/disk/0, size=10240m)> on node alfa.dom0.nautile.net for instance wonderfulvm: Can't assemble device after creation, unusual event: drbd32: can't attach local disk: /dev/drbd32: Failure: (127) Device minor not allocated
# lvs > lvs-after
# diff -U 1 lvs-before lvs-after 
--- lvs-before  2012-04-13 15:38:33.407693878 +1100
+++ lvs-after   2012-04-13 15:39:27.068493906 +1100
@@ -65,3 +65,2 @@
   4535addf-852e-4c75-945c-b90cfc7b2fae.disk0_meta vgganeti    -wi---  128,00m
-  4e5a0a89-ef64-4f3b-951a-1c1712f50b7b.disk0      vgganeti    -wi-a-   10,00g
   5719fd81-9501-495a-b4fd-51489a81c0b5.disk1_data vgganeti    -wi---   25,00g
@@ -89,2 +88,4 @@
   9f069e79-6fcb-41a5-8534-608d7f7d7ae7.disk0_meta vgganeti    -wi-a-  128,00m
+  a42375e5-532d-4cb0-9252-a266bc67d3be.disk0_data vgganeti    -wi-a-   10,00g
+  a42375e5-532d-4cb0-9252-a266bc67d3be.disk0_meta vgganeti    -wi-a-  128,00m
   a8dccdfe-8276-4c81-9d49-bc793ff0af9d.disk1_meta vgganeti    -wi-a-  128,00m
# gnt-instance info wonderfulvm
Instance name: wonderfulvm
UUID: bbd5fd66-180d-4d92-85de-6aaf2bd2d3f4
Serial number: 1
Creation time: 2012-04-13 15:37:43
Modification time: 2012-04-13 15:37:43
State: configured to be down, actual state is down
  Nodes:
    - primary: alfa.dom0.nautile.net
    - secondaries: 
  ...
  Disk template: plain
  Disks:
    - disk/0: lvm, size 10.0G
      access mode: rw
      logical_id:  vgganeti/4e5a0a89-ef64-4f3b-951a-1c1712f50b7b.disk0
      on primary:  not active
# gnt-instance startup wonderfulvm
Waiting for job 110241 for wonderfulvm...
Fri Apr 13 15:48:31 2012  - WARNING: Could not prepare block device disk/0 on node alfa.dom0.nautile.net (is_primary=False, pass=1): Error while assembling disk: Can't activate lv /dev/vgganeti/4e5a0a89-ef64-4f3b-951a-1c1712f50b7b.disk0:   One or more specified logical volume(s) not found.\n
Fri Apr 13 15:48:33 2012  - WARNING: Could not prepare block device disk/0 on node alfa.dom0.nautile.net (is_primary=True, pass=2): Error while assembling disk: Can't activate lv /dev/vgganeti/4e5a0a89-ef64-4f3b-951a-1c1712f50b7b.disk0:   One or more specified logical volume(s) not found.\n
Fri Apr 13 15:48:34 2012       Hint: If the message above refers to a secondary node, you can retry the operation using '--force'.
Job 110241 for wonderfulvm has failed: Failure: command execution error:
Disk consistency error

Ganeti should have renamed a42375e5-532d-4cb0-9252-a266bc67d3be.disk0_data back 
to 4e5a0a89-ef64-4f3b-951a-1c1712f50b7b.disk0 in order to restore the previous 
state.

Thanks.

Original issue reported on code.google.com by gregoire.dlg@gmail.com on 13 Apr 2012 at 5:07
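
The rollback the report asks for can be sketched as a rename step that is undone when device assembly fails. This is only an illustration of the pattern, not Ganeti's code: `FakeVolumeGroup`, `convert_with_rollback` and the `assemble` callable are hypothetical names, and the volume group is simulated in memory rather than driven through LVM.

```python
ORIGINAL = "4e5a0a89-ef64-4f3b-951a-1c1712f50b7b.disk0"
RENAMED = "a42375e5-532d-4cb0-9252-a266bc67d3be.disk0_data"


class FakeVolumeGroup:
    """In-memory stand-in for a volume group (hypothetical, illustration only)."""

    def __init__(self, names):
        self.names = set(names)

    def rename(self, old, new):
        if old not in self.names:
            raise KeyError(f"no such volume: {old}")
        if new in self.names:
            raise ValueError(f"volume already exists: {new}")
        self.names.remove(old)
        self.names.add(new)


def convert_with_rollback(vg, renames, assemble):
    """Apply the renames, then call assemble(); undo the renames on failure.

    renames:  list of (old_name, new_name) pairs.
    assemble: callable standing in for DRBD device creation; raises on failure.
    """
    done = []
    try:
        for old, new in renames:
            vg.rename(old, new)
            done.append((old, new))
        assemble()
    except Exception:
        # Undo only the renames that succeeded, newest first, then re-raise
        # so the caller still sees the original failure.
        for old, new in reversed(done):
            vg.rename(new, old)
        raise


def failing_assemble():
    """Simulates the failure from the report (assumed error text)."""
    raise RuntimeError("(127) Device minor not allocated")
```

With `vg = FakeVolumeGroup({ORIGINAL})`, calling `convert_with_rollback(vg, [(ORIGINAL, RENAMED)], failing_assemble)` raises the `RuntimeError` and leaves `vg.names` containing only the original name, which is the state the instance configuration still points to.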

GoogleCodeExporter commented 9 years ago

Original comment by ius...@google.com on 19 Jul 2012 at 2:59

GoogleCodeExporter commented 9 years ago

Original comment by ultrot...@google.com on 8 Apr 2013 at 2:26

GoogleCodeExporter commented 9 years ago

Original comment by ultrot...@google.com on 10 Apr 2013 at 2:56

GoogleCodeExporter commented 9 years ago

Original comment by aeh...@google.com on 22 Apr 2013 at 1:07

GoogleCodeExporter commented 9 years ago
commit f38270c6da11811d432db51061a32d3b85823372
Author: Klaus Aehlig <aehlig@google.com>
Date:   Tue Apr 23 11:35:18 2013 +0200

    In plain to drbd conversion, rename LVs back on failure

    Currently, if converting an instance from plain to drbd fails after
    renaming the original LVs, the instance is left in an inconsistent
    state. This commit tries to undo the renaming if a failure occurs
    on assembling a DRBD device, e.g., when device minor numbers are
    exhausted. (Issue 229)

    Signed-off-by: Klaus Aehlig <aehlig@google.com>
    Reviewed-by: Thomas Thrainer <thomasth@google.com>

Original comment by aeh...@google.com on 26 Apr 2013 at 1:13

GoogleCodeExporter commented 9 years ago
The actual issue should be fixed, but QA is missing.

Original comment by aeh...@google.com on 26 Apr 2013 at 1:18
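
A QA check for this could mirror the `lvs-before` / `lvs-after` comparison from the original report: list the volumes, run a conversion that is expected to fail, list them again, and verify that no previously existing volume has disappeared. The sketch below is an assumption about how such a check might look, not part of Ganeti's QA suite; `volume_changes`, `list_volumes` and `convert` are hypothetical names, and both callables are injected so the logic can be exercised without a cluster.

```python
def volume_changes(list_volumes, convert):
    """Run a conversion that must fail and report how the volume list changed.

    list_volumes: callable returning an iterable of logical volume names.
    convert:      callable performing the conversion; expected to raise.
    Returns (missing, extra): names that vanished and names that appeared.
    """
    before = set(list_volumes())
    try:
        convert()
    except Exception:
        pass  # the failure is the scenario under test
    else:
        raise AssertionError("conversion unexpectedly succeeded")
    after = set(list_volumes())
    return before - after, after - before


def demo_unfixed_behaviour():
    """Simulates the reported bug: volume renamed, never renamed back."""
    original = "4e5a0a89-ef64-4f3b-951a-1c1712f50b7b.disk0"
    data = "a42375e5-532d-4cb0-9252-a266bc67d3be.disk0_data"
    meta = "a42375e5-532d-4cb0-9252-a266bc67d3be.disk0_meta"
    volumes = {original}

    def broken_convert():
        volumes.discard(original)
        volumes.update({data, meta})
        raise RuntimeError("Device minor not allocated")

    return volume_changes(lambda: sorted(volumes), broken_convert)
```

For the unfixed behaviour, `demo_unfixed_behaviour()` reports the original `.disk0` volume as missing. A passing QA run would require the `missing` set to be empty; whether leftover `_data` or `_meta` volumes in `extra` should also count as a failure is a separate decision, since the commit message only mentions undoing the renaming.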

GoogleCodeExporter commented 9 years ago
How can this be fixed in 2.5.2 (installed on Fedora Linux, 2.5.2-1.fc18)?
The update to Ganeti 2.7 fails because the Ganeti package requires python-2.6, 
but python-2.7 is installed on the system.

Original comment by lesov...@gmail.com on 29 Jul 2013 at 5:16

GoogleCodeExporter commented 9 years ago
Python 2.7 should work perfectly well for Ganeti. How does the upgrade to 
Ganeti 2.7 fail for that reason? Our source version is also compatible with 
Python 2.7. Are you sure you're not using packages that only support 2.6?

Thanks,

Guido

Original comment by ultrot...@google.com on 29 Jul 2013 at 12:46

GoogleCodeExporter commented 9 years ago
I use the "integ-ganeti" repository located at 
http://jfut.integ.jp/linux/ganeti

As far as I know, these packages are built for CentOS and not for Fedora.

Original comment by lesov...@gmail.com on 31 Jul 2013 at 7:16

GoogleCodeExporter commented 9 years ago
Then can you please compile from source, use a Fedora package (if one exists), 
or build one yourself? Unfortunately, we can't backport this fix to 2.5.

Alternatively you can have a look at the patch 
(f38270c6da11811d432db51061a32d3b85823372) and see if it applies to 2.5 and if 
you can add it to your installation.

Original comment by ultrot...@google.com on 31 Jul 2013 at 1:35

GoogleCodeExporter commented 9 years ago
The repo maintainer has added packages for Fedora. I plan to upgrade Ganeti in 
the next few days.

Original comment by lesov...@gmail.com on 4 Aug 2013 at 6:22

GoogleCodeExporter commented 9 years ago
Since the actual issue is fixed and only QA is missing,
this is no longer a defect in the strict sense.

Original comment by aeh...@google.com on 29 Aug 2013 at 11:17

GoogleCodeExporter commented 9 years ago
Move non-critical bugs scheduled for 2.8 or 2.9 to 2.11, as in those versions 
only critical bug fixes will be integrated.

Original comment by thoma...@google.com on 30 Oct 2013 at 9:48