olopez32 / ganeti

Automatically exported from code.google.com/p/ganeti

When the metavg is wrong instance creation fails in a non-nice way. #416

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
2013-04-06 23:53:48,331: ganeti-masterd pid=3635/Jq3/Job1003 ERROR Op 1/1: Caught exception in INSTANCE_CREATE(infra1)
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/ganeti/jqueue.py", line 1031, in _ExecOpCodeUnlocked
    timeout=timeout, priority=op.priority)
  File "/usr/local/lib/python2.7/dist-packages/ganeti/jqueue.py", line 1342, in _WrapExecOpCode
    return execop_fn(op, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/ganeti/mcpu.py", line 457, in ExecOpCode
    priority)
  File "/usr/local/lib/python2.7/dist-packages/ganeti/mcpu.py", line 396, in _LockAndExecLU
    result = self._LockAndExecLU(lu, level + 1, calc_timeout, priority)
  File "/usr/local/lib/python2.7/dist-packages/ganeti/mcpu.py", line 405, in _LockAndExecLU
    result = self._LockAndExecLU(lu, level + 1, calc_timeout, priority)
  File "/usr/local/lib/python2.7/dist-packages/ganeti/mcpu.py", line 396, in _LockAndExecLU
    result = self._LockAndExecLU(lu, level + 1, calc_timeout, priority)
  File "/usr/local/lib/python2.7/dist-packages/ganeti/mcpu.py", line 396, in _LockAndExecLU
    result = self._LockAndExecLU(lu, level + 1, calc_timeout, priority)
  File "/usr/local/lib/python2.7/dist-packages/ganeti/mcpu.py", line 356, in _LockAndExecLU
    result = self._ExecLU(lu)
  File "/usr/local/lib/python2.7/dist-packages/ganeti/mcpu.py", line 328, in _ExecLU
    result = _ProcessResult(submit_mj_fn, lu.op, lu.Exec(self.Log))
  File "/usr/local/lib/python2.7/dist-packages/ganeti/cmdlib.py", line 10122, in Exec
    _CreateDisks(self, iobj)
  File "/usr/local/lib/python2.7/dist-packages/ganeti/cmdlib.py", line 9013, in _CreateDisks
    _CreateBlockDev(lu, node, instance, device, f_create, info, f_create)
  File "/usr/local/lib/python2.7/dist-packages/ganeti/cmdlib.py", line 8649, in _CreateBlockDev
    force_open)
  File "/usr/local/lib/python2.7/dist-packages/ganeti/cmdlib.py", line 8688, in _CreateBlockDevInner
    info, force_open)
  File "/usr/local/lib/python2.7/dist-packages/ganeti/cmdlib.py", line 8693, in _CreateBlockDevInner
    _CreateSingleBlockDev(lu, node, instance, device, info, force_open)
  File "/usr/local/lib/python2.7/dist-packages/ganeti/cmdlib.py", line 8721, in _CreateSingleBlockDev
    " node %s for instance %s" % (device, node, instance.name))
  File "/usr/local/lib/python2.7/dist-packages/ganeti/rpc.py", line 239, in Raise
    raise ec(*args) # pylint: disable=W0142
OpExecError: Can't create block device <LogicalVolume(/dev/xenvg/21baa0ac-fec3-4183-8f95-be0a09473924.disk0_meta, not visible, size=128m)> on node brick02.domain.tld for instance infra1.domain.tld: Can't create block device: Can't compute PV info for vg xenvg

Also, it would be nice if cluster verify complained about a wrong metavg, and
if the metavg could simply default to the normal vg (e.g. via a special value).

Original issue reported on code.google.com by ultrot...@google.com on 8 Apr 2013 at 10:26

GoogleCodeExporter commented 9 years ago

Original comment by ultrot...@google.com on 10 Apr 2013 at 2:50

GoogleCodeExporter commented 9 years ago

Original comment by ultrot...@google.com on 10 Apr 2013 at 2:56

GoogleCodeExporter commented 9 years ago
Also, it might be nice to have a metavg value that means "same as the vg" (or
to treat an unspecified metavg as the same as the vg, with none specified by
default?).
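
A minimal sketch of the "no metavg specified means same as vg" idea, written as standalone Python. The sentinel `SAME_AS_VG` and the helper `resolve_metavg` are hypothetical names used only for illustration; they are not taken from Ganeti's code.

```python
# Hypothetical sentinel and helper; neither is part of Ganeti's actual API.
SAME_AS_VG = None


def resolve_metavg(vg, metavg=SAME_AS_VG):
    """Return the volume group to use for DRBD metadata volumes.

    If no metavg is given (or it is an empty string), fall back to the
    volume group used for the data volumes.
    """
    if metavg is SAME_AS_VG or metavg == "":
        return vg
    return metavg
```

With this fallback, a cluster that never sets a metavg would place the metadata volume in the same volume group as the data volume, while an explicit value would still override it.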

Original comment by ultrot...@google.com on 11 Apr 2013 at 3:29

GoogleCodeExporter commented 9 years ago
An example of a command triggering the issue (when the metavg is already wrong)
is:
  gnt-instance add -t drbd --disk 0:size=1G -I hail -o debian-image mtartara-gnt0.example.com

Which fails with the following error message:

Mon Apr 15 12:31:07 2013  - INFO: Selected nodes for instance mtartara-gnt0.example.com via iallocator hail: node3.example.com, node2.example.com
Mon Apr 15 12:31:08 2013 * creating instance disks...
Mon Apr 15 12:31:08 2013  - WARNING: Device creation failed
Failure: command execution error:
Can't create block device <LogicalVolume(/dev/wrongvg/2f7d33d3-5e8d-4e69-b3e7-45c0a530aab8.disk0_meta, not visible, size=128m)> on node node2.example.com for instance mtartara-gnt0.example.com: Can't create block device: Can't compute PV info for vg wrongvg

It also leaves the cluster in an inconsistent state. In fact, the output of
gnt-cluster verify after trying to add the instance is:
Submitted jobs 20, 21
Waiting for job 20 ...
Mon Apr 15 12:31:59 2013 * Verifying cluster config
Mon Apr 15 12:31:59 2013 * Verifying cluster certificate files
Mon Apr 15 12:31:59 2013 * Verifying hypervisor parameters
Mon Apr 15 12:31:59 2013 * Verifying all nodes belong to an existing group
Waiting for job 21 ...
Mon Apr 15 12:31:59 2013 * Verifying group 'default'
Mon Apr 15 12:31:59 2013 * Gathering data (3 nodes)
Mon Apr 15 12:32:02 2013 * Gathering disk information (3 nodes)
Mon Apr 15 12:32:02 2013 * Verifying configuration file consistency
Mon Apr 15 12:32:02 2013 * Verifying node status
Mon Apr 15 12:32:02 2013 * Verifying instance status
Mon Apr 15 12:32:02 2013 * Verifying orphan volumes
Mon Apr 15 12:32:02 2013   - ERROR: node node2.example.com: volume xenvg/2f7d33d3-5e8d-4e69-b3e7-45c0a530aab8.disk0_data is unknown
Mon Apr 15 12:32:02 2013 * Verifying N+1 Memory redundancy
Mon Apr 15 12:32:02 2013 * Other Notes
Mon Apr 15 12:32:02 2013 * Hooks Results

Original comment by mtart...@google.com on 15 Apr 2013 at 12:36

GoogleCodeExporter commented 9 years ago
Partially fixed by commit 9b221ea4e18d8d5c432de9559628c348f9ff9cc9, which
prevents the cluster from ending up in an inconsistent state.

The OpExecError exception is still raised and appears in masterd.log, which is
not elegant.
To close the bug completely, whether the disks can actually be created
according to the current "metavg" should be checked as a precondition, before
executing the opcode, and an OpPrereqError should be raised if it turns out not
to be possible.
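
The precondition described above could look roughly like the following standalone sketch. It assumes the set of volume groups on each candidate node is already known; the `node_vgs` mapping and the `check_metavg_present` helper are hypothetical, and the local `OpPrereqError` class is only a stand-in so the example runs on its own, not Ganeti's real exception class.

```python
class OpPrereqError(Exception):
    """Stand-in for Ganeti's prerequisite error, so the sketch is self-contained."""


def check_metavg_present(metavg, node_vgs):
    """Raise OpPrereqError if any node lacks the metadata volume group.

    node_vgs maps a node name to the volume group names found on that node
    (hypothetical input; a real check would have to query the nodes).
    """
    # Collect every node on which the requested metavg does not exist.
    missing = sorted(node for node, vgs in node_vgs.items()
                     if metavg not in vgs)
    if missing:
        raise OpPrereqError(
            "Volume group '%s' for DRBD metadata not found on node(s): %s"
            % (metavg, ", ".join(missing)))


# Example input mirroring the failing case reported earlier in this thread.
node_vgs = {
    "node2.example.com": ["xenvg"],
    "node3.example.com": ["xenvg"],
}
try:
    check_metavg_present("wrongvg", node_vgs)
except OpPrereqError as err:
    print("Prerequisite check failed: %s" % err)
```

Failing at this stage would mean no data volume has been created yet, so there would be nothing to clean up and no traceback in masterd.log.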

Original comment by mtart...@google.com on 22 Apr 2013 at 7:52

GoogleCodeExporter commented 9 years ago

Original comment by ultrot...@google.com on 22 Apr 2013 at 8:18

GoogleCodeExporter commented 9 years ago
Move non-critical bugs scheduled for 2.8 or 2.9 to 2.11, as in those versions 
only critical bug fixes will be integrated.

Original comment by thoma...@google.com on 30 Oct 2013 at 9:48