termie / nova-migration-demo

Nova is a cloud computing fabric controller (the main part of an IaaS system). It is written in Python.
http://openstack.org/projects/compute/
Apache License 2.0

vgcreate/lvcreate in volume/service.py fail and go undetected #617

Closed termie closed 13 years ago

termie commented 13 years ago

I noticed that service.py under nova/volume contains this flag:

flags.DEFINE_string('storage_dev', '/dev/sdb', 'Physical device to use for volumes')

my host does not have a /dev/sdb, so vgcreate in _init_volume_group(self) fails and consequently lvcreate in _create_lv fails too. It seems that no exception is reported in the log file. Shouldn't storage_dev be /dev/loop0, or any other free loop device chosen by losetup and passed via $NOVA_VOLUME_ARGS?

Thanks, Armando


Imported from Launchpad using lp2gh.
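As an editorial aside: the failure mode described above (vgcreate failing on a nonexistent /dev/sdb with nothing in the log) could be surfaced early by validating the flag before any LVM command runs. This is a minimal sketch, not Nova's actual code; the function name and error message are illustrative:

```python
import os

def check_storage_dev(storage_dev):
    """Fail fast, with a visible error, if the configured device is absent.

    Illustrative guard only; the flag handling in nova/volume/service.py
    did not include this check.
    """
    if not os.path.exists(storage_dev):
        raise RuntimeError(
            "storage_dev %s does not exist; point --storage_dev at a real "
            "block device or a loop device set up with losetup" % storage_dev)
    return storage_dev
```

With the default of /dev/sdb on a host without that disk, a guard like this would raise immediately instead of letting vgcreate fail undetected.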

termie commented 13 years ago

(by vishvananda) You can pass in a flag for a different storage device: `./nova-volume --nodaemon --verbose --storage_dev=/dev/loop0`, or in /etc/nova/nova-volume.conf: `--storage_dev=/dev/loop0`

Also, if you just make sure that the volume group already exists, it will work. The group should be called 'nova-volumes'. You can also specify a different volume group name with the --volume_group=vgfoo flag.

Vish

vgcreate/lvcreate in volume/service.py fail and go undetected https://bugs.launchpad.net/bugs/620027 You received this bug notification because you are a member of Nova Bugs, which is subscribed to OpenStack Compute (nova).

Status in OpenStack Compute (Nova): New


termie commented 13 years ago

(by armando-migliaccio) you are right, if you make sure that the volume group already exists, the volume creation does work. However, I went back to my config and saw that I do pass the flag storage_dev=/dev/loop0 on the command line. I also noticed that I do not pass the --nodaemon switch.

When I launch nova-volume without --nodaemon, 'vgcreate' does not seem to get called. Is that possible? Instead, when I launch nova-volume with --nodaemon, vgcreate does get called. I instrumented the code and compared the log output in the two cases (the last lines of the logs are the interesting part):

\ NOVA-VOLUME WITHOUT --NODAEMON SWITCH ** Starting Nova Volume DEBUG:root:Full set of FLAGS: DEBUG:root:help : None DEBUG:root:storage_availability_zone : nova DEBUG:root:volume_topic : volume DEBUG:root:verbose : True DEBUG:root:encrypted : None DEBUG:root:compute_topic : compute DEBUG:root:default_kernel : aki-11111 DEBUG:root:report_profile : None DEBUG:root:rabbit_password : guest DEBUG:root:syslog : None DEBUG:root:prefix : nova-volume DEBUG:root:vpn_key_suffix : -key DEBUG:root:ec2_url : http://localhost:8773/services/Cloud DEBUG:root:originalname : None DEBUG:root:rundir : . DEBUG:root:profiler : hotshot DEBUG:root:uid : None DEBUG:root:connection_type : libvirt DEBUG:root:fake_rabbit : False DEBUG:root:s3_port : 3333 DEBUG:root:help_reactors : None DEBUG:root:rabbit_host : 10.70.177.14 DEBUG:root:source : None DEBUG:root:process_pool_size : 4 DEBUG:root:umask : None DEBUG:root:nothotshot : None DEBUG:root:debug : False DEBUG:root:fake_storage : False DEBUG:root:redis_db : 0 DEBUG:root:gid : None DEBUG:root:volume_group : nova-volumes DEBUG:root:reactor : None DEBUG:root:pidfile : /home/openstack/openstack/nova-volume.pid DEBUG:root:savestats : None DEBUG:root:rabbit_userid : guest DEBUG:root:storage_dev : /dev/loop0 DEBUG:root:file : twistd.tap DEBUG:root:default_instance_type : m1.small DEBUG:root:report_interval : 10 DEBUG:root:blades_per_shelf : 16 DEBUG:root:node_availability_zone : nova DEBUG:root:version : None DEBUG:root:aoe_eth_dev : eth0 DEBUG:root:auth_token_ttl : 3600 DEBUG:root:rabbit_port : 5672 DEBUG:root:chroot : None DEBUG:root:profile : None DEBUG:root:euid : None DEBUG:root:vpn_image_id : ami-CLOUDPIPE DEBUG:root:logfile : nova-volume.log DEBUG:root:nodaemon : None DEBUG:root:b : None DEBUG:root:last_shelf_id : 149 DEBUG:root:no_save : True DEBUG:root:aoe_export_dir : /var/lib/vblade-persist/vblades DEBUG:root:rabbit_virtual_host : / DEBUG:root:node_name : phantom DEBUG:root:redis_host : 127.0.0.1 DEBUG:root:spew : None DEBUG:root:r : 
None DEBUG:root:default_image : ami-11111 DEBUG:root:control_exchange : nova DEBUG:root:default_ramdisk : ari-11111 DEBUG:root:redis_port : 6379 DEBUG:root:s3_host : 127.0.0.1 DEBUG:root:python : /home/openstack/openstack/nova/trunk/bin/nova-volume DEBUG:root:first_shelf_id : 140 DEBUG:root:fake_network : False DEBUG:root:network_topic : network WARNING:root:Starting volume node DEBUG:root:* before_pvcreate * DEBUG:root:Executing: sudo ['pvcreate', '/dev/loop0']: DEBUG:root:>> execute DEBUG:root:<< execute DEBUG:root:exe output: <Deferred at 0xa700dec current result: <Deferred at 0xa700e2c>>

The last few lines are logging messages I added in _init_volume_group, simple_execute and execute. If I execute vgdisplay on the shell I get nothing.

** NOVA-VOLUME WITH --NODAEMON SWITCH **

Starting Nova Volume DEBUG:root:Full set of FLAGS: DEBUG:root:help : None DEBUG:root:storage_availability_zone : nova DEBUG:root:volume_topic : volume DEBUG:root:verbose : True DEBUG:root:encrypted : None DEBUG:root:compute_topic : compute DEBUG:root:default_kernel : aki-11111 DEBUG:root:report_profile : None DEBUG:root:rabbit_password : guest DEBUG:root:syslog : None DEBUG:root:prefix : nova-volume DEBUG:root:vpn_key_suffix : -key DEBUG:root:ec2_url : http://localhost:8773/services/Cloud DEBUG:root:originalname : None DEBUG:root:rundir : . DEBUG:root:profiler : hotshot DEBUG:root:uid : None DEBUG:root:connection_type : libvirt DEBUG:root:fake_rabbit : False DEBUG:root:s3_port : 3333 DEBUG:root:help_reactors : None DEBUG:root:rabbit_host : 127.0.0.1 DEBUG:root:source : None DEBUG:root:process_pool_size : 4 DEBUG:root:umask : None DEBUG:root:nothotshot : None DEBUG:root:debug : False DEBUG:root:fake_storage : False DEBUG:root:redis_db : 0 DEBUG:root:gid : None DEBUG:root:volume_group : nova-volumes DEBUG:root:reactor : None DEBUG:root:pidfile : /home/openstack/openstack/nova-volume.pid DEBUG:root:savestats : None DEBUG:root:rabbit_userid : guest DEBUG:root:storage_dev : /dev/loop0 DEBUG:root:file : twistd.tap DEBUG:root:default_instance_type : m1.small DEBUG:root:report_interval : 10 DEBUG:root:blades_per_shelf : 16 DEBUG:root:node_availability_zone : nova DEBUG:root:version : None DEBUG:root:aoe_eth_dev : eth0 DEBUG:root:auth_token_ttl : 3600 DEBUG:root:rabbit_port : 5672 DEBUG:root:chroot : None DEBUG:root:profile : None DEBUG:root:euid : None DEBUG:root:vpn_image_id : ami-CLOUDPIPE DEBUG:root:logfile : - DEBUG:root:nodaemon : True DEBUG:root:b : None DEBUG:root:last_shelf_id : 149 DEBUG:root:no_save : True DEBUG:root:aoe_export_dir : /var/lib/vblade-persist/vblades DEBUG:root:rabbit_virtual_host : / DEBUG:root:node_name : phantom DEBUG:root:redis_host : 127.0.0.1 DEBUG:root:spew : None DEBUG:root:r : None DEBUG:root:default_image : ami-11111 
DEBUG:root:control_exchange : nova DEBUG:root:default_ramdisk : ari-11111 DEBUG:root:redis_port : 6379 DEBUG:root:s3_host : 10.70.177.40 DEBUG:root:python : /home/openstack/openstack/nova/trunk/bin/nova-volume DEBUG:root:first_shelf_id : 140 DEBUG:root:fake_network : False DEBUG:root:network_topic : network WARNING:root:Starting volume node DEBUG:root:* before_pvcreate * DEBUG:root:Executing: sudo ['pvcreate', '/dev/loop0']: DEBUG:root:>> execute DEBUG:root:<< execute DEBUG:root:exe output: <Deferred at 0x8f89dec current result: <Deferred at 0x8f89e2c>> 2010-08-19 10:17:06+0100 [-] Log opened. 2010-08-19 10:17:06+0100 [-] twistd 10.0.0 (/usr/bin/python 2.6.5) starting up. 2010-08-19 10:17:06+0100 [-] reactor class: twisted.internet.selectreactor.SelectReactor. DEBUG:root:* after_pvcreate * 2010-08-19 10:17:06+0100 -: DEBUG * after_pvcreate * DEBUG:root:Executing: sudo ['vgcreate', 'nova-volumes', '/dev/loop0']: 2010-08-19 10:17:06+0100 -: DEBUG Executing: sudo ['vgcreate', 'nova-volumes', '/dev/loop0']: DEBUG:root:>> execute 2010-08-19 10:17:06+0100 -: DEBUG >> execute DEBUG:root:<< execute 2010-08-19 10:17:06+0100 -: DEBUG << execute DEBUG:root:exe output: <Deferred at 0x8fa608c current result: <Deferred at 0x8fa660c>> 2010-08-19 10:17:06+0100 -: DEBUG exe output: <Deferred at 0x8fa608c current result: <Deferred at 0x8fa660c>> DEBUG:root:*after vgcreate * 2010-08-19 10:17:06+0100 -: DEBUG *after vgcreate *

vgdisplay says:

    --- Volume group ---
    VG Name               nova-volumes
    System ID
    Format                lvm2
    Metadata Areas        1
    Metadata Sequence No  1
    VG Access             read/write
    VG Status             resizable
    MAX LV                0
    Cur LV                0
    Open LV               0
    Max PV                0
    Cur PV                1
    Act PV                1
    VG Size               10.00 GiB
    PE Size               4.00 MiB
    Total PE              2559
    Alloc PE / Size       0 / 0
    Free  PE / Size       2559 / 10.00 GiB
    VG UUID               xiORYR-dJ0P-pu2e-QiDx-nstm-PwWh-7HuJHZ

Hope this helps.

termie commented 13 years ago

(by justin-fathomdb) pvcreate is immediately followed by vgcreate in the code (though both are deferreds). So if pvcreate is being called but vgcreate is not, then it sounds like pvcreate is raising an exception.

I would expect any errors to be logged, and indeed, looking at the code, I don't see how an error could go unlogged. Does anything get printed on stdout/stderr by nova-volume (particularly in the --nodaemon case)?

You might try merging my branch which checks the results of spawned processes: bzr merge lp:~justin-fathomdb/nova/check-subprocess-exit-code

I don't think that will give you a better error message (though it might!), but what it will do is not treat messages on stderr as being failures. For instance (speculating), perhaps the first time you call pvcreate it loads a kernel module, which prints a message on stderr, which causes a failure the first time you run it (only).
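Justin's distinction between "wrote to stderr" and "exited nonzero" can be sketched with modern Python's subprocess module. This illustrates the policy his branch implements, not the Twisted-era Nova code itself, which used process pools and deferreds:

```python
import subprocess

def run_checked(cmd):
    """Run a shell command; judge success by exit code, not stderr output."""
    proc = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    if proc.returncode != 0:
        raise RuntimeError("command failed (rc=%d): %s"
                           % (proc.returncode, proc.stderr.strip()))
    return proc.stdout, proc.stderr

# A process that warns on stderr but exits 0 is still treated as a success,
# matching the pvcreate-loads-a-kernel-module scenario speculated above:
out, err = run_checked("echo created; echo 'WARNING: module loaded' >&2")
```

Under a stderr-means-failure policy, that same command would have been reported as a failure the first time it ran.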

Unfortunately, it looks like our twisted module calls into twistd.runApp, which appears to be an undocumented twisted function (http://twistedmatrix.com/trac/wiki/UndocumentedScripts). Any Twisted people able to comment on why no error is being logged when exceptions are thrown at startup?

termie commented 13 years ago

(by armando-migliaccio) in the --nodaemon case nova-volume works like a charm: it creates both the physical volume and the volume group. It's the case without the --nodaemon switch that has trouble. The difference between the two pvcreate outputs:

* WITH --nodaemon switch *

    --- Physical volume ---
    PV Name               /dev/loop0
    VG Name               nova-volumes
    PV Size               10.00 GiB / not usable 4.00 MiB
    Allocatable           yes
    PE Size               4.00 MiB
    Total PE              2559
    Free PE               2559
    Allocated PE          0
    PV UUID               5AKxZf-9TlO-3CSF-8UoG-GHzF-v910-CqIsEg

* WITHOUT --nodaemon switch *

    "/dev/loop0" is a new physical volume of "10.00 GiB"
    --- NEW Physical volume ---
    PV Name               /dev/loop0
    VG Name
    PV Size               10.00 GiB
    Allocatable           NO
    PE Size               0
    Total PE              0
    Free PE               0
    Allocated PE          0
    PV UUID               FWH78L-I3b2-eMS6-hE43-V7dp-t1Fb-ksNDTI

It looks like pvcreate fails silently, but I don't see any messages in /var/log/syslog or /var/log/messages. Any clues?

termie commented 13 years ago

(by justin-fathomdb) Other than the earlier suggestions (merge in the error checking branch)...

What perplexes me is that it looks like pvcreate is succeeding in both cases, because a PV appears to be created.

If it's not dependent on the order in which you run the commands, then perhaps it's a permissions problem? (I wonder if the sudo is causing trouble.) Who are you running this as? Can you try running as root (e.g. sudo bash beforehand)?

Also, I noticed you seem to have uid and gid flags... do you by any chance have ~soren/nova/derootification merged in? Are you running off a clean trunk?

termie commented 13 years ago

(by armando-migliaccio) I am running off a clean trunk (the latest) and as root. I also tried merging ~justin-fathomdb/nova/check-subprocess-exit-code, but I haven't got any better error messages. I still experience the same issue: the volume group is not created at initialization if nova-volume runs as a daemon.

If the physical volume/volume group already exists, pvcreate and vgcreate both fail with exit code 5 (I see that if I launch the commands from the shell). This means that nova-volume also incurs this error; however, the log does not trace any of it.
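A sketch of tolerating the already-exists case while still surfacing real failures. The exit code 5 is as reported in this comment, not verified across LVM versions, and `ensure_created` is a hypothetical helper, not Nova code:

```python
import subprocess

def ensure_created(create_cmd, already_exists_rc=5):
    """Run a create command, tolerating the 'already exists' exit code.

    Exit code 5 for an existing PV/VG is taken from this bug thread.
    """
    proc = subprocess.run(create_cmd, shell=True,
                          capture_output=True, text=True)
    if proc.returncode == 0:
        return "created"
    if proc.returncode == already_exists_rc:
        return "already-exists"   # log and continue rather than fail silently
    raise RuntimeError("rc=%d: %s" % (proc.returncode, proc.stderr.strip()))
```

Any other nonzero exit code still raises, so genuine failures are no longer swallowed.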

termie commented 13 years ago

(by justin-fathomdb) Inspired by that exit code 5 comment, would it be correct to rephrase the bug as "if the PV / VG already exists, the storage manager fails to launch"? Or are you deleting the PV/VG in between runs?

The "if it already exists" bug probably exists, even if it isn't your problem here.

What I may do is fix the "if it already exists" bug, and add more logging of the stdout/stderr/exit code in case of problems (which I'll have to look at anyway as part of the bug fix.)

But do please let me know whether you're cleaning up the PV/VG in between (sounds like you probably are...)

termie commented 13 years ago

(by armando-migliaccio) I am deleting PV/VG in between runs.

self.stderr.write() and self.stdout.write() do not write to my console in either mode (daemon, nodaemon), so I replaced them with log traces and saw that when nova-volume runs in daemon mode, vgcreate does not get called at all! Is it possible that the process.simple_execute call gets lost somewhere in the call chain?
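The "call gets lost in the chain" symptom matches an asynchronous result whose error is never consumed. The same hazard can be shown with a stdlib future standing in for a Twisted Deferred; this is an analogy, not the actual Nova code path:

```python
from concurrent.futures import ThreadPoolExecutor

def failing_step():
    # Stands in for the vgcreate call that never appears to run.
    raise RuntimeError("vgcreate failed")

with ThreadPoolExecutor(max_workers=1) as pool:
    fut = pool.submit(failing_step)

# No traceback was printed: the exception just sits inside the future
# until someone asks for it, exactly like a Deferred with no errback.
err = fut.exception()
```

A Deferred chain that never reaches an errback (or a logging callback) behaves the same way: the failure exists but nothing ever reports it.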

termie commented 13 years ago

(by armando-migliaccio) by the way, with your changes I can see in the log that the pvcreate call fails in the "if it already exists" case, but that (as you pointed out) is not the problem here. It's just this oddity about the daemon mode, which might potentially affect every other service!

termie commented 13 years ago

(by vishvananda) It would be great to figure out why this is happening, but ultimately it might be better if nova-volume didn't go around creating volume groups at all and just checked that the right one exists.
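Vish's check-don't-create approach might look like the following sketch: parse the output of something like `vgs --noheadings -o vg_name` and refuse to start if the expected group is missing. The command, helper name, and sample output here are assumptions for illustration:

```python
def vg_exists(vgs_output, name):
    """Check whether a volume group name appears in vgs-style output.

    Takes the command output as a string so the logic is testable;
    the real service would shell out to vgs itself.
    """
    return name in (line.strip() for line in vgs_output.splitlines())

# Example output in the style of `vgs --noheadings -o vg_name`:
sample = "  nova-volumes\n  vg_root\n"
```

If the check fails, the service could exit with a clear message instead of attempting vgcreate and hitting the silent-failure path described in this thread.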


termie commented 13 years ago

(by vishvananda) This may be an unrelated problem, but there is a particularly nasty issue with LVM interacting with AoE. The various LV commands will hang trying to stat orphaned AoE devices, and they hang badly in a system call and can't be killed. The best solution is to add a filter to the LVM config so it doesn't try to stat the AoE devices. My filter in /etc/lvm/lvm.conf looks like this:

filter = [ "r|/dev/etherd/.*|", "r|/dev/block/.*|", "a/.*/" ]


termie commented 13 years ago

(by armando-migliaccio) I commented out the pvcreate command under _init_volume_group:

    def _init_volume_group(self):
        if FLAGS.fake_storage:
            return
        # yield process.simple_execute(
        #         "sudo pvcreate %s" % (FLAGS.storage_dev))
        yield process.simple_execute(
                "sudo vgcreate %s %s" % (FLAGS.volume_group,
                                         FLAGS.storage_dev))

and let nova-volume create the volume group and initialize the physical volume in one go. The command's output looks like this:

    No physical volume label read from /dev/loop0
    Physical volume "/dev/loop0" successfully created
    Volume group "nova-volumes" successfully created

When I run nova-volume as a daemon I finally get the volume group created! I know it does not sound like a bug fix, but commenting out pvcreate does circumvent the problem.

If vgcreate takes care of the "pvcreation" too, would it make sense to have just one simple_execute call?

termie commented 13 years ago

(by jaypipes) Ping on this bug. Where are we with this? Vish, Justin, has anything been fixed in this regard? Is the bug still valid? Trying to do a little maintenance on outstanding bugs... thanks.

termie commented 13 years ago

(by armando-migliaccio) After the eventlet merge and the latest developments on the nova branch, I think this bug report no longer applies.