redhat-openstack / tripleo-quickstart

Ansible roles for setting up TripleO virtual environments and building images

Collect more logs in CI #14

Closed trown closed 8 years ago

trown commented 8 years ago

We need to collect logs from the virthost itself for image building. We should probably stop depending on khaleesi for log collection and instead fork what is there now into oooq.

apevec commented 8 years ago

@trown I need more logs outside CI too :) I'm stuck in a manual installation at the Introspect Nodes step. After introspection, all virtual nodes report:

Preprocessing hook ramdisk_error: Ramdisk reported error: The following errors were encountered:
 * failed to run hardware-detect utility: [Errno 2] No such file or directory

apevec commented 8 years ago

OK, http://docs.openstack.org/developer/tripleo-docs/troubleshooting/troubleshooting-nodes.html has some hints. On the undercloud node, /var/log/ironic-inspector/ramdisk/bmc_* shows more details from ironic-python-agent. I see modprobe: ERROR: could not insert 'ipmi_si': No such device ? Here's one example: https://apevec.fedorapeople.org/openstack/bmc__2016.03.05_15.20.33_273251.tar.gz
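
The ramdisk tarballs mentioned above can be scanned in one pass instead of being unpacked by hand. This is a minimal sketch, assuming each archive in the log directory is a gzip-compressed tar file. The function name `scan_ramdisk_logs` and the case-insensitive "error" search pattern are illustrative choices, not part of any TripleO tooling.

```shell
#!/bin/sh
# Hypothetical helper: print error-looking lines from every ramdisk log
# tarball in a directory (default: the undercloud path named in the thread).
scan_ramdisk_logs() {
    log_dir="${1:-/var/log/ironic-inspector/ramdisk}"
    for archive in "$log_dir"/*.tar.gz; do
        # Skip the unexpanded glob when the directory holds no tarballs.
        [ -e "$archive" ] || continue
        name=$(basename "$archive")
        work_dir=$(mktemp -d) || return 1
        tar -xzf "$archive" -C "$work_dir"
        # -r: recurse, -i: ignore case, -h: omit file names from the output.
        grep -rih 'error' "$work_dir" | while IFS= read -r line; do
            printf '%s: %s\n' "$name" "$line"
        done
        rm -rf "$work_dir"
    done
}

# Example (run on the undercloud node):
#   scan_ramdisk_logs /var/log/ironic-inspector/ramdisk
```

Each matching line is prefixed with the archive it came from, so a failure can be traced back to a specific node's tarball.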

trown commented 8 years ago

Whoops. I think I know what is going on here. I think introspection is likely broken with the IPA ramdisk being built right now, and I didn't catch it because introspection got left out of the nonHA job by accident.

Fortunately, for virtual setups inspection is pointless. I will use this issue to track fixing the nonHA job so that CI exercises introspection again, though. We want that to work for baremetal.

apevec commented 8 years ago

If inspection is not required for a virt setup, how can I skip it? Answering my own question: marking nodes as available in Ironic allowed me to continue:

$ for uuid in 46e21da5-8ca0-46f9-b6a9-ed67b0b250c7 ef48d80d-7330-4ca4-aeb6-cda27a4ce6a4 f8ab9182-6c6b-4ecc-bbdd-fd2fc98bb048 0e4fcbcb-ce18-4c6a-98a6-ad5759543729 113b57ea-4abd-46b2-b8a2-54751e3e9e35; do ironic node-set-provision-state "$uuid" provide; done

But then the overcloud deployment failed in ControllerOvercloudServicesDeployment_Step5 ... After RTFM (http://hardysteven.blogspot.hr/2015/04/debugging-tripleo-heat-templates.html), this means Keystone failed:

https://github.com/openstack/tripleo-heat-templates/blob/205ea09ca27caaeaddab27f3b021b398db195890/puppet/manifests/overcloud_controller_pacemaker.pp#L1839-L1851

heat deployment-show has a bunch of "Cannot allocate memory - fork(2)" errors. I ended up with a 4GB RAM control_0 VM; looks like that's not enough? @trown what size are the controller nodes in CI?

trown commented 8 years ago

For the HA job, controller nodes only have 4GB of RAM. Maybe that is not enough. How much RAM does the virthost have free?

As far as skipping inspection, just not running it at all works. Nodes start in the available state. If you run inspection and it fails, though, the nodes could be left in the manageable state and need to be moved to available manually.
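
The manual move from manageable to available can be scripted rather than listing UUIDs by hand. This is a sketch, assuming `ironic node-list` prints a pipe-delimited table with the UUID in the first column and the provisioning state in the fifth. The function name `manageable_uuids` is hypothetical, and the column positions should be checked against the installed client before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: read `ironic node-list` table output on stdin and
# print the UUID of every node whose provisioning state is "manageable".
# Assumed columns (verify against your client version):
#   UUID | Name | Instance UUID | Power State | Provisioning State | Maintenance
manageable_uuids() {
    awk -F'|' '
        NF >= 7 {
            # Splitting on "|" leaves an empty first field, so the UUID is
            # field 2 and the provisioning state is field 6.
            uuid = $2; state = $6
            gsub(/^[ \t]+|[ \t]+$/, "", uuid)
            gsub(/^[ \t]+|[ \t]+$/, "", state)
            if (state == "manageable") print uuid
        }
    '
}

# Example (run on the undercloud with credentials sourced):
#   ironic node-list | manageable_uuids | while read -r uuid; do
#       ironic node-set-provision-state "$uuid" provide
#   done
```

Filtering on the state first means nodes that are already available are left untouched.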

apevec commented 8 years ago

My virthost has 64GB RAM, so that should be enough. I couldn't get to the virt nodes' consoles, so I've now redefined all VMs with RAM increased to 8G and am trying the overcloud deploy again.
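
One way to raise the memory of the virtual nodes without a console is through `virsh` on the virthost. This is not necessarily how the VMs were redefined in this thread. It is a dry-run sketch, assuming the nodes are libvirt domains that are shut off. The function name `print_mem_bump` is hypothetical, and the commands are only printed so they can be reviewed first.

```shell
#!/bin/sh
# Hypothetical helper: print the virsh commands that would set both the
# maximum and the current memory of each named domain to the given size.
# Nothing is executed here; pipe the output to `sh` after reviewing it.
print_mem_bump() {
    size_gib="$1"; shift
    size_kib=$((size_gib * 1024 * 1024))   # virsh sizes default to KiB
    for domain in "$@"; do
        # --config changes the persistent definition, applied on next boot.
        echo "virsh setmaxmem $domain $size_kib --config"
        echo "virsh setmem $domain $size_kib --config"
    done
}

# Example: print_mem_bump 8 control_0
```

The maximum is raised before the current allocation because the current value cannot exceed the maximum.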

apevec commented 8 years ago

"As far as skipping inspection, just not running it at all works"

This should be added to the upstream docs, which the RDO TripleO docs point to: http://docs.openstack.org/developer/tripleo-docs/basic_deployment/basic_deployment_cli.html#upload-images

apevec commented 8 years ago

The overcloud has deployed successfully; overcloud-controller-0 shows 3.3G RAM used when idle. During deployment more RAM is definitely used, so 4GB virtual nodes are not safe.
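
A figure like the 3.3G above can be re-checked on any node. This is a sketch, assuming a Linux /proc/meminfo that includes the MemAvailable field. The function name `used_mem_mib` is hypothetical, and "used" is taken here as MemTotal minus MemAvailable, which may differ from what other tools report.

```shell
#!/bin/sh
# Hypothetical helper: print used memory in MiB, computed as
# MemTotal - MemAvailable from a meminfo-format file (values are in kB).
used_mem_mib() {
    meminfo="${1:-/proc/meminfo}"
    awk '
        $1 == "MemTotal:"     { total = $2 }
        $1 == "MemAvailable:" { avail = $2 }
        END { printf "%d\n", (total - avail) / 1024 }
    ' "$meminfo"
}

# Example (run on overcloud-controller-0): used_mem_mib
```

Sampling this during a deployment, not only when idle, would show how close a 4GB node gets to its limit.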

trown commented 8 years ago

Moved to https://bugs.launchpad.net/tripleo-quickstart/+bug/1571046