Closed: trown closed this issue 8 years ago
@trown I need more logs outside CI too :) I'm stuck in the manual installation at the Introspect Nodes step. After introspection, all virtual nodes report:
Preprocessing hook ramdisk_error: Ramdisk reported error: The following errors were encountered:
* failed to run hardware-detect utility: [Errno 2] No such file or directory
OK, http://docs.openstack.org/developer/tripleo-docs/troubleshooting/troubleshooting-nodes.html has some hints: /var/log/ironic-inspector/ramdisk/bmc_* on the undercloud node shows more details from ironic-python-agent. I see: modprobe: ERROR: could not insert 'ipmi_si': No such device (is that the cause?). Here's one example: https://apevec.fedorapeople.org/openstack/bmc__2016.03.05_15.20.33_273251.tar.gz
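For anyone else digging through those ramdisk tarballs, here is a minimal sketch for pulling the error lines out of one without unpacking it to disk. It assumes the archive holds plain-text logs (as the example above appears to); `ramdisk_errors` is a made-up helper name, not part of any TripleO or Ironic tooling.

```shell
# Print every line containing "error" (case-insensitive) from all
# files inside a gzip-compressed tarball, without extracting to disk.
# $1: path to a tarball such as one of the bmc_* archives (assumed text logs)
ramdisk_errors() {
    # -O streams the extracted file contents to stdout;
    # grep -a treats the stream as text even if it contains odd bytes
    tar -xzOf "$1" | grep -ai 'error'
}

# Example (on the undercloud, path taken from the comment above):
#   for t in /var/log/ironic-inspector/ramdisk/bmc_*; do ramdisk_errors "$t"; done
```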
whoops. I think I know what is going on here. I think introspection is likely broken with the IPA ramdisk being built right now, and I didn't catch it because introspection got left out of the nonHA job by accident.
Fortunately, for virtual setups inspection is pointless. I will use this issue to track fixing the nonHA job so that CI runs introspection again, though. We want that to work for baremetal.
If inspection is not required for a virt setup, how can I skip it? Answering myself: marking the nodes as available in Ironic allowed me to continue:
$ for uuid in 46e21da5-8ca0-46f9-b6a9-ed67b0b250c7 ef48d80d-7330-4ca4-aeb6-cda27a4ce6a4 f8ab9182-6c6b-4ecc-bbdd-fd2fc98bb048 0e4fcbcb-ce18-4c6a-98a6-ad5759543729 113b57ea-4abd-46b2-b8a2-54751e3e9e35 ; do ironic node-set-provision-state $uuid provide; done
But then the overcloud deployment failed in ControllerOvercloudServicesDeployment_Step5 ... ... After RTFM http://hardysteven.blogspot.hr/2015/04/debugging-tripleo-heat-templates.html this means Keystone failed:
heat deployment-show has a bunch of "Cannot allocate memory - fork(2)" errors. I ended up with a 4GB RAM control_0 VM; looks like that's not enough? @trown what size are the controller nodes in CI?
For the HA job, controller nodes only have 4GB of RAM. Maybe that is not enough. How much RAM does the virthost have free?
As far as skipping inspection, just not running it at all works. Nodes start in available state. If you run inspection and it fails though, the nodes could be in manageable state, and need to be manually moved to available.
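The manual move back to available can be scripted rather than pasting UUIDs by hand. This is only a sketch: it assumes `ironic node-list` prints its usual ASCII table with `UUID` and `Provisioning State` header columns, and `nodes_in_state` is a hypothetical helper name.

```shell
# Print the UUIDs of nodes whose provisioning state equals $1
# (default: manageable). Reads an `ironic node-list`-style ASCII
# table on stdin; columns are located by header name, not position.
nodes_in_state() {
    awk -F'|' -v want="${1:-manageable}" '
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        NF < 3 { next }                  # skip +-----+ border lines
        !uuid_col {                      # first remaining row is the header
            for (i = 1; i <= NF; i++) {
                name = trim($i)
                if (name == "UUID") uuid_col = i
                if (name == "Provisioning State") state_col = i
            }
            next
        }
        trim($state_col) == want { print trim($uuid_col) }
    '
}

# Example (on the undercloud, not run here):
#   ironic node-list | nodes_in_state manageable | while read -r uuid; do
#       ironic node-set-provision-state "$uuid" provide
#   done
```

Matching on the header names keeps the helper working if the client adds or reorders columns, at the cost of breaking if the headers themselves are renamed.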
My virthost has 64GB RAM so that should be enough. I couldn't get to the virt nodes' consoles, so I've now redefined all VMs with RAM increased to 8G and am trying the overcloud deploy again.
> As far as skipping inspection, just not running it at all works
This should be added to the upstream docs, which the RDO TripleO docs point to: http://docs.openstack.org/developer/tripleo-docs/basic_deployment/basic_deployment_cli.html#upload-images
The overcloud has deployed successfully; overcloud-controller-0 shows 3.3G RAM used when idle. More RAM is definitely used during deployment, so 4GB virtual nodes are not safe enough.
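To see how close a controller gets to its limit while a deploy is running, something like the following could be sampled on the node. It is a rough sketch, not how the 3.3G figure above was measured: it assumes the kernel exposes `MemAvailable` in /proc/meminfo, and `mem_used_mib` is a hypothetical helper name.

```shell
# Print used memory in MiB, computed as MemTotal - MemAvailable.
# $1: path to a meminfo-format file (defaults to /proc/meminfo)
mem_used_mib() {
    awk '
        $1 == "MemTotal:"     { total = $2 }   # values are in kB
        $1 == "MemAvailable:" { avail = $2 }
        END {
            # fail instead of printing a bogus number if a field is missing
            if (total == "" || avail == "") exit 1
            printf "%d\n", (total - avail) / 1024
        }
    ' "${1:-/proc/meminfo}"
}

# Example (on overcloud-controller-0 during a deploy, not run here):
#   while sleep 30; do mem_used_mib; done
```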
We need to collect logs from the virthost itself for image building. We should probably not be depending on khaleesi for log collection and just fork what is there now into oooq.