Closed: ReggieCarey closed this issue 1 year ago.
@ReggieCarey: This issue is currently awaiting triage.
If Metal3.io contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.
The triage/accepted label can be added by org members by writing /triage accepted in a comment.
/cc @Rozzii
/assign @Rozzii
How it works is basically that the BareMetalHost containing the BMC details + credentials is your hardware inventory. That's managed by the baremetal-operator. Then you have the Cluster API, which selects hosts from that inventory and provisions them with an image (by setting that field in the BareMetalHost) that allows them to form a cluster. That's managed by the cluster-api-provider-metal3. (It's confusing that different parts of the BareMetalHost API are written by different actors, and one day if we ever make a new API version we will fix that.) This is the part that sounds like it's not happening.
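To make that concrete, a host in the inventory looks roughly like this (a minimal sketch; the name, MAC address, BMC address, and secret name are placeholders):

```yaml
# Minimal sketch of a BareMetalHost registered as inventory
# (name, MAC address, BMC address, and secret name are placeholders).
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: node-0
  namespace: metal3
spec:
  online: true
  bootMACAddress: "00:11:22:33:44:55"
  bmc:
    address: ipmi://192.168.111.1:6230
    credentialsName: node-0-bmc-secret
  # image: is deliberately left unset here; in the Cluster API flow it is
  # CAPM3 that fills it in when a Machine claims this host.
```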
When you provision a BareMetalHost (as you successfully did manually by adding the image to the spec), the final state is 'provisioned'. So that seems to be working as expected.
For a Cluster API Machine, the final state is Running and it occurs once a Node has come up and can be linked to the Machine. But it sounds like you're not even getting as far as the Machines provisioning. The Machine is selecting a Host (consumerRef is set), but not provisioning it. That's likely due to some problem with the state of the BMH, but logs from the CAPM3 would help in figuring out what. (Note that hosts you manually provisioned by setting the image url would not be selected by Machines for provisioning, so that's why consumerRefs didn't show up in that case.)
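For reference, a host that a Machine has selected shows a consumerRef in its spec, roughly like this (the names here are illustrative):

```yaml
# Excerpt of a BareMetalHost that CAPM3 has claimed (names are illustrative).
spec:
  consumerRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: Metal3Machine
    name: test1-controlplane-abc12
    namespace: metal3
```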
BMO only reads its Config once at startup, so yes you will have to restart it to pick up any config changes. Deleting BareMetalHosts works, but note that they get deprovisioned first (if they were provisioned or provisioning), so it may take a long time.
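For context, this is the kind of ConfigMap BMO reads once at startup; a rough sketch only, since the name, keys, and values all vary by deployment:

```yaml
# Rough sketch of a BMO/Ironic ConfigMap (illustrative; actual key names
# and values depend on how BMO and Ironic were deployed).
apiVersion: v1
kind: ConfigMap
metadata:
  name: ironic-bmo-configmap
  namespace: baremetal-operator-system
data:
  IRONIC_ENDPOINT: http://192.168.111.1:6385/v1/
  DEPLOY_KERNEL_URL: http://192.168.111.1:6180/images/ironic-python-agent.kernel
  DEPLOY_RAMDISK_URL: http://192.168.111.1:6180/images/ironic-python-agent.initramfs
```

Since the operator only reads this at startup, edits take effect only after the BMO pod is restarted, which matches what you observed.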
@ReggieCarey can we close this ticket, or is it still relevant for you? Has Zane's answer helped?
Please close this ticket.
Thanks for the feedback @ReggieCarey
/close
@Rozzii: Closing this issue.
HELP Request: I cannot pinpoint which tool is leading to the failures described.
What steps did you take and what happened: [A clear and concise description of how to REPRODUCE the bug.]
Setup: My TestBed:
- 4 servers with 64 CPUs each and tonnes of RAM, all visible on an IPMI network.
- A baremetal network on its own VLAN.
- A provisioning network on its own VLAN.
- One server set up as the provisioning server.
- Installed K8s 1.24.6 by hand (using Calico CNI) on the provisioning server.
- Installed Podman and deployed Ironic via baremetal-operator/tools/run_local_ironic.sh on the provisioning server.
- Installed ClusterAPI on K8s and included Metal3 as the infrastructure implementation, per the ClusterAPI and Metal3 docs.
- Created BMH artifacts for the remaining 3 machines (why were these not discovered automatically? Canonical MAAS does this easily).
- Created and deployed Cluster artifacts via the clusterctl create command.
Two scenarios - two failure modes:
1) With image data in the BMH: I decided TO include the image: parameters, ignoring the entries in Metal3MachineTemplate. The state of the BMH never proceeds beyond provisioning or provisioned, but my target OS is installed. No consumerRef is ever assigned to the BMH, and the system stops at either the provisioning or provisioned state. I see evidence of the target OS being placed on the host (an Ubuntu login prompt on the console) but no other activity.
2) Without image data in the BMH: I decided NOT to include the image: parameters, in favor of the entries in Metal3MachineTemplate (roughly the shapes sketched below). The state of the BMH never proceeds beyond available, and my target OS is never installed.
A consumerRef is created, but nothing comes of it. Further, the ref does not respect my BMH preferences, established via labels on the BMH and referenced in the Metal3MachineTemplate.
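For reference, these are roughly the shapes the two scenarios assume (sketches with placeholder names, URLs, and labels, not copied from my actual manifests):

```yaml
# Scenario 1: image set directly on the BareMetalHost (excerpt).
# BMO provisions the host itself; per the Cluster API flow described above,
# a host provisioned this way is not selected by a Machine, so no consumerRef.
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: node-1
  labels:
    node-role: control-plane   # label I expected hostSelector to match
spec:
  image:
    url: http://provisioner.example.com/images/ubuntu-2004.qcow2
    checksum: http://provisioner.example.com/images/ubuntu-2004.qcow2.md5sum
---
# Scenario 2: image left off the BMH and supplied via the
# Metal3MachineTemplate, with hostSelector meant to pick labeled hosts.
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: Metal3MachineTemplate
metadata:
  name: test1-controlplane
spec:
  template:
    spec:
      image:
        url: http://provisioner.example.com/images/ubuntu-2004.qcow2
        checksum: http://provisioner.example.com/images/ubuntu-2004.qcow2.md5sum
      hostSelector:
        matchLabels:
          node-role: control-plane
```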
What did you expect to happen:
I expected either to have BMH generated automatically for me, or for the BMH I registered to be utilized in standing up a target cluster. Further, I expected my OS to be provisioned, and kubeadm, kubelet, etc. to be deployed and a cluster stood up.
Anything else you would like to add: [Miscellaneous information that will assist in solving the issue.]
There is a distinct lack of information about what's going on to help troubleshoot this. Log data is notoriously difficult to locate, and the documentation stops well before describing a usage scenario beyond the development environment. I note that BMO seems to ignore changes to its ConfigMap and removal of BMH entries. I have to kill the pod to get a re-read of the CMs.
I've monitored logs in the Podman pod hosting Ironic (dnsmasq, ironic, httpd, ironic-inspector, etc.), and I can tell that a target OS is never requested if image: is not in the BMH, even though the RAMDISK image (CentOS Stream 9) is provisioned.
Feature Request/Enhancement: Documentation. These tools (Metal³ and the Cluster API) are great ideas, and a lot of effort has gone into them. BUT: there is a lack of good documentation describing the interactions, dependencies, and requirements between the various APIs, the network, and K8s, and working with REAL baremetal machines (I would write it, but it's a mystery to me).
Environment:
The best I can offer is :
I'm attempting to provision a production K8s cluster on bare metal using ClusterAPI and Metal3 with Ironic. As such, I'm not using metal3-dev-env. The provisioning host runs Ubuntu 20.04. Target machines get booted via IPMI into the provisioning RAMDISK (CentOS Stream 9). Depending on how the BMH API is used, the target machines MAY boot an Ubuntu image as created via
but this apparently only works once without restarting BMO/Ironic.
Ironic is running in a Podman pod colocated with a single-node baremetal deployment of K8s on the provisioning host. Instructions for deploying and configuring Ironic inside of K8s are lacking.
/kind bug