Closed: sumitjadhav1 closed this issue 4 years ago
The cluster needs to be provisioned before creating any machines. I am not sure whether this fixes your problem, but the workflow is:
I am not sure whether you are using CAPM3 at all and, if you do, which version. But this is the way it is done in v1alpha3 when using CAPM3.
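In case it helps, here is a minimal sketch of that ordering as it looks in metal3-dev-env. The script names under scripts/v1alphaX are assumptions on my side and may differ between versions, so check your checkout:

```sh
# Hypothetical sequence; the exact script names under scripts/v1alphaX may
# differ between metal3-dev-env versions, so check your checkout.
cd ~/metal3-dev-env/scripts/v1alphaX

# 1. Make sure introspection is finished and all BareMetalHosts are in the
#    "ready" provisioning state before touching CAPI/CAPM3.
kubectl get baremetalhosts -n metal3

# 2. Provision the cluster object first (Cluster + Metal3Cluster).
./provision_cluster.sh

# 3. Only then create the machines: control plane first, then workers.
./provision_controlplane.sh
./provision_worker.sh
```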
@sumitjadhav1 Regarding your first issue, could you clarify your operating configuration, the configuration you're supplying, and the command you're reporting that you have to execute? Ideally you shouldn't have to do this, but the reality seems to be that some BMCs, depending on the driver, protocol, and ultimately BMC firmware, subtly require different approaches, and without fully understanding all aspects of the context it is difficult for us to help. In other words, more information would help us understand.
[2] There are two notions mixed here. The baremetal network is the network for your target cluster, i.e. what your nodes will be using for Kubernetes setup. The fact that vbmc instances were on this network is only a design decision. The constraint about vbmc nodes (and BMCs in general) is that they must be reachable from Ironic, but they do not have to be on the baremetal network. And you don't need to configure that BMC network anywhere. It will work as long as your traffic from Ironic to BMC is routed properly.
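For example, a quick way to confirm whether routing is the problem is something like the following rough sketch. The BMC address here is a made-up placeholder for an iDRAC on a separate management network:

```sh
# Run this from the host where the Ironic containers run.
# BMC_IP is a hypothetical iDRAC address on a separate management network.
BMC_IP=192.168.222.10

# Does the provisioning host have a route towards the BMC network?
ip route get "$BMC_IP"

# Is the Redfish endpoint reachable? (iDRAC serves Redfish over HTTPS.)
curl -k -s -o /dev/null -w '%{http_code}\n' "https://$BMC_IP/redfish/v1/"
```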
Now the error you are seeing is unrelated to the BMCs and Ironic. "failed to create remote cluster client: failed to create client for workload cluster metal3/test1: Get https://192.168.111.249:6443/api?timeout=30s: dial tcp 192.168.111.249:6443: connect: no route to host" means that CAPI or CAPM3 is trying to talk to your target cluster after provisioning. 192.168.111.249:6443 is the load-balancer port to reach the API server. If your provisioning was not successful, or your deployment failed, then that load balancer won't be reachable and you will get this error. But it is unrelated to BMCs.
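If you want to see where it is stuck, something along these lines can help. This is only a sketch, assuming the metal3 namespace and the endpoint from the error message:

```sh
# Is the control-plane machine actually provisioned? Check the CAPI/CAPM3
# objects and the BareMetalHosts in the management cluster.
kubectl get clusters,machines -n metal3
kubectl get baremetalhosts -n metal3

# Can the management host reach the workload API endpoint at all?
curl -k -m 5 https://192.168.111.249:6443/healthz || echo "API endpoint not reachable yet"
```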
Hi @juliakreger,
Please let me know if you need any additional info.
1. You should not mix the provision_host.sh script and the CAPM3-based scripts. Both are doing more or less the same thing under the hood (provisioning a node). If you want to use CAPI and CAPM3, use the scripts under ./scripts/v1alphaX only once your environment is ready (BMHs Ready); see the sketch below.
2. This is the right approach. Did you try to apply your workaround again? Maybe that is something you need to do every time.
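As a rough sketch, a small guard like this could be put in front of the v1alphaX scripts; it assumes the standard BareMetalHost status fields and the metal3 namespace:

```sh
# Wait until every BareMetalHost reports the "ready" provisioning state
# before invoking the v1alphaX provisioning scripts.
while kubectl get baremetalhosts -n metal3 \
        -o jsonpath='{range .items[*]}{.status.provisioning.state}{"\n"}{end}' \
      | grep -q -v -w 'ready'; do
  echo "waiting for all BareMetalHosts to reach the ready state..."
  sleep 10
done
echo "all BareMetalHosts are ready; safe to run the cluster scripts"
```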
Thanks for the clarification. Yes, we're using the correct approach (2) now (BMHs Ready, then use the cluster scripts).
Unfortunately we can't apply the workaround at runtime, because the node in Ironic is locked against any updates once provisioning/deployment has already started.
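For reference, this is roughly how the lock shows up on the Ironic side (a sketch; <node> is a placeholder for the Ironic node name or UUID):

```sh
# While a deploy is in progress the node is reserved by a conductor and
# updates are rejected, which is why the workaround has to be applied
# before provisioning starts. <node> is the Ironic node name or UUID.
openstack baremetal node show <node> -f value -c provision_state -c reservation
```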
Thanks for this clarification as well. Okay, so these errors are expected as long as the provisioning/deployment fails, which is the blocker for us now. We would appreciate help on this front.
- Due to an iDRAC firmware bug when using Redfish, the user must set "force_persistent_boot_device=Never" (openstack baremetal node set <node> --driver-info force_persistent_boot_device=Never) before starting node deployment. We had applied this workaround and were able to deploy the node successfully (node in Active state in Ironic / Provisioned state in Metal3).
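Since the workaround has to be in place on every node before provisioning starts, a small loop along these lines may save some typing (a sketch; it assumes the OpenStack CLI is already configured to talk to the Ironic instance used by metal3-dev-env):

```sh
# Apply the iDRAC/Redfish workaround to every registered Ironic node
# before any provisioning is started.
for node in $(openstack baremetal node list -f value -c UUID); do
  openstack baremetal node set "$node" \
    --driver-info force_persistent_boot_device=Never
done
```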
Does that flag change Ironic's behavior, or does Ironic pass it to the BMC to change the host's behavior?
@dhellmann It changes Ironic's behavior so that it does not assert persistent boot flags. The bug @sumitjadhav1 is speaking of is that when the BMC receives the flag, it unexpectedly returns an error. That being said, I've heard from my Dell contacts that the bug is expected to be fixed in the very next iDRAC firmware release, since it previously worked just fine.
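If it helps when debugging, the flag lives in the node's driver_info, so it is easy to confirm it is set before deployment (a sketch; <node> is a placeholder):

```sh
# Show the node's driver_info; force_persistent_boot_device=Never should
# appear there once the workaround has been applied. <node> is a placeholder.
openstack baremetal node show <node> -f json -c driver_info
```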
Thanks for the details, @juliakreger.
I'm not sure how much work we should do in metal3 to work around bugs in firmware. I wouldn't, for example, want to expose an API to let the user control the force_persistent_boot_device flag explicitly. I could see us always setting the flag to false, but I don't know what side effects we might end up with from that, so I wouldn't want to take that step lightly.
Update: 10/04/2020
This activity was on hold for a couple of weeks (due to internal testing of the hardware-classification-controller); we will try to update once we have results. We are currently reading the design in https://github.com/metal3-io/metal3-docs/pull/78 for more details (as suggested in the last community meeting).
We are now able to provision Dell PowerEdge server nodes using the cluster scripts provided by Metal3. We also used the IPAM feature. Testing was done for nodes in both UEFI and BIOS boot modes. This verification was done after the following fixes:
However, we are currently observing issues in the node de-provisioning step using the deprovision_worker script (the node goes into the clean_failed state). We will perform a couple of rounds of provisioning and de-provisioning; if the problem is consistently reproducible, we will create a new issue.
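For the record, a clean_failed node usually carries the failure reason in its last_error field; a rough way to capture it when reproducing (a sketch; <node> and <bmh-name> are placeholders, and the metal3 namespace is assumed):

```sh
# The Ironic node records why cleaning failed.
openstack baremetal node show <node> -f value -c provision_state -c last_error

# The corresponding BareMetalHost also surfaces an error message in its status.
kubectl -n metal3 get baremetalhost <bmh-name> -o jsonpath='{.status.errorMessage}{"\n"}'
```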
Hence closing this issue now.
[1] We are currently testing Metal3/BMO and cluster provisioning on Dell PowerEdge servers. We have successfully completed the steps up to node introspection (BM nodes in Ready state). The next step is BM node provisioning and cluster creation. We are using two approaches, as follows:
First approach: node provisioning using the provision_host.sh script, then cluster creation using the cluster scripts:
-> Here the BM node state changes from Ready to Provisioned successfully (workaround: we need to configure one boot device parameter in Ironic using the CLI before starting node provisioning).
-> After this we start with the cluster scripts available under ~/metal3-dev-env/scripts/v1alphaX.
-> The issue we are facing here is that none of the available BM nodes is picked for the cluster initiation process (to be made the master node).
-> Our doubt: is this the right approach for cluster creation and provisioning?
Second approach: BM nodes in Ready state, then use the cluster scripts:
-> Here the BM nodes are in Ready state after successful introspection. We also apply the workaround stated in approach 1 in Ironic for all available nodes.
-> Now we use the cluster scripts for cluster creation, and they pick one of the available nodes for node provisioning (to change from Ready to Provisioned state).
-> This operation gets stuck in Ironic with the same error for which we had already applied the workaround at the beginning.
-> Our doubt: node provisioning works in approach 1, whereas here it fails with an error in Ironic even though we have the workaround in place.
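To illustrate how the selection can be inspected on the BareMetalHost objects (a rough sketch, assuming the default metal3 namespace; <bmh-name> is a placeholder):

```sh
# Which BareMetalHost, if any, has been claimed by a Metal3Machine, and in
# what provisioning state is it?
kubectl -n metal3 get baremetalhosts \
  -o custom-columns=NAME:.metadata.name,STATE:.status.provisioning.state,CONSUMER:.spec.consumerRef.name

# Full status, including any error message propagated from Ironic.
kubectl -n metal3 describe baremetalhost <bmh-name>
```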
[2] Another doubt regarding the baremetal network (192.168.111.0/24), which is used for the vBMC nodes created in the Metal3 setup:
-> Do we need to change this IP range (the cluster API endpoint is defined in this range) in the lib/common.sh file in order to use the iDRAC network available for the Dell hardware? We observed the error below in our logs:
"failed to create remote cluster client: failed to create client for workload cluster metal3/test1: Get https://192.168.111.249:6443/api?timeout=30s: dial tcp 192.168.111.249:6443: connect: no route to host"
-> Need help on this front.