metal3-io / cluster-api-provider-metal3

Metal³ integration with https://github.com/kubernetes-sigs/cluster-api
Apache License 2.0

CAPM3 fail waiting the static interface name specified in the template #1998

Open mboukhalfa opened 5 days ago

mboukhalfa commented 5 days ago

What steps did you take and what happened: While testing a cluster template in a fakeIPA environment, provisioning of the BMH did not start even though the machine was in the provisioning state, and I see the following error in the CAPM3 logs:

E0925 06:16:26.153022       1 controller.go:324] "msg"="Reconciler error" "error"="Failed to create secrets: NIC name not found enp1s0" "Metal3Data"={"name":"test1-workers-template-0","namespace":"metal3"} "controller"="metal3data" "controllerGroup"="infrastructure.cluster.x-k8s.io" "controllerKind"="Metal3Data" "name"="test1-workers-template-0" "namespace"="metal3" "reconcileID"="75cdec08-a060-4a6f-b69e-21b0d80dc072"
I0925 06:16:26.154381       1 metal3data_manager.go:163] "msg"="Metadata is part of Metal3DataTemplate" "cluster"="test1" "logger"="controllers.Metal3Data.Metal3Data-controller" "metal3-data"={"Namespace":"metal3","Name":"test1-workers-template-0"}

The interface name of the fake nodes was eth1, and that is what I see on the inspected BMH:

  hardware:
    cpu:
      arch: x86_64
      clockMegahertz: 2100
      count: 2
      flags:
      - fpu
      - fxsr
      - mmx
      - sse
      - sse2
    firmware:
      bios: {}
    nics:
    - ip: 172.22.0.100
      mac: 00:5c:52:31:3a:9c
      model: 0x1af4 0x0001
      name: eth1
    systemVendor: {}

Using the same names in the fake VMs as in the cluster template (enp1s0, enp2s0) fixed the issue, but this might not be the desired behavior: NIC names are assigned by the OS, so the names in the cluster template can differ from them, and that should not break provisioning.

What did you expect to happen: CAPM3 should still be able to continue if the NIC names differ from those in the template; the only required identifier should be the MAC address.
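To illustrate the expectation, here is a minimal Go sketch of the difference between matching a NIC by name and matching it by MAC address, using the values from the inspected BMH above. The `NIC` type and the `findByName` / `findByMAC` helpers are hypothetical and written only for this illustration; they are not CAPM3 or baremetal-operator code, and the actual lookup in CAPM3 may work differently.

```go
package main

import (
	"fmt"
	"strings"
)

// NIC is a hypothetical stand-in for one entry under hardware.nics
// in the inspected BMH shown above; it is not the real API type.
type NIC struct {
	Name string
	MAC  string
	IP   string
}

// findByName returns the NIC whose inspected name equals name.
// It fails whenever the template name differs from the inspected name.
func findByName(nics []NIC, name string) (NIC, error) {
	for _, n := range nics {
		if n.Name == name {
			return n, nil
		}
	}
	return NIC{}, fmt.Errorf("NIC name not found %s", name)
}

// findByMAC returns the NIC whose MAC address equals mac, ignoring case.
// The result does not depend on how the OS named the interface.
func findByMAC(nics []NIC, mac string) (NIC, error) {
	for _, n := range nics {
		if strings.EqualFold(n.MAC, mac) {
			return n, nil
		}
	}
	return NIC{}, fmt.Errorf("NIC MAC not found %s", mac)
}

func main() {
	// Values copied from the inspection data in this issue.
	inspected := []NIC{{Name: "eth1", MAC: "00:5c:52:31:3a:9c", IP: "172.22.0.100"}}

	// The template refers to enp1s0, which the inspection never reported.
	if _, err := findByName(inspected, "enp1s0"); err != nil {
		fmt.Println("name lookup:", err)
	}

	// The MAC address identifies the same NIC regardless of its name.
	if nic, err := findByMAC(inspected, "00:5C:52:31:3A:9C"); err == nil {
		fmt.Println("MAC lookup:", nic.Name)
	}
}
```

With the inspection data from this issue, the name lookup for enp1s0 fails while the MAC lookup still resolves the eth1 entry, which is the behavior I would expect provisioning to rely on.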

Anything else you would like to add: fakeIPA PR discussion : https://github.com/metal3-io/utility-images/pull/20

Environment:

/kind bug

Rozzii commented 5 days ago

We noticed this during the discussion on https://github.com/metal3-io/utility-images/pull/20. As far as I know, NIC names such as "eth1" are only relevant within the network data file and do not identify the interfaces; the MAC address is used for identification.

From my POV, I don't really understand why CAPM3 would need to compare the interface names in the BMH inspection data with those in the network data file: the inspection names come from IPA, and they might be different once the machine reboots and applies the network configuration via cloud-init.

I suspect that this is a bug and we should remove this "NIC name validation".

Rozzii commented 4 days ago

/triage accepted