syself / cluster-api-provider-hetzner

Cluster API Provider Hetzner :rocket: The best way to manage Kubernetes clusters on Hetzner, fully declarative, Kubernetes-native and with self-healing capabilities
https://caph.syself.com
Apache License 2.0

Reconcile error for HetznerBareMetalHost and more than one network interface #939

Closed alexkasatikov closed 8 months ago

alexkasatikov commented 1 year ago

/kind bug

What steps did you take and what happened: I'm trying to set up a k8s cluster with only one node using the hetzner-baremetal-control-planes flavor. After generating the cluster and adding a HetznerBareMetalHost, I don't see any details about the host hardware when running kubectl describe hetznerbaremetalhost. Here is the log from caph-controller-manager:

Log { "level": "ERROR", "time": "2023-09-21T11:18:53.496Z", "file": "controller/controller.go:324", "message": "Reconciler error", "controller": "hetznerbaremetalhost", "controllerGroup": "infrastructure.cluster.x-k8s.io", "controllerKind": "HetznerBareMetalHost", "HetznerBareMetalHost": { "name": "de1459", "namespace": "de-dev" }, "namespace": "de-dev", "name": "de1459", "reconcileID": "9283fd2c-9da9-4274-aaae-ffbea85dbf64", "error": "failed to reconcile HetznerBareMetalHost de-dev/de1459: action \"registering\" failed: failed to get hardware details: failed to obtain hardware details Nics: failed to unmarshal {\"name\":\"eth0\",\"model\":\"Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)}. Original ssh output name=\"eth0\" model=\"Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)\nIntel Corporation I350 Gigabit Network Connection (rev 01)\" mac=\"f0:2f:74:94:a2:41\" ip=\"162.55.151.48/26\" speedMbps=\"1000\"\nname=\"eth0\" model=\"Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)\nIntel Corporation I350 Gigabit Network Connection (rev 01)\" mac=\"f0:2f:74:94:a2:41\" ip=\"2a01:4f8:262:265f::2/64\" speedMbps=\"1000\": unexpected end of JSON input", "stacktrace": "sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/src/cluster-api-provider-hetzner/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:324\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/src/cluster-api-provider-hetzner/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:265\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/src/cluster-api-provider-hetzner/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:226" }

What did you expect to happen: Reconciliation completes successfully.

Anything else you would like to add: I assume that's due to this line: https://github.com/syself/cluster-api-provider-hetzner/blob/v1.0.0-beta.22/pkg/services/baremetal/client/ssh/ssh_client.go#L144 When the command is executed on the host, it returns two lines:

root@rescue ~ # lspci | grep net | awk '{$1=$2=$3=""; print $0}' | sed "s/^[ \t]*//"
Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
Intel Corporation I350 Gigabit Network Connection (rev 01)

and the script output then looks like this:

root@rescue ~ # bash nic-info.sh
name="eth0" model="Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
Intel Corporation I350 Gigabit Network Connection (rev 01)" mac="f0:2f:74:94:a2:41" ip="162.55.151.48/26" speedMbps="1000"
name="eth0" model="Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
Intel Corporation I350 Gigabit Network Connection (rev 01)" mac="f0:2f:74:94:a2:41" ip="2a01:4f8:262:265f::2/64" speedMbps="1000"
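
A minimal sketch of why the embedded line break leads to "unexpected end of JSON input". It assumes the controller converts the script output to JSON one physical line at a time, which is inferred from the payload in the error message and not confirmed from the Go source. `kv_line_to_json` and `broken_output` are hypothetical helper names.

```shell
# Hypothetical line-by-line conversion of key="value" pairs to JSON,
# modelled on the payload shown in the error message; the real Go code may differ.
kv_line_to_json() {
  sed -e 's/\([A-Za-z]*\)="/"\1":"/g' -e 's/" "/","/g' -e 's/^/{/' -e 's/$/}/'
}

# First two physical lines of the script output quoted above.
broken_output() {
  cat <<'EOF'
name="eth0" model="Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
Intel Corporation I350 Gigabit Network Connection (rev 01)" mac="f0:2f:74:94:a2:41" ip="162.55.151.48/26" speedMbps="1000"
EOF
}

# The first line ends inside the model string, so the resulting object
# is missing the closing quote before the final brace.
broken_output | head -n 1 | kv_line_to_json
```

The printed object has the same shape as the one quoted in the "failed to unmarshal" message: the model value is never terminated, so a JSON parser runs off the end of the input.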

Environment:

alexkasatikov commented 1 year ago

As an idea, something like that could be used instead: lspci -s $(ethtool -i $iname | grep bus-info | awk '{print $2}') | cut -d ':' -f 3
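
The idea above can be sketched as a small function that resolves the model of one interface through its PCI bus address. This is only an illustration: `nic_model_for` is a hypothetical name, and the `ethtool` and `lspci` functions below are stubs with made-up bus-info values so the sketch runs without real hardware.

```shell
# Stubs so the sketch runs without real hardware; delete them on a real host.
# The bus-info values are illustrative assumptions.
ethtool() {  # only handles: ethtool -i <iface>
  case "$2" in
    eth0) echo "bus-info: 0000:05:00.0" ;;
    eth1) echo "bus-info: 0000:05:00.1" ;;
  esac
}
lspci() {    # only handles: lspci -s <bus-address>
  echo "${2#0000:} Ethernet controller: Intel Corporation Ethernet Controller X550 (rev 01)"
}

# Look up the model of a single interface via its PCI bus address,
# so the result stays on one line even when the host has several NICs.
nic_model_for() {
  local iname="$1" bus
  bus="$(ethtool -i "$iname" | grep bus-info | awk '{print $2}')"
  lspci -s "$bus" | cut -d ':' -f 3 | sed 's/^ *//'
}

nic_model_for eth0   # prints: Intel Corporation Ethernet Controller X550 (rev 01)
```

Two caveats with this approach: `cut -d ':' -f 3` would truncate a model name that itself contains a colon, and an interface that reports no PCI bus address would need separate handling.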

batistein commented 1 year ago

@guettli please have a look here

benedikt-bartscher commented 1 year ago

I ran into a similar error on a RX220 Host with the following versions:

Log:

{"level":"ERROR","time":"2023-10-06T17:33:42.789Z","file":"controller/controller.go:324","message":"Reconciler error","controller":"hetznerbaremetalhost","controllerGroup":"infrastructure.cluster.x-k8s.io","controllerKind":"HetznerBareMetalHost","HetznerBareMetalHost":{"name":"bm-arm-01","namespace":"default"},"namespace":"default","name":"bm-arm-01","reconcileID":"3f8d7595-bc7f-4174-97d0-b7b49efbc96d","error":"failed to reconcile HetznerBareMetalHost default/bm-arm-01: action \"registering\" failed: failed to get hardware details: failed to obtain hardware details Nics: failed to unmarshal {\"name\":\"eth0\",\"model\":\"Intel Corporation I350 Gigabit Network Connection (rev 01)}. Original ssh output name=\"eth0\" model=\"Intel Corporation I350 Gigabit Network Connection (rev 01)\nIntel Corporation I350 Gigabit Network Connection (rev 01)\" mac=\"88:88:88:88:88:88\" ip=\"111.111.111.11/26\" speedMbps=\"1000\"\nname=\"eth0\" model=\"Intel Corporation I350 Gigabit Network Connection (rev 01)\nIntel Corporation I350 Gigabit Network Connection (rev 01)\" mac=\"88:88:88:88:88:88\" ip=\"2a01:2a01:2a01:2a01::2/64\" speedMbps=\"1000\": unexpected end of JSON input","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/src/cluster-api-provider-hetzner/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:324\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/src/cluster-api-provider-hetzner/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:265\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/src/cluster-api-provider-hetzner/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:226"}
guettli commented 1 year ago

Above log line, pretty printed:


❯ xclip -o | yq -P
level: ERROR
time: "2023-10-06T17:33:42.789Z"
file: controller/controller.go:324
message: Reconciler error
controller: hetznerbaremetalhost
controllerGroup: infrastructure.cluster.x-k8s.io
controllerKind: HetznerBareMetalHost
HetznerBareMetalHost:
  name: bm-arm-01
  namespace: default
namespace: default
name: bm-arm-01
reconcileID: 3f8d7595-bc7f-4174-97d0-b7b49efbc96d
error: |-
  failed to reconcile HetznerBareMetalHost default/bm-arm-01: action "registering" failed:
   failed to get hardware details: failed to obtain hardware details Nics: 
   failed to unmarshal {"name":"eth0","model":"Intel Corporation I350 Gigabit Network Connection (rev 01)}. 
   Original ssh output name="eth0" model="Intel Corporation I350 Gigabit Network Connection (rev 01)
  Intel Corporation I350 Gigabit Network Connection (rev 01)" mac="88:88:88:88:88:88" ip="111.111.111.11/26" speedMbps="1000"
  name="eth0" model="Intel Corporation I350 Gigabit Network Connection (rev 01)
  Intel Corporation I350 Gigabit Network Connection (rev 01)" mac="88:88:88:88:88:88" ip="2a01:2a01:2a01:2a01::2/64"
   speedMbps="1000": unexpected end of JSON input
stacktrace: |-
  sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
        /src/cluster-api-provider-hetzner/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:324
  sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
        /src/cluster-api-provider-hetzner/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:265
  sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
        /src/cluster-api-provider-hetzner/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:226
benedikt-bartscher commented 11 months ago

Is there any work in progress regarding this issue? If not, would you accept a pull request?

guettli commented 11 months ago

@benedikt-bartscher yes, a PR is welcome. BTW, do you have an idea how to reproduce this error? Is there a way to create a second (fake) network interface somehow? Then I could validate your PR manually.

benedikt-bartscher commented 11 months ago

Hey @guettli thanks for your response. You could alias lspci to echo some test data. I am not aware of any other trick which results in a "fake" NIC appearing in lspci/ethtool. Aren't your e2e tests sponsored by Hetzner? Maybe they can provide you a server with 2 NICs. If not, I could provide you with one of our machines for some coding/testing for free.
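
A rough sketch of that reproduction trick, using a shell function instead of an alias to shadow `lspci`. The two canned lines are copied from the EX130-R `lspci` output posted later in this thread; everything else is an assumption for illustration.

```shell
# Shadow lspci with a shell function that prints canned data for two NICs.
lspci() {
  cat <<'EOF'
05:00.0 Ethernet controller: Intel Corporation Ethernet Controller X550 (rev 01)
05:00.1 Ethernet controller: Intel Corporation Ethernet Controller X550 (rev 01)
EOF
}

# The filter quoted earlier in the thread, unchanged.
model="$(lspci | grep net | awk '{$1=$2=$3=""; print $0}' | sed "s/^[ \t]*//")"

# Count the lines in the model value; a value of 2 reproduces the bug condition.
printf '%s\n' "$model" | grep -c ''
```

A function defined this way is only visible in the current shell. To affect a script started as a separate process (for example `bash nic-info.sh`), a fake `lspci` executable placed earlier on `PATH` is probably the simpler route.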

janiskemper commented 11 months ago

They will, no problem. If you can open a PR, we will be able to test it as well!

Lenikas commented 8 months ago

Hello guys.

Unfortunately, we encountered the same issue while deploying a Kubernetes cluster on baremetal servers from Hetzner with the cluster-api-provider-hetzner.

We have an AX41-NVMe server with a single network interface. Its hardware details are obtained successfully, and the subsequent bootstrap completes without errors.

However, we also have different servers of types EX130-R/EX130-S, which have two network interfaces:

root@rescue ~ # lspci | grep net | awk '{$1=$2=$3=""; print $0}' | sed "s/^[ \t]*//"
Intel Corporation Ethernet Controller X550 (rev 01)
Intel Corporation Ethernet Controller X550 (rev 01)

Similar to @alexkasatikov's example, we see this in the caph-controller-manager logs:

{
  "level": "ERROR",
  "time": "2024-03-10T19:02:52.056Z",
  "file": "controller/controller.go:329",
  "message": "Reconciler error",
  "controller": "hetznerbaremetalhost",
  "controllerGroup": "infrastructure.cluster.x-k8s.io",
  "controllerKind": "HetznerBareMetalHost",
  "HetznerBareMetalHost": {
    "name": "infra-dev-02-worker-bm-2332683",
    "namespace": "default"
  },
  "namespace": "default",
  "name": "infra-dev-02-worker-bm-2332683",
  "reconcileID": "5b2d4c5a-f010-42df-8532-8c1388861c86",
  "error": "failed to reconcile HetznerBareMetalHost default/infra-dev-02-worker-bm-2332683: action
   \"registering\" failed: failed to get hardware details: failed to obtain hardware details Nics: failed to 
   unmarshal {\"name\":\"eth0\",\"model\":\"Intel Corporation Ethernet Controller X550 (rev 01)}. Original ssh 
   output name=\"eth0\" model=\"Intel Corporation Ethernet Controller X550 (rev 01)\\nIntel Corporation 
   Ethernet Controller X550 (rev 01)\" mac=\"a8:a1:59:fb:c4:db\" ip=\"37.27.63.175/26\" 
   speedMbps=\"1000\"\\nname=\"eth0\" model=\"Intel Corporation Ethernet Controller X550 (rev 01)\\nIntel 
   Corporation Ethernet Controller X550 (rev 01)\" mac=\"a8:a1:59:fb:c4:db\" ip=\"2a01:4f9:3081:310e::2/64\" 
   speedMbps=\"1000\": unexpected end of JSON input",

  "stacktrace": "sigs.k8s.io/controller-runtime/pkg/internal/controller.
  (*Controller).reconcileHandler\\n\\tsigs.k8s.io/controller-
  runtime@v0.16.3/pkg/internal/controller/controller.go:329\\nsigs.k8s.io/controller-
  runtime/pkg/internal/controller.(*Controller).processNextWorkItem\\n\\tsigs.k8s.io/controller-
  runtime@v0.16.3/pkg/internal/controller/controller.go:266\\nsigs.k8s.io/controller-
  runtime/pkg/internal/controller.(*Controller).Start.func2.2\\n\\tsigs.k8s.io/controller-
  runtime@v0.16.3/pkg/internal/controller/controller.go:227"
}

This turned out to be a significant issue for us, as our production cluster building process encountered this problem. We would greatly appreciate it if you could find a way to fix this problem.

Environment:

batistein commented 8 months ago

@Lenikas is it possible to schedule a call for further debugging?

@guettli please have a look into this in the upcoming week.

guettli commented 8 months ago

@Lenikas can you please post the output of these commands:

ip a
ethtool "*"
lspci

thank you!

Lenikas commented 8 months ago

> @Lenikas can you please post the output of these commands:
>
> ip a
> ethtool "*"
> lspci
>
> thank you!

Hello @guettli, thank you for replying!

This is output from server EX130-R type:

ip a:

```
1: lo: mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 9c:6b:00:45:9c:5e brd ff:ff:ff:ff:ff:ff
    altname eno1
    altname enp5s0f0
    inet 37.27.107.180/26 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 2a01:4f9:3070:1e05::2/64 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::9e6b:ff:fe45:9c5e/64 scope link
       valid_lft forever preferred_lft forever
3: eth1: mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 9c:6b:00:45:9c:5f brd ff:ff:ff:ff:ff:ff
    altname eno2
    altname enp5s0f1
4: usb0: mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether ce:b3:36:d9:a6:a2 brd ff:ff:ff:ff:ff:ff
```

ethtool "*":

```
Settings for eth0:
    Supported ports: [ TP ]
    Supported link modes: 100baseT/Full
                          1000baseT/Full
                          10000baseT/Full
                          2500baseT/Full
                          5000baseT/Full
    Supported pause frame use: Symmetric
    Supports auto-negotiation: Yes
    Supported FEC modes: Not reported
    Advertised link modes: 100baseT/Full
                           1000baseT/Full
                           10000baseT/Full
    Advertised pause frame use: Symmetric
    Advertised auto-negotiation: Yes
    Advertised FEC modes: Not reported
    Speed: 1000Mb/s
    Duplex: Full
    Auto-negotiation: on
Settings for eth1:
    Supported ports: [ TP ]
    Supported link modes: 100baseT/Full
                          1000baseT/Full
                          10000baseT/Full
                          2500baseT/Full
                          5000baseT/Full
    Supported pause frame use: Symmetric
    Supports auto-negotiation: Yes
    Supported FEC modes: Not reported
    Advertised link modes: 100baseT/Full
                           1000baseT/Full
                           10000baseT/Full
    Advertised pause frame use: Symmetric
    Advertised auto-negotiation: Yes
    Advertised FEC modes: Not reported
    Speed: Unknown!
    Duplex: Unknown! (255)
    Auto-negotiation: on
Settings for usb0:
    Supported ports: [ ]
    Supported link modes: Not reported
    Supported pause frame use: No
    Supports auto-negotiation: No
    Supported FEC modes: Not reported
    Advertised link modes: Not reported
    Advertised pause frame use: No
    Advertised auto-negotiation: No
    Advertised FEC modes: Not reported
    Speed: Unknown!
    Duplex: Half
    Auto-negotiation: off
Settings for eth0:
    Port: Twisted Pair
    PHYAD: 0
    Transceiver: internal
    MDI-X: Unknown
Settings for eth1:
    Port: Twisted Pair
    PHYAD: 0
    Transceiver: internal
    MDI-X: Unknown
Settings for usb0:
    Port: Twisted Pair
    PHYAD: 0
    Transceiver: internal
    MDI-X: Unknown
Settings for eth0:
    Supports Wake-on: umbg
    Wake-on: g
Settings for eth1:
    Supports Wake-on: umbg
    Wake-on: g
Settings for eth0:
    Current message level: 0x00000007 (7)
                           drv probe link
Settings for eth1:
    Current message level: 0x00000007 (7)
                           drv probe link
Settings for usb0:
    Current message level: 0x00000007 (7)
                           drv probe link
Settings for lo:
    Link detected: yes
Settings for eth0:
    Link detected: yes
Settings for eth1:
    Link detected: no
Settings for usb0:
    Link detected: no
```

lspci:

```
00:00.0 System peripheral: Intel Corporation Ice Lake Memory Map/VT-d (rev 20)
00:00.1 System peripheral: Intel Corporation Ice Lake Mesh 2 PCIe (rev 20)
00:00.2 System peripheral: Intel Corporation Ice Lake RAS (rev 20)
00:00.4 Generic system peripheral [0807]: Intel Corporation Device 0b23
00:08.0 PCI bridge: Intel Corporation Device 1bb8 (rev 11)
00:0a.0 PCI bridge: Intel Corporation Device 1bba (rev 11)
00:0f.0 PCI bridge: Intel Corporation Device 1bbf (rev 11)
00:14.0 USB controller: Intel Corporation Device 1bcd (rev 11)
00:14.2 RAM memory: Intel Corporation Device 1bce (rev 11)
00:14.4 Host bridge: Intel Corporation Device 1bfe (rev 11)
00:15.0 System peripheral: Intel Corporation Device 1bff (rev 11)
00:16.0 Communication controller: Intel Corporation Device 1be0 (rev 11)
00:16.1 Communication controller: Intel Corporation Device 1be1 (rev 11)
00:16.4 Communication controller: Intel Corporation Device 1be4 (rev 11)
00:17.0 SATA controller: Intel Corporation Device 1ba2 (rev 11)
00:1a.0 PCI bridge: Intel Corporation Device 1bb4 (rev 11)
00:1f.0 ISA bridge: Intel Corporation Device 1b81 (rev 11)
00:1f.4 SMBus: Intel Corporation Device 1bc9 (rev 11)
00:1f.5 Serial bus controller: Intel Corporation Device 1bca (rev 11)
03:00.0 PCI bridge: ASRock Incorporation Device 1150 (rev 06)
04:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 52)
05:00.0 Ethernet controller: Intel Corporation Ethernet Controller X550 (rev 01)
05:00.1 Ethernet controller: Intel Corporation Ethernet Controller X550 (rev 01)
16:00.0 System peripheral: Intel Corporation Ice Lake Memory Map/VT-d (rev 20)
16:00.1 System peripheral: Intel Corporation Ice Lake Mesh 2 PCIe (rev 20)
16:00.2 System peripheral: Intel Corporation Ice Lake RAS (rev 20)
16:00.4 Generic system peripheral [0807]: Intel Corporation Device 0b23
42:00.0 System peripheral: Intel Corporation Ice Lake Memory Map/VT-d (rev 20)
42:00.1 System peripheral: Intel Corporation Ice Lake Mesh 2 PCIe (rev 20)
42:00.2 System peripheral: Intel Corporation Ice Lake RAS (rev 20)
42:00.4 Generic system peripheral [0807]: Intel Corporation Device 0b23
6e:00.0 System peripheral: Intel Corporation Ice Lake Memory Map/VT-d (rev 20)
6e:00.1 System peripheral: Intel Corporation Ice Lake Mesh 2 PCIe (rev 20)
6e:00.2 System peripheral: Intel Corporation Ice Lake RAS (rev 20)
6e:00.4 Generic system peripheral [0807]: Intel Corporation Device 0b23
9a:00.0 System peripheral: Intel Corporation Ice Lake Memory Map/VT-d (rev 20)
9a:00.1 System peripheral: Intel Corporation Ice Lake Mesh 2 PCIe (rev 20)
9a:00.2 System peripheral: Intel Corporation Ice Lake RAS (rev 20)
9a:00.4 Generic system peripheral [0807]: Intel Corporation Device 0b23
c6:00.0 System peripheral: Intel Corporation Ice Lake Memory Map/VT-d (rev 20)
c6:00.1 System peripheral: Intel Corporation Ice Lake Mesh 2 PCIe (rev 20)
c6:00.2 System peripheral: Intel Corporation Ice Lake RAS (rev 20)
c6:00.4 Generic system peripheral [0807]: Intel Corporation Device 0b23
c6:01.0 PCI bridge: Intel Corporation Device 352a (rev 04)
c6:03.0 PCI bridge: Intel Corporation Device 352b (rev 04)
c6:05.0 PCI bridge: Intel Corporation Device 352c (rev 04)
c6:07.0 PCI bridge: Intel Corporation Device 352d (rev 04)
c7:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller PM9A1/PM9A3/980PRO
ca:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller PM9A1/PM9A3/980PRO
f2:00.0 System peripheral: Intel Corporation Ice Lake Memory Map/VT-d (rev 20)
f2:00.1 System peripheral: Intel Corporation Ice Lake Mesh 2 PCIe (rev 20)
f2:00.2 System peripheral: Intel Corporation Ice Lake RAS (rev 20)
f2:00.4 Generic system peripheral [0807]: Intel Corporation Device 0b23
f2:01.0 System peripheral: Intel Corporation Device 0b25
f2:03.0 System peripheral: Intel Corporation Ice Lake MSM
f2:03.1 System peripheral: Intel Corporation Ice Lake PMON MSM
fe:00.0 System peripheral: Intel Corporation Device 3250
fe:00.1 System peripheral: Intel Corporation Device 3251
fe:00.2 System peripheral: Intel Corporation Device 3252
fe:00.3 Host bridge: Intel Corporation Ice Lake IEH
fe:00.5 System peripheral: Intel Corporation Device 3255
fe:05.0 System peripheral: Intel Corporation Device 3245
fe:05.1 System peripheral: Intel Corporation Device 3246
fe:05.2 System peripheral: Intel Corporation Device 3247
fe:06.0 System peripheral: Intel Corporation Device 3245
fe:06.1 System peripheral: Intel Corporation Device 3246
fe:06.2 System peripheral: Intel Corporation Device 3247
fe:07.0 System peripheral: Intel Corporation Device 3245
fe:07.1 System peripheral: Intel Corporation Device 3246
fe:07.2 System peripheral: Intel Corporation Device 3247
fe:0c.0 Performance counters: Intel Corporation Device 324a
fe:0d.0 Performance counters: Intel Corporation Device 324a
fe:0e.0 Performance counters: Intel Corporation Device 324a
fe:0f.0 Performance counters: Intel Corporation Device 324a
fe:1a.0 Performance counters: Intel Corporation Device 2880
fe:1b.0 Performance counters: Intel Corporation Device 2880
fe:1c.0 Performance counters: Intel Corporation Device 2880
fe:1d.0 Performance counters: Intel Corporation Device 2880
ff:00.0 System peripheral: Intel Corporation Device 324c
ff:00.1 System peripheral: Intel Corporation Device 324c
ff:00.2 System peripheral: Intel Corporation Device 324c
ff:00.3 System peripheral: Intel Corporation Device 324c
ff:00.4 System peripheral: Intel Corporation Device 324c
ff:00.5 System peripheral: Intel Corporation Device 324c
ff:00.6 System peripheral: Intel Corporation Device 324c
ff:00.7 System peripheral: Intel Corporation Device 324c
ff:01.0 System peripheral: Intel Corporation Device 324c
ff:01.1 System peripheral: Intel Corporation Device 324c
ff:01.2 System peripheral: Intel Corporation Device 324c
ff:01.3 System peripheral: Intel Corporation Device 324c
ff:01.4 System peripheral: Intel Corporation Device 324c
ff:01.5 System peripheral: Intel Corporation Device 324c
ff:01.6 System peripheral: Intel Corporation Device 324c
ff:01.7 System peripheral: Intel Corporation Device 324c
ff:02.0 System peripheral: Intel Corporation Device 324c
ff:02.1 System peripheral: Intel Corporation Device 324c
ff:02.2 System peripheral: Intel Corporation Device 324c
ff:02.3 System peripheral: Intel Corporation Device 324c
ff:02.4 System peripheral: Intel Corporation Device 324c
ff:02.5 System peripheral: Intel Corporation Device 324c
ff:02.6 System peripheral: Intel Corporation Device 324c
ff:02.7 System peripheral: Intel Corporation Device 324c
ff:0a.0 System peripheral: Intel Corporation Device 324d
ff:0a.1 System peripheral: Intel Corporation Device 324d
ff:0a.2 System peripheral: Intel Corporation Device 324d
ff:0a.3 System peripheral: Intel Corporation Device 324d
ff:0a.4 System peripheral: Intel Corporation Device 324d
ff:0a.5 System peripheral: Intel Corporation Device 324d
ff:0a.6 System peripheral: Intel Corporation Device 324d
ff:0a.7 System peripheral: Intel Corporation Device 324d
ff:0b.0 System peripheral: Intel Corporation Device 324d
ff:0b.1 System peripheral: Intel Corporation Device 324d
ff:0b.2 System peripheral: Intel Corporation Device 324d
ff:0b.3 System peripheral: Intel Corporation Device 324d
ff:0b.4 System peripheral: Intel Corporation Device 324d
ff:0b.5 System peripheral: Intel Corporation Device 324d
ff:0b.6 System peripheral: Intel Corporation Device 324d
ff:0b.7 System peripheral: Intel Corporation Device 324d
ff:0c.0 System peripheral: Intel Corporation Device 324d
ff:0c.1 System peripheral: Intel Corporation Device 324d
ff:0c.2 System peripheral: Intel Corporation Device 324d
ff:0c.3 System peripheral: Intel Corporation Device 324d
ff:0c.4 System peripheral: Intel Corporation Device 324d
ff:0c.5 System peripheral: Intel Corporation Device 324d
ff:0c.6 System peripheral: Intel Corporation Device 324d
ff:0c.7 System peripheral: Intel Corporation Device 324d
ff:1d.0 System peripheral: Intel Corporation Device 344f
ff:1d.1 System peripheral: Intel Corporation Device 3457
ff:1e.0 System peripheral: Intel Corporation Device 3258 (rev 08)
ff:1e.1 System peripheral: Intel Corporation Device 3259 (rev 08)
ff:1e.2 System peripheral: Intel Corporation Device 325a (rev 08)
ff:1e.3 System peripheral: Intel Corporation Device 325b (rev 08)
ff:1e.4 System peripheral: Intel Corporation Device 325c (rev 08)
ff:1e.5 System peripheral: Intel Corporation Device 325d (rev 08)
ff:1e.6 System peripheral: Intel Corporation Device 325e (rev 08)
ff:1e.7 System peripheral: Intel Corporation Device 325f (rev 08)
```

If it's important, we use custom server versions with various options. If needed, I can probably provide configuration options.

Lenikas commented 8 months ago

> @Lenikas is it possible to schedule a call for further debugging?
>
> @guettli please have a look into this in the upcoming week.

Hello @batistein, @guettli!

If relevant, we can schedule a meeting. Alternatively, we can suggest transitioning our communication to a different platform if it's more convenient for you. Additionally, we can grant you SSH access to the server for debugging purposes.

How long do you think it might take to resolve the issue? It's important for our team to understand this to plan our next steps. Unfortunately, our team lacks sufficient expertise in Go to quickly resolve this issue.

If you need any further information, we're ready to provide it.

Thank you!

batistein commented 8 months ago

@Lenikas please send me an email at info@syself.com

guettli commented 8 months ago

@Lenikas we created a draft which should make the error go away.

Do you need the NIC data which gets gathered by the script? Because at the moment the script nic-info.sh does not work reliably. But I guess you don't need these values, and you just want the provisioning to succeed.

Lenikas commented 8 months ago

@guettli Yes, at the moment, we simply need a fix to ensure that provisioning completes successfully.

However, we are unsure where this information may be needed in the future. Do you have any ideas, or is it related to some functionality of the cluster-api-provider-hetzner?

Thank you for the responsive communication!

guettli commented 8 months ago

@Lenikas the PR is merged, you can test the new caph image by updating the caph deployment in your management cluster.

Image: ghcr.io/syself/caph-staging:sha-c6fd5bb

See: https://github.com/syself/cluster-api-provider-hetzner/pkgs/container/caph-staging/190282019?tag=sha-c6fd5bb

Please tell us if this works for you. Thank you.

batistein commented 8 months ago

@Lenikas we just released a new version of caph. It should now be usable with clusterctl as well.

Lenikas commented 7 months ago

@guettli Hello, I apologize for the delayed response.

Yes, I have checked the built image, it works. The provisioning completes successfully, and the nodes are added to the cluster.

Thank you so much!