k8s-proxmox / cluster-api-provider-proxmox

Cluster API provider implementation for Proxmox VE
Apache License 2.0

Using v0.3.2 causes invalid memory address or nil pointer dereference #127

Closed. dohq closed this issue 7 months ago.

dohq commented 8 months ago

/kind bug

What steps did you take and what happened: I used version v0.3.2 and followed the instructions in the README to execute the commands.

What did you expect to happen: A VM is created in Proxmox.

Anything else you would like to add:

  1. I utilized an already operational k3s cluster as a Bootstrap cluster.
  2. cluster-api-provider-proxmox-controller-manager logs below; a short note on the panic follows the trace.
    
    (*'-') < kubectl logs -n cluster-api-provider-proxmox-system cluster-api-provider-proxmox-controller-manager-9bc449bf6-7tzpf
    2023-11-09T10:12:17Z    INFO    controller-runtime.metrics      Metrics server is starting to listen    {"addr": "127.0.0.1:8080"}
    2023-11-09T10:12:17Z    INFO    setup   starting manager
    2023-11-09T10:12:17Z    INFO    Starting server {"kind": "health probe", "addr": "[::]:8081"}
    2023-11-09T10:12:17Z    INFO    starting server {"path": "/metrics", "kind": "metrics", "addr": "127.0.0.1:8080"}
    I1109 10:12:17.205516       1 leaderelection.go:245] attempting to acquire leader lease cluster-api-provider-proxmox-system/36404136.cluster.x-k8s.io...
    I1109 10:12:35.575426       1 leaderelection.go:255] successfully acquired lease cluster-api-provider-proxmox-system/36404136.cluster.x-k8s.io
    2023-11-09T10:12:35Z    INFO    Starting EventSource    {"controller": "proxmoxmachine", "controllerGroup": "infrastructure.cluster.x-k8s.io", "controllerKind": "ProxmoxMachine", "source": "kind source: *v1beta1.ProxmoxMachine"}
    2023-11-09T10:12:35Z    INFO    Starting Controller     {"controller": "proxmoxmachine", "controllerGroup": "infrastructure.cluster.x-k8s.io", "controllerKind": "ProxmoxMachine"}
    2023-11-09T10:12:35Z    DEBUG   events  cluster-api-provider-proxmox-controller-manager-9bc449bf6-7tzpf_4b7ffadc-23b4-47a9-a3f8-269de7b25e3e became leader      {"type": "Normal", "object": {"kind":"Lease","namespace":"cluster-api-provider-proxmox-system","name":"36404136.cluster.x-k8s.io","uid":"0194e364-76a8-4469-aff5-7c166a497086","apiVersion":"coordination.k8s.io/v1","resourceVersion":"384646"}, "reason": "LeaderElection"}
    2023-11-09T10:12:35Z    INFO    Starting EventSource    {"controller": "proxmoxcluster", "controllerGroup": "infrastructure.cluster.x-k8s.io", "controllerKind": "ProxmoxCluster", "source": "kind source: *v1beta1.ProxmoxCluster"}
    2023-11-09T10:12:35Z    INFO    Starting Controller     {"controller": "proxmoxcluster", "controllerGroup": "infrastructure.cluster.x-k8s.io", "controllerKind": "ProxmoxCluster"}
    2023-11-09T10:12:35Z    INFO    Starting workers        {"controller": "proxmoxcluster", "controllerGroup": "infrastructure.cluster.x-k8s.io", "controllerKind": "ProxmoxCluster", "worker count": 1}
    2023-11-09T10:12:35Z    INFO    Starting workers        {"controller": "proxmoxmachine", "controllerGroup": "infrastructure.cluster.x-k8s.io", "controllerKind": "ProxmoxMachine", "worker count": 1}
    2023-11-09T10:12:35Z    INFO    Reconciling Delete ProxmoxCluster       {"controller": "proxmoxcluster", "controllerGroup": "infrastructure.cluster.x-k8s.io", "controllerKind": "ProxmoxCluster", "ProxmoxCluster": {"name":"cappx","namespace":"default"}, "namespace": "default", "name": "cappx", "reconcileID": "032c1c6a-5a3e-495d-837b-fd189f8e5b73"}
    2023-11-09T10:12:35Z    INFO    Deleteing storage       {"controller": "proxmoxcluster", "controllerGroup": "infrastructure.cluster.x-k8s.io", "controllerKind": "ProxmoxCluster", "ProxmoxCluster": {"name":"cappx","namespace":"default"}, "namespace": "default", "name": "cappx", "reconcileID": "032c1c6a-5a3e-495d-837b-fd189f8e5b73"}
    2023-11-09T10:12:35Z    INFO    Reconciling ProxmoxMachine      {"controller": "proxmoxmachine", "controllerGroup": "infrastructure.cluster.x-k8s.io", "controllerKind": "ProxmoxMachine", "ProxmoxMachine": {"name":"cappx-controlplane-nkv9h","namespace":"default"}, "namespace": "default", "name": "cappx-controlplane-nkv9h", "reconcileID": "b1d8126e-ede4-4303-8746-53f96722876f"}
    2023-11-09T10:12:35Z    INFO    Reconciling instance    {"controller": "proxmoxmachine", "controllerGroup": "infrastructure.cluster.x-k8s.io", "controllerKind": "ProxmoxMachine", "ProxmoxMachine": {"name":"cappx-controlplane-nkv9h","namespace":"default"}, "namespace": "default", "name": "cappx-controlplane-nkv9h", "reconcileID": "b1d8126e-ede4-4303-8746-53f96722876f"}
    2023-11-09T10:12:35Z    INFO    instance does not have providerID yet   {"controller": "proxmoxmachine", "controllerGroup": "infrastructure.cluster.x-k8s.io", "controllerKind": "ProxmoxMachine", "ProxmoxMachine": {"name":"cappx-controlplane-nkv9h","namespace":"default"}, "namespace": "default", "name": "cappx-controlplane-nkv9h", "reconcileID": "b1d8126e-ede4-4303-8746-53f96722876f"}
    2023-11-09T10:12:35Z    INFO    instance wasn't found. new instance will be created     {"controller": "proxmoxmachine", "controllerGroup": "infrastructure.cluster.x-k8s.io", "controllerKind": "ProxmoxMachine", "ProxmoxMachine": {"name":"cappx-controlplane-nkv9h","namespace":"default"}, "namespace": "default", "name": "cappx-controlplane-nkv9h", "reconcileID": "b1d8126e-ede4-4303-8746-53f96722876f"}
    2023-11-09T10:12:35Z    INFO    Reconciling QEMU        {"controller": "proxmoxmachine", "controllerGroup": "infrastructure.cluster.x-k8s.io", "controllerKind": "ProxmoxMachine", "ProxmoxMachine": {"name":"cappx-controlplane-nkv9h","namespace":"default"}, "namespace": "default", "name": "cappx-controlplane-nkv9h", "reconcileID": "b1d8126e-ede4-4303-8746-53f96722876f"}
    2023-11-09T10:12:35Z    INFO    Observed a panic in reconciler: runtime error: invalid memory address or nil pointer dereference        {"controller": "proxmoxmachine", "controllerGroup": "infrastructure.cluster.x-k8s.io", "controllerKind": "ProxmoxMachine", "ProxmoxMachine": {"name":"cappx-controlplane-nkv9h","namespace":"default"}, "namespace": "default", "name": "cappx-controlplane-nkv9h", "reconcileID": "b1d8126e-ede4-4303-8746-53f96722876f"}
    panic: runtime error: invalid memory address or nil pointer dereference [recovered]
        panic: runtime error: invalid memory address or nil pointer dereference
    [signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x1636832]

    goroutine 111 [running]:
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.1/pkg/internal/controller/controller.go:115 +0x1fa
    panic({0x17af380, 0x28e3170})
        /usr/local/go/src/runtime/panic.go:884 +0x213
    github.com/sp-yduck/cluster-api-provider-proxmox/cloud/services/compute/instance.(*Service).createQEMU(0xc000783980, {0x1c32930, 0xc0005a8690}, {0xc00052ab30, 0x7}, 0xc0004acd88)
        /workspace/cloud/services/compute/instance/qemu.go:79 +0x252
    github.com/sp-yduck/cluster-api-provider-proxmox/cloud/services/compute/instance.(*Service).reconcileQEMU(0xc000783980, {0x1c32930, 0xc0005a8690})
        /workspace/cloud/services/compute/instance/qemu.go:36 +0x150
    github.com/sp-yduck/cluster-api-provider-proxmox/cloud/services/compute/instance.(*Service).createInstance(0x1c34d00?, {0x1c32930, 0xc0005a8690})
        /workspace/cloud/services/compute/instance/reconcile.go:140 +0x76
    github.com/sp-yduck/cluster-api-provider-proxmox/cloud/services/compute/instance.(*Service).createOrGetInstance(0x1c34d00?, {0x1c32930, 0xc0005a8690})
        /workspace/cloud/services/compute/instance/reconcile.go:88 +0xf4
    github.com/sp-yduck/cluster-api-provider-proxmox/cloud/services/compute/instance.(*Service).Reconcile(0xc000783980, {0x1c32930, 0xc0005a8690})
        /workspace/cloud/services/compute/instance/reconcile.go:25 +0xa5
    github.com/sp-yduck/cluster-api-provider-proxmox/controllers.(*ProxmoxMachineReconciler).reconcile(0x0?, {0x1c32930, 0xc0005a8690}, 0xc0005a8a20)
        /workspace/controllers/proxmoxmachine_controller.go:156 +0x21c
    github.com/sp-yduck/cluster-api-provider-proxmox/controllers.(*ProxmoxMachineReconciler).Reconcile(0xc000010090, {0x1c32930, 0xc0005a8690}, {{{0xc00052aaa6?, 0xc000226ea0?}, {0xc0000c9920?, 0x40e007?}}})
        /workspace/controllers/proxmoxmachine_controller.go:136 +0x8bb
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x1c32930?, {0x1c32930?, 0xc0005a8690?}, {{{0xc00052aaa6?, 0x1729c20?}, {0xc0000c9920?, 0x1c1f838?}}})
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.1/pkg/internal/controller/controller.go:118 +0xc8
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc00031c140, {0x1c32888, 0xc00031a190}, {0x182f520?, 0xc0003659e0?})
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.1/pkg/internal/controller/controller.go:314 +0x377
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc00031c140, {0x1c32888, 0xc00031a190})
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.1/pkg/internal/controller/controller.go:265 +0x1d9
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.1/pkg/internal/controller/controller.go:226 +0x85
    created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.1/pkg/internal/controller/controller.go:222 +0x587
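A note on the trace: the panic originates in createQEMU (qemu.go:79), and a nil pointer dereference there is typically what Go produces when an optional, omitted spec field is dereferenced without a guard. The following is only a minimal sketch of that failure pattern; the type and field names (Hardware, NetworkDevice, Bridge) are assumptions for illustration, not the provider's actual code.

```go
// Minimal sketch of an unguarded optional-field access, assuming hypothetical
// types; it is not cluster-api-provider-proxmox code.
package main

import "fmt"

type NetworkDevice struct {
	Bridge string
}

type Hardware struct {
	CPU           int
	Memory        int
	NetworkDevice *NetworkDevice // optional; nil when omitted from the spec
}

// buildNetworkOption dereferences the optional field directly, so it panics
// with "invalid memory address or nil pointer dereference" when the field is nil.
func buildNetworkOption(hw Hardware) string {
	return fmt.Sprintf("virtio,bridge=%s", hw.NetworkDevice.Bridge)
}

// buildNetworkOptionSafe guards the optional field and falls back to a default
// (the default value here is an assumption for the example).
func buildNetworkOptionSafe(hw Hardware) string {
	if hw.NetworkDevice == nil {
		return "virtio,bridge=vmbr0"
	}
	return fmt.Sprintf("virtio,bridge=%s", hw.NetworkDevice.Bridge)
}

func main() {
	hw := Hardware{CPU: 4, Memory: 8192} // optional NetworkDevice left unset
	fmt.Println(buildNetworkOptionSafe(hw))
}
```

Guarding optional pointers (or defaulting them before use) avoids this class of panic regardless of which fields the user omits.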



**Environment:**
- Cluster-api-provider-proxmox version: v0.3.2
- Proxmox VE version: 8.0.4
- Kubernetes version: (use `kubectl version`): v1.27.7+k3s1
- OS (e.g. from `/etc/os-release`): 

Thank you for the great work! Please don't hesitate to ask if you need any more information.
sp-yduck commented 8 months ago

Thank you for reporting! Could you share the ProxmoxMachine of cappx-controlplane-nkv9h, especially the spec field?

kubectl -n default get proxmoxmachine cappx-controlplane-nkv9h -oyaml
dohq commented 8 months ago

> Thank you for reporting! Could you share the ProxmoxMachine of cappx-controlplane-nkv9h, especially the spec field?
>
> kubectl -n default get proxmoxmachine cappx-controlplane-nkv9h -oyaml

OK I've recreated the resources several times, so please forgive the different resource names. The error remained unchanged.

(*'-') < kubectl get proxmoxmachines.infrastructure.cluster.x-k8s.io cappx-controlplane-ctxgj -o yaml| yq '.spec'
cloudInit:
  user:
    packages:
      - socat
      - conntrack
    runCmd:
      - modprobe overlay
      - modprobe br_netfilter
      - sysctl --system
      - mkdir -p /usr/local/bin
      - curl -L "https://github.com/containerd/containerd/releases/download/v1.7.2/containerd-1.7.2-linux-amd64.tar.gz" | tar Cxvz "/usr/local"
      - curl -L "https://raw.githubusercontent.com/containerd/containerd/main/containerd.service" -o /etc/systemd/system/containerd.service
      - mkdir -p /etc/containerd
      - containerd config default > /etc/containerd/config.toml
      - sed 's/SystemdCgroup = false/SystemdCgroup = true/g' /etc/containerd/config.toml -i
      - systemctl daemon-reload
      - systemctl enable --now containerd
      - mkdir -p /usr/local/sbin
      - curl -L "https://github.com/opencontainers/runc/releases/download/v1.1.7/runc.amd64" -o /usr/local/sbin/runc
      - chmod 755 /usr/local/sbin/runc
      - mkdir -p /opt/cni/bin
      - curl -L "https://github.com/containernetworking/plugins/releases/download/v1.3.0/cni-plugins-linux-amd64-v1.3.0.tgz" | tar -C "/opt/cni/bin" -xz
      - curl -L "https://github.com/kubernetes-sigs/cri-tools/releases/download/v1.27.0/crictl-v1.27.0-linux-amd64.tar.gz" | tar -C "/usr/local/bin" -xz
      - curl -L --remote-name-all https://dl.k8s.io/release/v1.27.3/bin/linux/amd64/kubeadm -o /usr/local/bin/kubeadm
      - chmod +x /usr/local/bin/kubeadm
      - curl -L --remote-name-all https://dl.k8s.io/release/v1.27.3/bin/linux/amd64/kubelet -o /usr/local/bin/kubelet
      - chmod +x /usr/local/bin/kubelet
      - curl -sSL "https://raw.githubusercontent.com/kubernetes/release/v0.15.1/cmd/kubepkg/templates/latest/deb/kubelet/lib/systemd/system/kubelet.service" | sed "s:/usr/bin:/usr/local/bin:g" | tee /etc/systemd/system/kubelet.service
      - mkdir -p /etc/systemd/system/kubelet.service.d
      - curl -sSL "https://raw.githubusercontent.com/kubernetes/release/v0.15.1/cmd/kubepkg/templates/latest/deb/kubeadm/10-kubeadm.conf" | sed "s:/usr/bin:/usr/local/bin:g" | tee /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
      - systemctl enable kubelet.service
    writeFiles:
      - content: overlay\nbr_netfilter
        owner: root:root
        path: /etc/modules-load.d/k8s.conf
        permissions: "0640"
      - content: |
          net.bridge.bridge-nf-call-iptables  = 1
          net.bridge.bridge-nf-call-ip6tables = 1
          net.ipv4.ip_forward                 = 1
        owner: root:root
        path: /etc/sysctl.d/k8s.conf
        permissions: "0640"
hardware:
  cpu: 4
  disk: 50G
  memory: 8192
image:
  checksum: c5eed826009c9f671bc5f7c9d5d63861aa2afe91aeff1c0d3a4cb5b28b2e35d6
  checksumType: sha256
  url: https://cloud-images.ubuntu.com/releases/jammy/release-20230914/ubuntu-22.04-server-cloudimg-amd64-disk-kvm.img
sp-yduck commented 8 months ago

Could you try the latest release and see if the issue still remains? https://github.com/sp-yduck/cluster-api-provider-proxmox/releases/tag/v0.3.3

dohq commented 8 months ago

Thank you! I did a clean install of the bootstrap k3s cluster and retried, but the situation remained unchanged...

I1110 22:12:43.280974       1 listener.go:44] "controller-runtime/metrics: Metrics server is starting to listen" addr="127.0.0.1:8080"
I1110 22:12:43.281310       1 scheduler.go:45] "load plugin config: {map[CPUOvercommit:{false map[]} MemoryOvercommit:{false map[]}] map[] map[]}"
I1110 22:12:43.281358       1 main.go:139] "setup: starting manager"
I1110 22:12:43.281656       1 internal.go:360] "Starting server" kind="health probe" addr="[::]:8081"
I1110 22:12:43.281732       1 leaderelection.go:245] attempting to acquire leader lease cappx-system/36404136.cluster.x-k8s.io...
I1110 22:12:43.281686       1 server.go:50] "starting server" path="/metrics" kind="metrics" addr="127.0.0.1:8080"
I1110 22:12:59.298834       1 leaderelection.go:255] successfully acquired lease cappx-system/36404136.cluster.x-k8s.io
I1110 22:12:59.298974       1 controller.go:177] "Starting EventSource" controller="proxmoxmachine" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="ProxmoxMachine" source="kind source: *v1beta1.ProxmoxMachine"
I1110 22:12:59.298983       1 controller.go:185] "Starting Controller" controller="proxmoxmachine" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="ProxmoxMachine"
I1110 22:12:59.299092       1 controller.go:177] "Starting EventSource" controller="proxmoxcluster" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="ProxmoxCluster" source="kind source: *v1beta1.ProxmoxCluster"
I1110 22:12:59.299102       1 controller.go:185] "Starting Controller" controller="proxmoxcluster" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="ProxmoxCluster"
I1110 22:12:59.400650       1 controller.go:219] "Starting workers" controller="proxmoxcluster" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="ProxmoxCluster" worker count=1
I1110 22:12:59.400718       1 controller.go:219] "Starting workers" controller="proxmoxmachine" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="ProxmoxMachine" worker count=1
I1110 22:12:59.660691       1 proxmoxcluster_controller.go:108] "Reconciling ProxmoxCluster" controller="proxmoxcluster" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="ProxmoxCluster" ProxmoxCluster="default/cappx" namespace="default" name="cappx" reconcileID=f99abf14-91cd-4ede-b024-54036288fa7b
I1110 22:12:59.661101       1 reconcile.go:20] "Reconciling storage" controller="proxmoxcluster" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="ProxmoxCluster" ProxmoxCluster="default/cappx" namespace="default" name="cappx" reconcileID=f99abf14-91cd-4ede-b024-54036288fa7b
I1110 22:12:59.673679       1 proxmoxmachine_controller.go:144] "Reconciling ProxmoxMachine" controller="proxmoxmachine" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="ProxmoxMachine" ProxmoxMachine="default/cappx-md-0-jbbzl" namespace="default" name="cappx-md-0-jbbzl" reconcileID=a077cb8c-4203-4138-96db-aaa9b5b3c3e6
I1110 22:12:59.675972       1 reconcile.go:26] "Reconciled storage" controller="proxmoxcluster" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="ProxmoxCluster" ProxmoxCluster="default/cappx" namespace="default" name="cappx" reconcileID=f99abf14-91cd-4ede-b024-54036288fa7b
I1110 22:12:59.676063       1 proxmoxcluster_controller.go:136] "Reconciled ProxmoxCluster" controller="proxmoxcluster" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="ProxmoxCluster" ProxmoxCluster="default/cappx" namespace="default" name="cappx" reconcileID=f99abf14-91cd-4ede-b024-54036288fa7b
I1110 22:12:59.680051       1 reconcile.go:24] "Reconciling instance" controller="proxmoxmachine" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="ProxmoxMachine" ProxmoxMachine="default/cappx-md-0-jbbzl" namespace="default" name="cappx-md-0-jbbzl" reconcileID=a077cb8c-4203-4138-96db-aaa9b5b3c3e6
I1110 22:12:59.680067       1 reconcile.go:105] "instance does not have providerID yet" controller="proxmoxmachine" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="ProxmoxMachine" ProxmoxMachine="default/cappx-md-0-jbbzl" namespace="default" name="cappx-md-0-jbbzl" reconcileID=a077cb8c-4203-4138-96db-aaa9b5b3c3e6
I1110 22:12:59.680076       1 reconcile.go:89] "instance wasn't found. new instance will be created" controller="proxmoxmachine" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="ProxmoxMachine" ProxmoxMachine="default/cappx-md-0-jbbzl" namespace="default" name="cappx-md-0-jbbzl" reconcileID=a077cb8c-4203-4138-96db-aaa9b5b3c3e6
I1110 22:12:59.680083       1 qemu.go:21] "Reconciling QEMU" controller="proxmoxmachine" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="ProxmoxMachine" ProxmoxMachine="default/cappx-md-0-jbbzl" namespace="default" name="cappx-md-0-jbbzl" reconcileID=a077cb8c-4203-4138-96db-aaa9b5b3c3e6
I1110 22:12:59.680091       1 qemu.go:39] "getting qemu from vmid" controller="proxmoxmachine" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="ProxmoxMachine" ProxmoxMachine="default/cappx-md-0-jbbzl" namespace="default" name="cappx-md-0-jbbzl" reconcileID=a077cb8c-4203-4138-96db-aaa9b5b3c3e6
I1110 22:12:59.680134       1 qemu.go:49] "creating qemu" controller="proxmoxmachine" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="ProxmoxMachine" ProxmoxMachine="default/cappx-md-0-jbbzl" namespace="default" name="cappx-md-0-jbbzl" reconcileID=a077cb8c-4203-4138-96db-aaa9b5b3c3e6
I1110 22:12:59.680147       1 storage.go:15] "ensuring storage is available" controller="proxmoxmachine" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="ProxmoxMachine" ProxmoxMachine="default/cappx-md-0-jbbzl" namespace="default" name="cappx-md-0-jbbzl" reconcileID=a077cb8c-4203-4138-96db-aaa9b5b3c3e6
I1110 22:12:59.680223       1 scheduler.go:173] "Start Running Scheduler" Name="qemu-scheduler" schedulerID=0xc00011d4c8
I1110 22:12:59.680303       1 scheduler.go:196] "getting next qemu from scheduling queue" Name="qemu-scheduler" schedulerID=0xc00011d4c8
I1110 22:12:59.680226       1 storage.go:40] "finding available storage" controller="proxmoxmachine" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="ProxmoxMachine" ProxmoxMachine="default/cappx-md-0-jbbzl" namespace="default" name="cappx-md-0-jbbzl" reconcileID=a077cb8c-4203-4138-96db-aaa9b5b3c3e6
I1110 22:12:59.684684       1 qemu.go:56] "making qemu spec" controller="proxmoxmachine" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="ProxmoxMachine" ProxmoxMachine="default/cappx-md-0-jbbzl" namespace="default" name="cappx-md-0-jbbzl" reconcileID=a077cb8c-4203-4138-96db-aaa9b5b3c3e6
I1110 22:12:59.684717       1 scheduler.go:261] "adding qemu to scheduler queue" Name="qemu-scheduler" schedulerID=0xc00011d4c8 qemu="cappx-md-0-jbbzl"
I1110 22:12:59.684841       1 scheduler.go:203] "scheduling qemu" Name="qemu-scheduler" schedulerID=0xc00011d4c8 qemu="cappx-md-0-jbbzl"
I1110 22:12:59.684867       1 scheduler.go:280] "finding proxmox node matching qemu" Name="qemu-scheduler" schedulerID=0xc00011d4c8 qemu="cappx-md-0-jbbzl"
I1110 22:12:59.689759       1 scheduler.go:327] "filtering proxmox node" Name="qemu-scheduler" schedulerID=0xc00011d4c8 qemu="cappx-md-0-jbbzl"
I1110 22:12:59.723032       1 scheduler.go:350] "scoring proxmox node" Name="qemu-scheduler" schedulerID=0xc00011d4c8 qemu="cappx-md-0-jbbzl"
E1110 22:12:59.748318       1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 168 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x1793580?, 0x28af360})
    /go/pkg/mod/k8s.io/apimachinery@v0.27.2/pkg/util/runtime/runtime.go:75 +0x99
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc0000a8740?})
    /go/pkg/mod/k8s.io/apimachinery@v0.27.2/pkg/util/runtime/runtime.go:49 +0x75
panic({0x1793580, 0x28af360})
    /usr/local/go/src/runtime/panic.go:884 +0x213
github.com/sp-yduck/cluster-api-provider-proxmox/cloud/scheduler/framework.(*Status).IsSuccess(...)
    /workspace/cloud/scheduler/framework/types.go:45
github.com/sp-yduck/cluster-api-provider-proxmox/cloud/scheduler.(*Scheduler).RunScorePlugins(_, {_, _}, _, {0x0, {0x0, 0x0}, {0x1999141, 0x9}, {0x0, ...}, ...}, ...)
    /workspace/cloud/scheduler/scheduler.go:361 +0x409
github.com/sp-yduck/cluster-api-provider-proxmox/cloud/scheduler.(*Scheduler).SelectNode(_, {_, _}, {0x0, {0x0, 0x0}, {0x1999141, 0x9}, {0x0, 0x0}, ...})
    /workspace/cloud/scheduler/scheduler.go:298 +0x20e
github.com/sp-yduck/cluster-api-provider-proxmox/cloud/scheduler.(*Scheduler).ScheduleOne(0xc000192320, {0x1c14bf8, 0xc0002f14a0})
    /workspace/cloud/scheduler/scheduler.go:210 +0x2c5
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1()
    /go/pkg/mod/k8s.io/apimachinery@v0.27.2/pkg/util/wait/backoff.go:259 +0x25
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?)
    /go/pkg/mod/k8s.io/apimachinery@v0.27.2/pkg/util/wait/backoff.go:226 +0x3e
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc000532480?, {0x1bfe800, 0xc0008c21b0}, 0x1, 0xc000532480)
    /go/pkg/mod/k8s.io/apimachinery@v0.27.2/pkg/util/wait/backoff.go:227 +0xb6
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc0002f14a0?, 0x0, 0x0, 0x0?, 0x19acf5c?)
    /go/pkg/mod/k8s.io/apimachinery@v0.27.2/pkg/util/wait/backoff.go:204 +0x89
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext({0x1c14bf8, 0xc0002f14a0}, 0xc0006dffa0, 0x19acf5c?, 0x17?, 0x0?)
    /go/pkg/mod/k8s.io/apimachinery@v0.27.2/pkg/util/wait/backoff.go:259 +0x99
k8s.io/apimachinery/pkg/util/wait.UntilWithContext(...)
    /go/pkg/mod/k8s.io/apimachinery@v0.27.2/pkg/util/wait/backoff.go:170
github.com/sp-yduck/cluster-api-provider-proxmox/cloud/scheduler.(*Scheduler).Run(0xc000192320)
    /workspace/cloud/scheduler/scheduler.go:174 +0xfe
created by github.com/sp-yduck/cluster-api-provider-proxmox/cloud/scheduler.(*Scheduler).RunAsync
    /workspace/cloud/scheduler/scheduler.go:184 +0x56
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
    panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x15dc789]

goroutine 168 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc0000a8740?})
    /go/pkg/mod/k8s.io/apimachinery@v0.27.2/pkg/util/runtime/runtime.go:56 +0xd7
panic({0x1793580, 0x28af360})
    /usr/local/go/src/runtime/panic.go:884 +0x213
github.com/sp-yduck/cluster-api-provider-proxmox/cloud/scheduler/framework.(*Status).IsSuccess(...)
    /workspace/cloud/scheduler/framework/types.go:45
github.com/sp-yduck/cluster-api-provider-proxmox/cloud/scheduler.(*Scheduler).RunScorePlugins(_, {_, _}, _, {0x0, {0x0, 0x0}, {0x1999141, 0x9}, {0x0, ...}, ...}, ...)
    /workspace/cloud/scheduler/scheduler.go:361 +0x409
github.com/sp-yduck/cluster-api-provider-proxmox/cloud/scheduler.(*Scheduler).SelectNode(_, {_, _}, {0x0, {0x0, 0x0}, {0x1999141, 0x9}, {0x0, 0x0}, ...})
    /workspace/cloud/scheduler/scheduler.go:298 +0x20e
github.com/sp-yduck/cluster-api-provider-proxmox/cloud/scheduler.(*Scheduler).ScheduleOne(0xc000192320, {0x1c14bf8, 0xc0002f14a0})
    /workspace/cloud/scheduler/scheduler.go:210 +0x2c5
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1()
    /go/pkg/mod/k8s.io/apimachinery@v0.27.2/pkg/util/wait/backoff.go:259 +0x25
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?)
    /go/pkg/mod/k8s.io/apimachinery@v0.27.2/pkg/util/wait/backoff.go:226 +0x3e
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc000532480?, {0x1bfe800, 0xc0008c21b0}, 0x1, 0xc000532480)
    /go/pkg/mod/k8s.io/apimachinery@v0.27.2/pkg/util/wait/backoff.go:227 +0xb6
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc0002f14a0?, 0x0, 0x0, 0x0?, 0x19acf5c?)
    /go/pkg/mod/k8s.io/apimachinery@v0.27.2/pkg/util/wait/backoff.go:204 +0x89
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext({0x1c14bf8, 0xc0002f14a0}, 0xc0006dffa0, 0x19acf5c?, 0x17?, 0x0?)
    /go/pkg/mod/k8s.io/apimachinery@v0.27.2/pkg/util/wait/backoff.go:259 +0x99
k8s.io/apimachinery/pkg/util/wait.UntilWithContext(...)
    /go/pkg/mod/k8s.io/apimachinery@v0.27.2/pkg/util/wait/backoff.go:170
github.com/sp-yduck/cluster-api-provider-proxmox/cloud/scheduler.(*Scheduler).Run(0xc000192320)
    /workspace/cloud/scheduler/scheduler.go:174 +0xfe
created by github.com/sp-yduck/cluster-api-provider-proxmox/cloud/scheduler.(*Scheduler).RunAsync
    /workspace/cloud/scheduler/scheduler.go:184 +0x56
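For reference, this second trace panics inside framework.(*Status).IsSuccess (types.go:45) while RunScorePlugins is iterating score results, which is the typical signature of calling a field-dereferencing method on a nil *Status returned by a plugin. The sketch below is a minimal, self-contained illustration of that failure mode and of a nil-safe receiver, similar in spirit to what the upstream kube-scheduler framework does; the names are assumptions, not this project's actual code.

```go
// Minimal sketch of a nil *Status dereference and a nil-safe alternative,
// using hypothetical names; it is not cluster-api-provider-proxmox code.
package main

import "fmt"

type Code int

const (
	Success Code = iota
	Error
)

type Status struct {
	code Code
	msg  string
}

// IsSuccess dereferences the receiver, so calling it on a nil *Status panics
// with "invalid memory address or nil pointer dereference".
func (s *Status) IsSuccess() bool {
	return s.code == Success
}

// IsSuccessSafe treats a nil *Status as success, so plugins may return nil
// to mean "no error" without crashing the scheduler loop.
func (s *Status) IsSuccessSafe() bool {
	return s == nil || s.code == Success
}

// scorePlugin stands in for a score plugin that returns nil to signal success.
func scorePlugin() *Status {
	return nil
}

func main() {
	status := scorePlugin()
	fmt.Println(status.IsSuccessSafe()) // true; status.IsSuccess() would panic instead
}
```

The ProxmoxMachine spec used for this attempt: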
cloudInit:
  user:
    packages:
      - socat
      - conntrack
    runCmd:
      - modprobe overlay
      - modprobe br_netfilter
      - sysctl --system
      - mkdir -p /usr/local/bin
      - curl -L "https://github.com/containerd/containerd/releases/download/v1.7.2/containerd-1.7.2-linux-amd64.tar.gz" | tar Cxvz "/usr/local"
      - curl -L "https://raw.githubusercontent.com/containerd/containerd/main/containerd.service" -o /etc/systemd/system/containerd.service
      - mkdir -p /etc/containerd
      - containerd config default > /etc/containerd/config.toml
      - sed 's/SystemdCgroup = false/SystemdCgroup = true/g' /etc/containerd/config.toml -i
      - systemctl daemon-reload
      - systemctl enable --now containerd
      - mkdir -p /usr/local/sbin
      - curl -L "https://github.com/opencontainers/runc/releases/download/v1.1.7/runc.amd64" -o /usr/local/sbin/runc
      - chmod 755 /usr/local/sbin/runc
      - mkdir -p /opt/cni/bin
      - curl -L "https://github.com/containernetworking/plugins/releases/download/v1.3.0/cni-plugins-linux-amd64-v1.3.0.tgz" | tar -C "/opt/cni/bin" -xz
      - curl -L "https://github.com/kubernetes-sigs/cri-tools/releases/download/v1.27.0/crictl-v1.27.0-linux-amd64.tar.gz" | tar -C "/usr/local/bin" -xz
      - curl -L --remote-name-all https://dl.k8s.io/release/v1.27.3/bin/linux/amd64/kubeadm -o /usr/local/bin/kubeadm
      - chmod +x /usr/local/bin/kubeadm
      - curl -L --remote-name-all https://dl.k8s.io/release/v1.27.3/bin/linux/amd64/kubelet -o /usr/local/bin/kubelet
      - chmod +x /usr/local/bin/kubelet
      - curl -sSL "https://raw.githubusercontent.com/kubernetes/release/v0.15.1/cmd/kubepkg/templates/latest/deb/kubelet/lib/systemd/system/kubelet.service" | sed "s:/usr/bin:/usr/local/bin:g" | tee /etc/systemd/system/kubelet.service
      - mkdir -p /etc/systemd/system/kubelet.service.d
      - curl -sSL "https://raw.githubusercontent.com/kubernetes/release/v0.15.1/cmd/kubepkg/templates/latest/deb/kubeadm/10-kubeadm.conf" | sed "s:/usr/bin:/usr/local/bin:g" | tee /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
      - systemctl enable kubelet.service
    writeFiles:
      - content: overlay\nbr_netfilter
        owner: root:root
        path: /etc/modules-load.d/k8s.conf
        permissions: "0640"
      - content: |
          net.bridge.bridge-nf-call-iptables  = 1
          net.bridge.bridge-nf-call-ip6tables = 1
          net.ipv4.ip_forward                 = 1
        owner: root:root
        path: /etc/sysctl.d/k8s.conf
        permissions: "0640"
hardware:
  cpu: 4
  disk: 50G
  memory: 8192
  networkDevice:
    bridge: vmbr0
    firewall: true
    model: virtio
image:
  checksum: c5eed826009c9f671bc5f7c9d5d63861aa2afe91aeff1c0d3a4cb5b28b2e35d6
  checksumType: sha256
  url: https://cloud-images.ubuntu.com/releases/jammy/release-20230914/ubuntu-22.04-server-cloudimg-amd64-disk-kvm.img
3deep5me commented 8 months ago

Do you have any special hardware or anything like that in the k3s cluster (ARM, hugepages, old hardware, OS)? I had some strange behavior with hugepages that I could not explain. It's just a guess, but it may be a hint in some direction. If you want, you can share your node specs: kubectl get nodes -o yaml

sp-yduck commented 8 months ago

Hi @dohq, I fixed one suspicious piece of code (#134) and pushed the image to spyduck/cluster-api-provider-proxmox:9baaef0. I'm not 100% sure it resolves this, but could you try it and see how it works? I usually test with a single-node Proxmox cluster, so some code paths are not well tested yet. Sorry for the inconvenience, and again thank you for reporting!

sp-yduck commented 8 months ago

@3deep5me thank you for sharing, interesting. In which component (Proxmox VE, CAPPX, Kubernetes) did you see that strange behavior?

3deep5me commented 8 months ago

@sp-yduck sorry for the confusion, it was not with CAPPX.