openshift / instaslice-operator

InstaSlice Operator facilitates slicing of accelerators using stable APIs
Apache License 2.0

Daemonset crashes on the hosts without GPU #183

Open harche opened 4 days ago

harche commented 4 days ago

I installed InstaSlice on OCP 4.17 using make ocp-deploy. The daemonset pods crashed on the nodes that do not have any GPUs:

$ oc get pods -A | grep instas
instaslice-system                                  instaslice-operator-controller-daemonset-g8vx4                    0/1     CrashLoopBackOff   7 (88s ago)    12m
instaslice-system                                  instaslice-operator-controller-daemonset-jxshm                    1/1     Running            0              12m
instaslice-system                                  instaslice-operator-controller-daemonset-l2rg7                    0/1     CrashLoopBackOff   7 (99s ago)    12m
instaslice-system                                  instaslice-operator-controller-daemonset-sbcs2                    0/1     CrashLoopBackOff   7 (90s ago)    12m
instaslice-system                                  instaslice-operator-controller-manager-5bbfb4fd47-x2vzc           2/2     Running            0              13m

The crashing pods log the following error:

{"level":"info","ts":"2024-10-17T19:35:54.953910209Z","caller":"controller/instaslice_daemonset.go:554","msg":"classical resources obtained are ","cpu":144,"memory":2163215945728}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x138 pc=0x1717a92]

goroutine 212 [running]:
github.com/openshift/instaslice-operator/internal/controller.(*InstaSliceDaemonsetReconciler).discoverMigEnabledGpuWithSlices(0xc0009042d0)
    /workspace/internal/controller/instaslice_daemonset.go:555 +0x232
github.com/openshift/instaslice-operator/internal/controller.(*InstaSliceDaemonsetReconciler).SetupWithManager.func1({0x1df5490, 0xc0005aebe0})
    /workspace/internal/controller/instaslice_daemonset.go:478 +0x1d3
sigs.k8s.io/controller-runtime/pkg/manager.RunnableFunc.Start(0x1df5490?, {0x1df5490?, 0xc0005aebe0?})
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/manager/manager.go:301 +0x26
sigs.k8s.io/controller-runtime/pkg/manager.(*runnableGroup).reconcile.func1(0xc0006d3400)
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/manager/runnable_group.go:223 +0xc8
created by sigs.k8s.io/controller-runtime/pkg/manager.(*runnableGroup).reconcile in goroutine 232
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/manager/runnable_group.go:207 +0x19d

sairameshv commented 3 days ago

/assign