openshift / instaslice-operator

InstaSlice Operator facilitates slicing of accelerators using stable APIs
Apache License 2.0

Daemonset crashes on the hosts without GPU #183

Closed: harche closed this issue 4 weeks ago

harche commented 1 month ago

I installed InstaSlice on OCP 4.17 using make ocp-deploy. The daemonset pods crash on the nodes that have no GPUs:

$ oc get pods -A | grep instas
instaslice-system                                  instaslice-operator-controller-daemonset-g8vx4                    0/1     CrashLoopBackOff   7 (88s ago)    12m
instaslice-system                                  instaslice-operator-controller-daemonset-jxshm                    1/1     Running            0              12m
instaslice-system                                  instaslice-operator-controller-daemonset-l2rg7                    0/1     CrashLoopBackOff   7 (99s ago)    12m
instaslice-system                                  instaslice-operator-controller-daemonset-sbcs2                    0/1     CrashLoopBackOff   7 (90s ago)    12m
instaslice-system                                  instaslice-operator-controller-manager-5bbfb4fd47-x2vzc           2/2     Running            0              13m

The crashing pods log the following error:

{"level":"info","ts":"2024-10-17T19:35:54.953910209Z","caller":"controller/instaslice_daemonset.go:554","msg":"classical resources obtained are ","cpu":144,"memory":2163215945728}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x138 pc=0x1717a92]

goroutine 212 [running]:
github.com/openshift/instaslice-operator/internal/controller.(*InstaSliceDaemonsetReconciler).discoverMigEnabledGpuWithSlices(0xc0009042d0)
    /workspace/internal/controller/instaslice_daemonset.go:555 +0x232
github.com/openshift/instaslice-operator/internal/controller.(*InstaSliceDaemonsetReconciler).SetupWithManager.func1({0x1df5490, 0xc0005aebe0})
    /workspace/internal/controller/instaslice_daemonset.go:478 +0x1d3
sigs.k8s.io/controller-runtime/pkg/manager.RunnableFunc.Start(0x1df5490?, {0x1df5490?, 0xc0005aebe0?})
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/manager/manager.go:301 +0x26
sigs.k8s.io/controller-runtime/pkg/manager.(*runnableGroup).reconcile.func1(0xc0006d3400)
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/manager/runnable_group.go:223 +0xc8
created by sigs.k8s.io/controller-runtime/pkg/manager.(*runnableGroup).reconcile in goroutine 232
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/manager/runnable_group.go:207 +0x19d

sairameshv commented 1 month ago

/assign

harche commented 4 weeks ago

fixed by https://github.com/openshift/instaslice-operator/pull/194

/close

openshift-ci[bot] commented 4 weeks ago

@harche: Closing this issue.

In response to [this](https://github.com/openshift/instaslice-operator/issues/183#issuecomment-2435342861):

> fixed by https://github.com/openshift/instaslice-operator/pull/194
>
> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.