Open harche opened 4 days ago
Installed instaslice on OCP 4.17 using `make ocp-deploy`. The daemonset pods crashed on the nodes without any GPUs:
$ oc get pods -A | grep instas
instaslice-system   instaslice-operator-controller-daemonset-g8vx4            0/1   CrashLoopBackOff   7 (88s ago)   12m
instaslice-system   instaslice-operator-controller-daemonset-jxshm            1/1   Running            0             12m
instaslice-system   instaslice-operator-controller-daemonset-l2rg7            0/1   CrashLoopBackOff   7 (99s ago)   12m
instaslice-system   instaslice-operator-controller-daemonset-sbcs2            0/1   CrashLoopBackOff   7 (90s ago)   12m
instaslice-system   instaslice-operator-controller-manager-5bbfb4fd47-x2vzc   2/2   Running            0             13m
with the following error:
{"level":"info","ts":"2024-10-17T19:35:54.953910209Z","caller":"controller/instaslice_daemonset.go:554","msg":"classical resources obtained are ","cpu":144,"memory":2163215945728}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x138 pc=0x1717a92]

goroutine 212 [running]:
github.com/openshift/instaslice-operator/internal/controller.(*InstaSliceDaemonsetReconciler).discoverMigEnabledGpuWithSlices(0xc0009042d0)
    /workspace/internal/controller/instaslice_daemonset.go:555 +0x232
github.com/openshift/instaslice-operator/internal/controller.(*InstaSliceDaemonsetReconciler).SetupWithManager.func1({0x1df5490, 0xc0005aebe0})
    /workspace/internal/controller/instaslice_daemonset.go:478 +0x1d3
sigs.k8s.io/controller-runtime/pkg/manager.RunnableFunc.Start(0x1df5490?, {0x1df5490?, 0xc0005aebe0?})
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/manager/manager.go:301 +0x26
sigs.k8s.io/controller-runtime/pkg/manager.(*runnableGroup).reconcile.func1(0xc0006d3400)
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/manager/runnable_group.go:223 +0xc8
created by sigs.k8s.io/controller-runtime/pkg/manager.(*runnableGroup).reconcile in goroutine 232
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/manager/runnable_group.go:207 +0x19d
/assign