tinkerbell / rufio

Kubernetes Controller for BMC Interactions
Apache License 2.0
reconcile failures with IntelAMT driver #76

Closed. ibrokethecloud closed this issue 1 year ago

ibrokethecloud commented 1 year ago

During machine reconcile operations, rufio panics intermittently with what appears to be a nil pointer dereference in the IntelAMT driver. The stack trace (truncated at the top) is:

    /Users/gauravmehta/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.13.1/pkg/internal/controller/controller.go:118 +0x1a0
panic({0x102253280, 0x103013ab0})
    /usr/local/go/src/runtime/panic.go:884 +0x204
github.com/jacobweinstock/go-amt.getPowerStatus({0x10247a298, 0x14002baf860}, 0x0?)
    /Users/gauravmehta/go/pkg/mod/github.com/jacobweinstock/go-amt@v0.0.0-20221125040441-53475f4ae023/power.go:50 +0x24
github.com/jacobweinstock/go-amt.isPoweredOn({0x10247a298?, 0x14002baf860?}, 0x0)
    /Users/gauravmehta/go/pkg/mod/github.com/jacobweinstock/go-amt@v0.0.0-20221125040441-53475f4ae023/power.go:140 +0x28
github.com/jacobweinstock/go-amt.(*Client).IsPoweredOn(0x14001815368?, {0x10247a298?, 0x14002baf860?})
    /Users/gauravmehta/go/pkg/mod/github.com/jacobweinstock/go-amt@v0.0.0-20221125040441-53475f4ae023/client.go:69 +0x34
github.com/bmc-toolbox/bmclib/v2/providers/intelamt.(*Conn).PowerStateGet(0x1021d3aa0?, {0x10247a298?, 0x14002baf860?})
    /Users/gauravmehta/go/pkg/mod/github.com/bmc-toolbox/bmclib/v2@v2.0.1-0.20230106151741-828737c08f6e/providers/intelamt/intelamt.go:136 +0x30
github.com/bmc-toolbox/bmclib/v2/bmc.getPowerState({0x10247a298, 0x14002baf860}, {0x1400308d9e0?, 0x1, 0x1?})
    /Users/gauravmehta/go/pkg/mod/github.com/bmc-toolbox/bmclib/v2@v2.0.1-0.20230106151741-828737c08f6e/bmc/power.go:102 +0x22c
github.com/bmc-toolbox/bmclib/v2/bmc.GetPowerStateFromInterfaces({0x10247a298?, 0x14002baf860?}, {0x140004bb1d0?, 0x1?, 0x140028d57c0?})
    /Users/gauravmehta/go/pkg/mod/github.com/bmc-toolbox/bmclib/v2@v2.0.1-0.20230106151741-828737c08f6e/bmc/power.go:131 +0x170
github.com/bmc-toolbox/bmclib/v2.(*Client).GetPowerState(0x14003d9bb90, {0x10247a298, 0x14002baf860})
    /Users/gauravmehta/go/pkg/mod/github.com/bmc-toolbox/bmclib/v2@v2.0.1-0.20230106151741-828737c08f6e/client.go:184 +0x168
github.com/tinkerbell/rufio/controllers.(*MachineReconciler).reconcilePower(0x1400275c680, {0x10247a298?, 0x14002baf860?}, 0x14001ea5380, {0x10247a3b0?, 0x14003d9bb90?})
    /Users/gauravmehta/go/pkg/mod/github.com/tinkerbell/rufio@v0.2.0/controllers/machine_controller.go:154 +0x40
github.com/tinkerbell/rufio/controllers.(*MachineReconciler).reconcile(0x1400275c680, {0x10247a298, 0x14002baf860}, 0x14001ea5380, {0x10246cd90, 0x14002baf8f0}, {{0x10247c8e8?, 0x14002baf890?}, 0x14001ea5380?})
    /Users/gauravmehta/go/pkg/mod/github.com/tinkerbell/rufio@v0.2.0/controllers/machine_controller.go:137 +0x4b4
github.com/tinkerbell/rufio/controllers.(*MachineReconciler).Reconcile(0x1400275c680, {0x10247a298, 0x14002baf860}, {{{0x14003e0af80, 0x10}, {0x14003e0af70, 0xf}}})
    /Users/gauravmehta/go/pkg/mod/github.com/tinkerbell/rufio@v0.2.0/controllers/machine_controller.go:96 +0x294
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x10247a1f0?, {0x10247a298?, 0x14002baf860?}, {{{0x14003e0af80?, 0x10237fec0?}, {0x14003e0af70?, 0x89f6bff4b698d063?}}})
    /Users/gauravmehta/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.13.1/pkg/internal/controller/controller.go:121 +0x8c
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0x14000714320, {0x10247a1f0, 0x140009b1780}, {0x1022b1da0?, 0x14000444060?})
    /Users/gauravmehta/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.13.1/pkg/internal/controller/controller.go:320 +0x2a4
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0x14000714320, {0x10247a1f0, 0x140009b1780})
    /Users/gauravmehta/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.13.1/pkg/internal/controller/controller.go:273 +0x1b0
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
    /Users/gauravmehta/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.13.1/pkg/internal/controller/controller.go:234 +0x74
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2
    /Users/gauravmehta/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.13.1/pkg/internal/controller/controller.go:230 +0x28c

Expected Behaviour

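No nil pointer panics during reconcile. A machine that the IntelAMT driver cannot handle should produce an error, not crash the controller.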

Steps to Reproduce (for bugs)

  1. Run rufio
  2. Define a machine that is not an Intel AMT machine

After a while, the panic occurs during the reconcile process.

jacobweinstock commented 1 year ago

Hey @ibrokethecloud, thanks for reporting this. I opened https://github.com/bmc-toolbox/bmclib/pull/304 to resolve this. Once it's merged I'll update the bmclib dependency here.