tinkerbell / rufio

Kubernetes Controller for BMC Interactions
Apache License 2.0
35 stars 16 forks source link

bmclib client to filter for compatible drivers #9

Open joelrebel opened 2 years ago

joelrebel commented 2 years ago

Raising this as an issue to point out, it would be ideal if this used the FilterForCompatible method here https://github.com/tinkerbell/rufio/blob/13eef9a71caf1171e8f7aae2c85ffbb91379df9b/controllers/bmcclient.go#L19

This way the BMC is 'probed' for compatibility and only drivers known to be working with the BMC will be attempted when the caller invokes the client methods.

drivers := client.Registry.FilterForCompatible(ctx)
if len(drivers) == 0 {
  return nil, errors.New("no compatible bmclib drivers found")
}

client.Registry.Drivers = drivers

I understand the redfish driver is hardcoded at the moment, so whenever that is changed I suggest using the filter method above.

chrisdoherty4 commented 2 years ago

@pokearu

chrisdoherty4 commented 2 years ago

@joelrebel @jacobweinstock Am I right in saying FilterForCompatible() would probe every time its called? We create a client every time an object is reconciled so we'd need to go through that probing process each time - is that feasible?

joelrebel commented 2 years ago

@chrisdoherty4 yep, FilterForCompatible() will attempt to check BMC compatibility by connecting to the BMC when invoked. At what frequency is the object going to be reconciled?

In my experience BMCs are tend to behave erratically as their uptime goes higher, or are hammered with a lot of requests. I'd suggest the reconciliation frequency be kept lower, if thats feasible.

chrisdoherty4 commented 2 years ago

@joelrebel It heavily depends on whats happening. If users submit a series of jobs then individual tasks will create a new client every time they're reconciled.

In the context of Tinkerbell (CAPT specifically), its probably not that often because once a machine is provisioned we no longer use BMC. That said, Rufio could be used by other things that execute jobs more frequently and we'd continually establish new connections, probe and execute.

We could, longer term, think about a client cache given controllers share the same process space. For now, I think we can proceed with FilterForCompatible() and accept the inefficiency (premature optimization n'all) - does that sound reasonable?

joelrebel commented 2 years ago

@chrisdoherty4 a suggestion would be to having each BMC guarded under a lock so theres just one job acting on a BMC to completion at any given moment. For example, a power reset while the BMC configuration is being updated would have a negative impact on the current BMC configuration, or a power reset during a firmware update. Is this sort of BMC lock feasible?

The client cache sounds fair, a cache along with a bmclib client option to provide "hints" would work - the older client has such an implemention. This could then mean FilterForCompaible() would be performed on the first connect and in subsequent requests, "hints" are used instead.

chrisdoherty4 commented 2 years ago

@joelrebel The locking suggestion thing tackles a different problem - I was suggesting a user could submit jobs in serial, they execute in serial, and it would connect and run the filtering algorithm every time (i.e. its dependent on use).

However, your point on controlling whats executing on the BMC is understood; I don't think that's something we're interested in solving though. Thats something the BMC implementation really needs to guard against, not external entities given no 2 entities are necessarily in sync (for example, 2 human operators or 1 human and 1 machine).