Closed utkuozdemir closed 3 weeks ago
And maybe we should add a unit-test in the controller that will verify that there's no bug anymore
LGTM.
But the fact that the machines running Talos 1.7.2 have no secure boot status information also looks like a bug.
They actually have the status set correctly on the machinestatus - the generation error we see is obsolete/outdated.
They actually have the status set correctly on the machinestatus - the generation error we see is obsolete/outdated
But why it's Unknown? Should it be disabled instead?
I think there's some bug in the JS code.
The response omits false
fields by default, so that might lead to the incorrect results in the UI.
They actually have the status set correctly on the machinestatus - the generation error we see is obsolete/outdated
But why it's Unknown? Should it be disabled instead?
I think there's some bug in the JS code.
Makes sense.
So, the MachineStatus seems completely right:
securebootstatus:
enabled: false
But the ClusterMachineConfig
is not aware of it.
I think I just found another issue here - here, for the obsolete error to remain, the change in MachineStatus
(the change that sets the securebootstatus
to be not-nil and enabled: false
) should not trigger another generation of config. I think it also happened here, because before generating a new config, we calculate the input versions, and if there is no change in the versions, we just skip it. And while doing it, it seems we don't take MachineStatus
into consideration.
Question is, if I add it to the input versions calculation, will it trigger too many config generations?
Question is, if I add it to the input versions calculation, will it trigger too many config generations?
I guess it will, but every update with the same state of the secure boot will be no-op.
I would actually prefer to get rid of the MachineStatus
in this controller.
It would be better to split secure boot and schematic parts to a separate resource.
Question is, if I add it to the input versions calculation, will it trigger too many config generations?
I guess it will, but every update with the same state of the secure boot will be no-op. I would actually prefer to get rid of the
MachineStatus
in this controller. It would be better to split secure boot and schematic parts to a separate resource.
Yep, I got a similar idea now - we copy over the secure boot status into ClusterMachineTalosVersion
, just like we do with the schamatic ID. And here we read it from that instead.
@Unix4ever I ended up using ClusterMachineTalosVersion
for now, as it already has a working flow. Avoided adding a new resource into the mix atm which potentially cause some other parts to break.
I think we can pair and do a bigger rework later to simplify the whole machine customization flow.
We might need to do migration here to avoid triggering the config apply.
And if we rework this later we'll have to do another migration.
As both changes are no-op but change the inputs version, the config will get regenerated in the controllers if we don't do anything.
We might need to do migration here to avoid triggering the config apply.
And if we rework this later we'll have to do another migration.
As both changes are no-op but change the inputs version, the config will get regenerated in the controllers if we don't do anything.
Done as discussed, including the migration. Please have another look so we don't miss anything.
@Unix4ever ended up adding it into MachineConfigGenOptions
which simplified things a lot.
/m
/m
A change in the
SecureBootStatus
in theMachineStatus
did not cause a new config to be generated. This was caused by theClusterMachineConfigController
not taking theMachineStatus
into the calculation of the hash from the inputs (UpdateInputsVersions
) to determine if it should generate a new config or not.Modify the existing
MachineConfigGenOptions
resource to hold the required information to build the install image. Instead of usingMachineStatus+ClusterMachineTalosVersion
to build the install image URL, use onlyMachineConfigGenOptions
resource.Additionally, add a migration to pre-populate the new fields in the
MachineConfigGenOptions
resources, so that they will not be modified on the next Omni upgrade. Also update the input versions of theClusterMachineConfig
to not containClusterMachineTalosVersion
anymore, so that we won't trigger a config re-apply on the next Omni upgrade.