dgoodwin closed this pull request 6 years ago.
Watches machines and, if they are masters, adds them to the internal and external ELBs. Also disables the check that remote machinesets have been deprovisioned: we do not currently create them, and the check blocks the infra deprovision job and thus cluster deletion.
@dgoodwin Could you go into more detail about this? I understand that if we are not creating them, then there is no need to deprovision them. What is blocking though? I would hope that anything waiting for a deprovision would not be held up if there was no corresponding provision. If we are attempting a provision, but it is failing, then I would hope that the deprovision would succeed. I just want to make sure that we are not covering up a resiliency issue that may present itself later and be more difficult to ascertain.
This happens on every machine sync. The API call appears idempotent, but if we feel this is too much chatter I could modify it to only retry adding to the ELB after a certain amount of time has elapsed since the machine was last successfully added.
This is definitely going to be too much chatter. Making unnecessary remote calls to AWS is not going to scale, even if the calls are idempotent. I am fine if we want to leave it as is for now, but it is something that we are going to have to address at some point.
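For context, a minimal sketch of the kind of classic-ELB registration call under discussion, using aws-sdk-go; the load balancer names and instance ID are placeholder assumptions, not the controller's actual wiring:

```go
// Minimal sketch, not the controller's actual code. Registers a master's EC2
// instance with the cluster's internal and external classic ELBs.
package main

import (
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/elb"
)

// registerMasterWithELBs adds the instance to each named load balancer.
// RegisterInstancesWithLoadBalancer is effectively idempotent: registering an
// already-registered instance leaves it registered, which is why calling it on
// every machine sync works but generates repeated AWS API traffic.
func registerMasterWithELBs(client *elb.ELB, instanceID string, elbNames []string) error {
	for _, name := range elbNames {
		_, err := client.RegisterInstancesWithLoadBalancer(&elb.RegisterInstancesWithLoadBalancerInput{
			LoadBalancerName: aws.String(name),
			Instances:        []*elb.Instance{{InstanceId: aws.String(instanceID)}},
		})
		if err != nil {
			return fmt.Errorf("registering %s with ELB %s: %v", instanceID, name, err)
		}
	}
	return nil
}

func main() {
	sess := session.Must(session.NewSession(aws.NewConfig().WithRegion("us-east-1")))
	// Placeholder instance ID and ELB names.
	err := registerMasterWithELBs(elb.New(sess), "i-0123456789abcdef0",
		[]string{"mycluster-master-int", "mycluster-master-ext"})
	if err != nil {
		log.Fatal(err)
	}
}
```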
@staebler the boolean for "deprovisioned machine sets" that we use in the infra controller's CanUndo function is not hooked up yet: we don't create remote machinesets, and nothing live will try to delete them and set the flag to true, at least until Joel's PR goes in. Right now the capi.Cluster can never be deleted as a result, which hampers testing, so until Joel's work is ready it seems like we should let the infra deprovision run and let this capi.Cluster get cleaned up.
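Purely as illustration (the infra controller types and field names below are hypothetical), the gating being disabled is roughly this shape:

```go
// Illustrative sketch only; real names in the infra controller may differ.
package infra

// clusterDeprovisionStatus stands in for whatever status the infra controller
// consults. remoteMachineSetsDeprovisioned is never set to true today because
// nothing creates (or therefore deletes) remote machinesets yet.
type clusterDeprovisionStatus struct {
	remoteMachineSetsDeprovisioned bool
}

// canUndo reports whether the infra deprovision job may run. Gating on the
// remote-machineset flag would block cluster deletion forever, so for now the
// flag is intentionally ignored.
func canUndo(s clusterDeprovisionStatus) bool {
	_ = s.remoteMachineSetsDeprovisioned // re-check once remote machinesets are actually created and deleted
	return true
}
```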
As for the chatter, I should probably do something similar to the jobs that rerun on a staggered interval: store the generation plus a timestamp in the machine status as optional fields (only relevant for master machines). That might be best as a followup though, since this PR will help unblock Joel and let him test for real.
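A rough sketch of that staggered re-add idea; the status fields and interval here are made up for illustration:

```go
// Sketch only, mirroring how the job controllers rerun on a staggered interval.
package controller

import "time"

// elbSyncStatus would live as optional fields on the master machine's status.
type elbSyncStatus struct {
	lastELBSync           *time.Time // when the machine was last successfully added to the ELBs
	lastELBSyncGeneration int64      // machine generation observed at that time
}

const elbResyncInterval = 2 * time.Hour // arbitrary example interval

// needsELBSync returns true when the machine changed since the last successful
// add, or when enough time has elapsed that re-confirming registration is
// worthwhile, instead of hitting the AWS API on every sync.
func needsELBSync(s elbSyncStatus, currentGeneration int64, now time.Time) bool {
	if s.lastELBSync == nil || s.lastELBSyncGeneration != currentGeneration {
		return true
	}
	return now.Sub(*s.lastELBSync) > elbResyncInterval
}
```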
Updated to address most of the feedback that is not being deferred to followups.
I removed the actual sorting; instead, SortInstances now returns the active instance and a list of all others as explicit return args. I prefer this to making assumptions about positions in a single array.
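Roughly, the new shape looks like this (a simplified sketch; the actual "active" criterion in the PR may differ):

```go
// Simplified sketch: return the active instance and the rest explicitly
// instead of encoding meaning in slice positions. Treating the first running
// instance as "active" is an assumption for illustration.
package controller

import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/ec2"
)

// SortInstances splits instances into the single active instance (if any) and
// everything else, so callers never index into a sorted slice.
func SortInstances(instances []*ec2.Instance) (active *ec2.Instance, inactive []*ec2.Instance) {
	for _, inst := range instances {
		if active == nil && inst.State != nil && aws.StringValue(inst.State.Name) == ec2.InstanceStateNameRunning {
			active = inst
			continue
		}
		inactive = append(inactive, inst)
	}
	return active, inactive
}
```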
I haven't exposed errors from the ELB add call yet; that may be better suited to a larger conversation and a followup PR.
/lgtm