Closed mkjpryor closed 1 month ago
I knew OCP seems has such by default, are you suggesting we should default it in our template or in the code directly?
@jichenjc
IMHO, it would be nice if we could manage a server group for the control plane nodes. It would be better for the "Infrastructure-as-Code" approach to have the whole thing defined in Kubernetes resources, rather than having to make the server group ahead of time.
We should probably at least document it in the CAPO book as well.
yes, we have such definition in api/v1alpha5/openstackmachine_types.go
// The server group to assign the machine to
ServerGroupID string `json:"serverGroupID,omitempty"`
I think maybe we can refer to other impl, like sec group we did, a bool var to control whether we create for customer if the bool is true, we create server group and honor the ID to each control plane node if it's false, let customer decide whether they need such .. and during deletion phase, delete the server group all together at last
For the control plane we certainly have the opportunity to create and manage this server group: the server group will have the same lifecycle as the cluster, so it can be managed with the cluster.
For machines in a machineset it is not currently possible to do this in CAPI as far as I'm aware. I have previously mooted the possibility of some CAPI object whose lifecycle spans multiple machines. The best suggestion so far is something along the lines of MachinePools, however this would be an abuse of this interface IMHO as OpenStack has no such concept. An object with the same lifecycle as a MachineSet would probably be sufficient: you'd need a new machineset for every server group, but this doesn't sound unreasonable to me.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
lifecycle/stale
is appliedlifecycle/stale
was applied, lifecycle/rotten
is appliedlifecycle/rotten
was applied, the issue is closedYou can:
/remove-lifecycle stale
/lifecycle rotten
/close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
lifecycle/stale
is appliedlifecycle/stale
was applied, lifecycle/rotten
is appliedlifecycle/rotten
was applied, the issue is closedYou can:
/remove-lifecycle stale
/lifecycle rotten
/close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
lifecycle/stale
is appliedlifecycle/stale
was applied, lifecycle/rotten
is appliedlifecycle/rotten
was applied, the issue is closedYou can:
/remove-lifecycle rotten
/close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
lifecycle/stale
is appliedlifecycle/stale
was applied, lifecycle/rotten
is appliedlifecycle/rotten
was applied, the issue is closedYou can:
/reopen
/remove-lifecycle rotten
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
I'd like to re-open this issue and help implement this.
We are a cloud provider working on implementing OpenStack Magnum managed CAPO clusters and these must have anti-affinity
applied to control plane nodes. (Use case: Region with a single AZ).
The preferred defaults would be anti-affinity
for control plane to ensure separation and soft-anti-affinity
for machines to allow scaling beyond available compute hypervisors.
Proposal:
ServerGroupAffinity string json:"serverGroupAffinity,omitempty"
to OpenStackClusterSpec
with valid values of anti-affinity
, soft-anti-affinity
(affinity
, soft-affinity
are valid in Nova, but not sensible here)Note: This would need to apply to control plane nodes only, so perhaps it should be named with a CP
prefix?
I note the restrictions of CAPI in creating the servergroup for MachineSets. Perhaps this is not in the first pass, but if we had an available ServerGroupAffinity
for the MachineSet
accessible we could create on the first Machine if it doesn't exist, and clean up if we remove the last machine from it. It would leave orphaned servergroups in some cases.
/reopen
@dalees: You can't reopen an issue/PR unless you authored it or you are a collaborator.
/reopen
@jichenjc: Reopened this issue.
@dalees Using pre-existing server groups is possible in the Helm charts already, so we could actually implement this fully in the Magnum driver if required.
I still think it would be nice to do in CAPO though.
For the CP nodes it is fairly obvious that we can just add a control plane affinity policy to OpenStackCluster
and manage a server group for the CP nodes.
For the workers it is less obvious, as what you probably want is a server group per MD, but we only get to deal in machines in CAPO, and we aren’t supposed to care which MD they are in.
@mdbooth Didn’t you have ideas for reworking all the AZ logic? Maybe server groups could be worked into that?
Server groups won't live in the AZ logic when that (hopefully) happens. The reason is that AZs are basically a templating mechanism: they define a set of things which vary between machines. Server groups don't (must not) vary between members.
I would like to see us support server groups better than we do currently, which as Matt says is to just create one externally and reference it in the machine spec. I believe we will need a solution for this soon for OpenShift integration, so it will likely be on my TODO list... some time in the next 6 months, maybe? If somebody else wanted to implement it sooner I'd be delighted.
I don't want to add any more 'cluster-magic' to the machine controller. i.e. My preference is that the machine spec entirely specifies the machine, without reference to the cluster spec. So I'd be happy for the cluster controller to create the server group, I'd still want the control plane machine spec to reference it explicitly.
So OpenStackCluster
would have a list of server groups to create, which are then referenced from OpenStackMachine
?
Could this be a good time to have an extra CRD for an OpenStackServerGroup
maybe?
/remove-lifecycle rotten
@mkjpryor @dalees
Hey, I work with @mdbooth and we will need this one to be addressed soon so I'm very likely going to work on that.
A few questions:
OpenStackServerGroup
? I think it's a great idea and indeed an opportunity to do it well in the first place.ServerGroupID
, however when used CAPO wouldn't manage OpenStackServerGroup
.End goal: CAPO needs to be able to create and manage the Server Groups (at a minimum for the Control Plane).
I would like to see support for worker nodes as well if possible ( or a design which allows support for it to be added easily on a later stage ).
currently, we have clusters that:
most clusters follow the 2 pattern but there are some use cases where 1 or 3 is preferable.
@EmilienM - Great, if you'd like to take this forward and progress the CRD and controller for this, that'd be welcomed.
I am prepared to write the control plane part of this, but not ready to dedicate the time to write the CRD/Controller proposal and implementation. I may do the control plane part for my own learning and add a PR in the next few weeks. I don't expect this to be merged if a CRD based replacement is in the pipeline :)
Happy to follow, feedback and test. Having this functional for node groups would be great.
I can see a good use case for a different server group per node group, matching @nikParasyr 's case 3.
@EmilienM
Personally, I really like the idea of an OpenStackServerGroup
CRD, with the corresponding OpenStackMachine.spec.serverGroupRef
field. This would make managing a server group per MD a breeze using our Helm charts.
Just to clarify, I think users of CAPO should create instances of the OpenStackServerGroup
CRD and assign them to machines using a spec.serverGroupRef
, rather than CAPO managing the creation of OpenStackServerGroup
CRD instances itself.
So the order of precedence for a machine would be:
spec.serverGroupID
is set, use that server groupspec.serverGroupRef
is set, use the server group created for the CRD instanceCAPO will manage the actual OpenStack security group backing an OpenStackServerGroup
CRD instance but will not create them automatically for clusters that do not request them (not every cluster will need or want server groups).
Does that make sense? @mdbooth?
This issue is now important to my current work, I'm planning to work on this next week and will propose a PR implementing the new CRD for OpenStackServerGroup
and controller as @mkjpryor has outlined - I believe it covers the cases discussed in this issue so far.
@EmilienM please let me know if you have progressed this, otherwise I'm happy to pick this up now.
Quick update: I plan to upload a PR for feedback next week.
I have a working version of the OpenStackServerGroup
CRD and controller. The OpenStackMachineSpec
adds serverGroupRef
into v1alpha8
which resolves once the server group is created. I'm now working on ensuring tests and edge cases are reasonable.
I'm honestly not sure we want a new CRD & controller for this as it might be too complex for our needs but I'm happy to be proven wrong. I'll wait to see what @dalees proposed.
I've uploaded a first pass at this, there are a few TODO's listed but it is functional.
On the topic of it being a CRD & controller, I can see this is fairly heavyweight for a simple feature, but it does provide a solution to all use cases in https://github.com/kubernetes-sigs/cluster-api-provider-openstack/issues/1256#issuecomment-1823199045 and deals with the lifecycle cleanly. That's the main benefit of this.
After we discussed earlier in this thread, I noticed ServerGroupID
has been changed into ServerGroup
(a filter that allows name or uuid). One option is to extend this further to include the expected policy, and create if it the named group doesn't already exist. The deletion criteria become less obvious if used for worker nodes, however.
Can you think of other options? How do we decide the way forward? I'm happy to join the next CAPO meeting to discuss.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
lifecycle/stale
is appliedlifecycle/stale
was applied, lifecycle/rotten
is appliedlifecycle/rotten
was applied, the issue is closedYou can:
/remove-lifecycle stale
/close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
lifecycle/stale
is appliedlifecycle/stale
was applied, lifecycle/rotten
is appliedlifecycle/rotten
was applied, the issue is closedYou can:
/remove-lifecycle stale
/close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
lifecycle/stale
is appliedlifecycle/stale
was applied, lifecycle/rotten
is appliedlifecycle/rotten
was applied, the issue is closedYou can:
/remove-lifecycle rotten
/close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
lifecycle/stale
is appliedlifecycle/stale
was applied, lifecycle/rotten
is appliedlifecycle/rotten
was applied, the issue is closedYou can:
/reopen
/remove-lifecycle rotten
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
/kind feature
Describe the solution you'd like We should put control plane nodes in a server group with the soft anti-affinity policy to decrease the risk that a hypervisor failure takes out the whole control plane.
This becomes more important once #1252 is merged, but is something that I think we should do even when using explicit AZs.
Anything else you would like to add: