googleforgames / agones

Dedicated Game Server Hosting and Scaling for Multiplayer Games on Kubernetes
https://agones.dev
Apache License 2.0
6.01k stars 796 forks source link

Proposal: Split up the api group stable.agones.dev #703

Closed markmandel closed 5 years ago

markmandel commented 5 years ago

This has been rolling around in my head for a while, so wanted to throw it out there.

Basically watching the changes coming to allocation (#682), and how that will have to be in a different group, and then the multicluster work (#698) feels like it should be separated in someway from everything else - I'm wondering if having stable.agones.dev as the core group was a less than ideal decision.

If we made this change, I a sacrificial draft of grouping could be:

core.agones.dev:

scaling.agones.dev

allocation.agones.dev

multicluster.agones.dev

(FleetAllocation would probably have to stay in core, until it goes away, just because you can't mix CRDs and APIServices)

Looking at Kubernetes - they also do something similar.

This is obviously a breaking change - so wanted to run it past everyone, see how people felt.

markmandel commented 5 years ago

/cc @jkowalski @pooneh-m @roberthbailey @Kuqd

roberthbailey commented 5 years ago

There is a great comment here from Clayton about what an apigroup should encompass (I'll inline the important bits so people don't get side tracked by that issue):

historically, a Kube API group is intended to roughly encompass a lifecycle. That lifecycle has sometimes been SIG defined (autoscaling) but has also been related to orthogonal features (batch and v2beta1 being created for cronjob specifically, or the split between the various parts of what sig-auth controls). When boundaries can exist between lifecycles, we strongly recommend those be created. Federation v1 is a cautionary tale around coupling too closely between "high level API" (the workloads) and the low level api (the workload APIs on each cluster) - we erred a bit too far on putting everything all together, while federation v2 has focused on a much lower level primitive succeeding.

I think that the lifecycles for the pieces you listed above will likely be separable. The "core" bits will probably stabilize and not change much as we iterate on allocations and especially multicluster support.

Is there historical context for why stable.agones.dev was initially chosen vs just agones.dev? Was the idea to have both stable and unstable api groups? Why different groups instead of just different api versions within the same group?

Note that you can also have apigroups that have names that look nested (because they are different strings). For example, if you run kubectl api-resources -o wide against an installed cluster you will see that there is a group authorization.k8s.io with resources like subjectaccessreviews but that there is also a group rbac.authorization.k8s.io with role and rolebindings defined. The naming structure expresses the relationship - rbac is part of authorization.

Which makes me wonder why use the core prefix at all? What if the core resource types were in an agones.dev apigroup and the other types are put into "subgroups"?

In core k8s, horizontal pod autoscaling is in an apigroup is called autoscaling (which would probably be autoscaling.k8s.io if it was named today) so I think it makes sense to put FleetAutoscaling into autoscaling.agones.dev so that the names align.

markmandel commented 5 years ago

Thanks for the detailed comments @roberthbailey ! That is useful knowledge!

Is there historical context for why stable.agones.dev was initially chosen vs just agones.dev? Was the idea to have both stable and unstable api groups? Why different groups instead of just different api versions within the same group?

Honestly, I saw that the sample controller had a 3 level group, samplecontroller.k8s.io and I kind of arbitrarily chose stable.agones.dev since I needed to move forward, and there wasn't much literature out there at the time in regards to choosing names :man_shrugging:

Now that we have more experience - i figured it may be time to revisit.

Regarding all your points - I wasn't even aware you could have a two level group, such as agones.dev - that sounds super nice. Also re: scaling vs autoscaling - I couldn't find a reference, so picked a word. I like autoscaling better too :+1: especially if it's inline with k8s.

So to revisit then:

agones.dev:

autoscaling.agones.dev

allocation.agones.dev

multicluster.agones.dev

That reads really nicely to me :+1: what does everyone else think?

roberthbailey commented 5 years ago

Given the parenthetical statement about FleetAllocations, would it make sense to put them into the allocation.agones.dev group instead (so that they can be deprecated there leaving the agones.dev group more stable)?

markmandel commented 5 years ago

Ideally - would totally love to. However, given technical constraints, allocation.agones.dev is controlled by an APIService (so we don't have to store the CRD results, and give us more control over the throughput), whereas FleetAllocations are still controlled by CRDs -- which are two approaches which can't be mixed.

We could move FleetAllocations to be no-storage APIService as well, so that they are in the same group but since they will be going away, ,it doesn't seem worth the effort to me?

roberthbailey commented 5 years ago

If we update the api group name, we should also update other places that use the same string, such as the prefix used in the SDK for labels and annotations: https://agones.dev/site/docs/guides/client-sdks/#setlabel-key-value

markmandel commented 5 years ago

Sounds like we have pretty decent consensus on this. I'm going to add this to the next milestone (and essentially the 1.0 roadmap -- as I think we need to complete any CRD changes before we go 1.0 and declare stability.

Please let me know if anyone has objections!

markmandel commented 5 years ago

Update on this - As you can see above, I did FleetAutoscaler, which was the easiest.

There is a PR #856 , to remove FleetAllocation, which will make this easier.

I'm currently focused on #660 - so I'm not working on this at the moment, so if someone wants to jump in, please feel free. Happy to provide helpers for how to make these changes as well. (make gen-crd-client is probably the biggest thing to look at)

The only thing left is moving stable.agones.dev => agones.dev

roberthbailey commented 5 years ago

Another thing we discussed yesterday was promoting the APIs that we are going to commit to keeping stable from v1alpha1 to v1 prior to cutting a 1.0 release. Given that moving stable.agones.dev -> agones.dev will be easier once #856 merges I can start looking at changing the version of the ones that have already been split out first.

roberthbailey commented 5 years ago

I also noticed that https://github.com/googleforgames/agones/blob/master/site/static/agones.cast will need to be updated when the group name changes since the output shows fleet.stable.agones.dev.

roberthbailey commented 5 years ago

Remaining work: go through all of the examples and see if they need to have new container images created with the sdk changes.

markmandel commented 5 years ago

Is there anything left on this? I think this is done and can be closed now?

roberthbailey commented 5 years ago

See the above comment - I'm going through the examples to make sure that they work and updating container images in agones-images where necessary.

roberthbailey commented 5 years ago

PRs for updating examples (some of which have already merged):

Outstanding:

Merged:

I'll close out this issue once the remaining two PRs are merged.