Closed markmandel closed 5 years ago
Sounds like a great idea. This potentially benefits #297
I am totally in favor of the changes for Allocation status. I do have some questions on the allocation cross fleet though. As you said, most people will do 1:1 and it is adding complexity. I however agree that this is more the kubernetes way of doing.
I can totally see cross fleet allocation for multiple reasons but when I think about it, my next question is how can I define the order of allocation across those fleets. I feel like the current allocation strategy: packed
and distributed
won't be enough.
I guess I don't understand the need well enough to have a strong opinion. Could you define a concrete use case?
On the other hand, if someone is not using Fleet because they manage GameServers directly(which is a totally valid use case) I see your point. Maybe we can keep both and implement FleetAllocation
in terms of GameServerAllocation
by using a match expression with the Fleet name.
Easy cross-cluster allocation with this issue?
I am totally in favor of the changes for Allocation status. I do have some questions on the allocation cross fleet though. As you said, most people will do 1:1 and it is adding complexity. I however agree that this is more the kubernetes way of doing.
:+1:
I can totally see cross fleet allocation for multiple reasons but when I think about it, my next question is how can I define the order of allocation across those fleets. I feel like the current allocation strategy: packed and distributed won't be enough.
I don't disagree with this. This sets us up to have the flexibility to do smoke tests between fleets as well. We could expand on this configuration as we decide how we want to do this.
I guess I don't understand the need well enough to have a strong opinion. Could you define a concrete use case?
Initial thought is for doing smoke testing across two fleets (so have a smaller fleet of the test server version). There are some issues around Stategy
on this -- My thought is that in the next phase we do a preferred
selector (much like pod affinity), so that we can have a preferred allocation to say a small smoke fleet, and then if they are full, allocate to the larger stable group (or maybe this is something we build into this design - I'm open to that too).
This also allows for more direct control over rollouts of new Fleets if you don't like Rolling Update for whatever reason. You can allocate across a red and green fleet, and manually scale one up and down as you feel comfortable with the new versions coming in - without having to change your allocation code.
The only other use case I have, is like #297 as requested above - maybe you want to allocate on a smaller set of a Fleet (or even a specific one, give a unique label to each GameServer
) - this also gives users that flexibility, and more.
On the other hand, if someone is not using Fleet because they manage GameServers directly(which is a totally valid use case) I see your point. Maybe we can keep both and implement FleetAllocation in terms of GameServerAllocation by using a match expression with the Fleet name.
This is my exact example above - you can still do the most probably allocation by using a match expression - and my plan was to have our Quick Start docs do exactly that.
I was thinking about the preferred idea - if we go down that path, we should probably change things a little, WDYT?
apiVersion: "stable.agones.dev/v1alpha1"
kind: Allocation
metadata:
# We recommend using the following to generate a unique name when creating Allocations
# This will need to be created with `kubectl create` if using the command line tooling
generateName: fleet-allocation-example-
spec:
# GameServer selector from which to choose GameServers from.
# GameServers still have the hard requirement to be `Ready` to be allocated from
# however we can also make available `matchExpressions` for even greater
# flexibility.
# Below is an example of a GameServer allocated against a given fleet.
required:
matchLabels:
stable.agones.dev/fleet: fleet-example
# preferred allocation, not required
preferred:
matchLabels:
stable.agones.dev/fleet: green-fleet
# Allocation strategy. Packed or Distributed.
# Since we aren't tied to a Fleet, we have to declare this here.
# (Packed is default as per usual)
strategy: Packed
# Optional custom metadata that is added to the game server at allocation
# You can use this to tell the server necessary session data
metadata:
labels:
mode: deathmatch
annotations:
map: garden22
Is Allocation too generic a name? Should this be something like GameServerAllocation or is that redundant?
I think GameServerAllocation is potentially better, how well does kubernetes work if there is a conflict in name? Even though there is some namespacing of these based on the 'apiVersion' you don't specify this for all APIs (e.g. 'kubectl get')
Initial thought is for doing smoke testing across two fleets (so have a smaller fleet of the test server version). There are some issues around Stategy on this -- My thought is that in the next phase we do a preferred selector (much like pod affinity), so that we can have a preferred allocation to say a small smoke fleet, and then if they are full, allocate to the larger stable group (or maybe this is something we build into this design - I'm open to that too).
With some form of ordering, I can see it now. Should we make the design open to more than a single preferred though? Also, I am assuming that, like the pod affinity when we have ...IgnoredDuringExecution
, the allocation will skip the preferred one if there is no capacity. Am I right?
I would go with GameServerAllocation
too.
Sounds like GameServerAllocation
works best. I agree - let's do that, I'll update the design above. I don't think collisions are well handled, and GameServerAllocation
does make sense!
Should we make the design open to more than a single preferred though?
Interesting question. I have some mild concerns around complexity and performance here - but they may be unfounded.
Were you thinking something like a list of ordered preferred
selectors (which could be match selectors or set based expressions) - in order of priority?
Something like:
apiVersion: "stable.agones.dev/v1alpha1"
kind: GameServerAllocation
metadata:
# We recommend using the following to generate a unique name when creating Allocations
# This will need to be created with `kubectl create` if using the command line tooling
generateName: fleet-allocation-example-
spec:
# GameServer selector from which to choose GameServers from.
# GameServers still have the hard requirement to be `Ready` to be allocated from
# however we can also make available `matchExpressions` for even greater
# flexibility.
# Below is an example of a GameServer allocated against a given fleet.
required:
matchLabels:
stable.agones.dev/fleet: fleet-example
# preferred allocation, not required
preferred:
- matchLabels:
stable.agones.dev/fleet: green-fleet
- matchLabels:
stable.agones.dev/fleet: blue-fleet
# Allocation strategy. Packed or Distributed.
# Since we aren't tied to a Fleet, we have to declare this here.
# (Packed is default as per usual)
strategy: Packed
# Optional custom metadata that is added to the game server at allocation
# You can use this to tell the server necessary session data
metadata:
labels:
mode: deathmatch
annotations:
map: garden22
I love the flexibility, but am wondering what your use case here would be?
Also, I am assuming that, like the pod affinity when we have ...IgnoredDuringExecution, the allocation will skip the preferred one if there is no capacity. Am I right?
Yes, I believe you are. First we would search for a preferred
match, and if that can be found, we use that, there isn't capacity, we go back to searching the required
selector and allocate from there.
Were you thinking something like a list of ordered preferred selectors (which could be match selectors or set based expressions) - in order of priority?
Yes
I love the flexibility, but am wondering what your use case here would be?
It is kind of a combination of yours and one I alluded earlier. Let's say I shipped a new version of my server but for backward compatibility I need to keep the old version for some time. So I have my allocation go to the new one and use the old one as a backup if there is no capacity in the new one. Then I want to do some A/B testing on the new version. So I can in theory allocate on 3 versions.
I am playing devil's advocate here. I just thought that if we opened the door, someone would ask for more and then we would have to say no or break the API again.
I am playing devil's advocate here.
That's a fair advocate to play :smile:
That all sounds pretty reasonable. I'd say, let's run with it. Do we like the above configuration as a list? i couldn't think of anything better.
Do we like the above configuration as a list? i couldn't think of anything better.
I do like it. Nothing to change on my side.
Updated design to match decisions made here in discussion.
One thing I did note that I'm not sure we covered - I believe it should be that the required
selector is totally required.
What I mean by that is that for a GameServer
to be matched against one of the preferred
selectors, it needs to match both the required
selector, as well as the preferred
selector.
I think this is less confusing - as required
means that it always required, rather than have them work separately.
(Did I say "required" enough times? :man_shrugging: :smile:
Have had a couple of conversations about
FleetAllocation
being tied toFleets
specifically, and how it would be more flexible to decouple this.Thinking back, I'm feeling that the original design decision to tie these two things together was less than ideal - so wanted to propose the below solution to rectify this, and a couple of other things (
FleetAllocation.Status
primarily) I would like to see improved as well.Also, given that people are actively using the platform - we are rapidly approaching what I think will be a 1.0 (separate conversation 😊) - if we want to make big changes to solve foundational issues, we may want to make them now, before we settle on what becomes a "1.0", and things become harder to change (but check my backward compatibility section below on this for some more details on this).
Proposed Specification
The flow of allocation should not change (basically everything works that same as
FleetAllocation
does now), but I propose that game server allocations now occur using the following CRD instead:The GameServerAllocation will then use both
required
selector to make a list ofReady
GameServers
that match the selector, and allocate based on that list (instead of listing theGameServers
for a given specific Fleet).GameServers
that match thepreferred
selectors will be allocated first out of therequired
set. The list is a preferred priority list - i.e. an attempt will be made to find aGameServer
that matches the first selector, and if that cannot be matched, and attempt will be made for the second, and so forth. If aGameServer
cannot be found, allocation will occur from the widerrequired
set.This decouples
Allocations
fromFleets
- giving users (and ourselves in the future) the ability to create their ownGameServer
management strategies -- while still being able to take advantage ofAllocations
-- while also maintaining the ability to do a more simpleGameServerAllocation
from aFleet
with the selector. This also puts us closer inline with Kubernetes strategies as a whole as well.Having selectors makes
GameServerAllocation
far more flexible than its current incarnation. Also, this would also provide the ability to allocate on matchexpressions (set based expressions) - which is even more flexible.For example, this makes it possible for a user to allocate across a subset of a
Fleet
(possibly in combination with SDK Label setting), or allocating across multiple Fleets inside a cluster for smoke tests with preferred selectors - depending on needs - or scaling of Fleets.Downside here being that there is more complexity here - because most people will likely 1:1 allocate from a Fleet - but I think with appropriate documentation, this shouldn't be a large issue.
GameServerAllocation Status
In retrospect, I don't think it was a good idea to include the entire
GameServer
as part of the FleetAllocation state. It is far too much information (which also required extra processing to generate the json patch), and could be potentially confusing as it may be expected for thatGameServer
information to stay up to date with theFleetAllocation
.Also, it is currently difficult to determine if an GameServerAllocation has failed because of error, or because there weren't any GameServers available. To that end, we should include this information in the
Allocation.Status
value asState
.For example, this is the response you get when a
FleetAllocation
fails due to lack ofGameServer
inventory via curl:You would need to hunt for the string "NotFound" in
message
values - it's not inherently clear.Instead, I would recommend the following
Status
values:This was a value is always returned, and an easily understood and parsed value of
State
will define if an allocation occurred successfully or not.When an
GameServerAllocation
is returned in anUnAllocated
state, it should be marked for deletion / deleted in a timely manner after return, to ensure they don't end up remaining forever.In this incarnation, if a HTTP 500 error occurs - this is a bug / server issue, and is correctly identified as such.
If a user needs more information about the allocated
GameServer
- the GameServer name is available, and details can be retrieved through the Kubernetes API.Implementation
The good thing here, is that the implementation of this is not drastically different from the current
FleetAllocation
implementation (in fact, I believe that 99% of the code and tests can be copy pasted across into anGameServerAllocation
controller -- and probably should be separate because of backward compatibility. See below).To provide the described functionality, the only major change would be to change this line, from fleetallocation/controller.go's
allocate()
function fromgsList, err := fleets.ListGameServersByFleetOwner(c.gameServerLister, f)
to instead search forGameServers
via the providedrequired
selector, and filter appropriately via thepreferred
selector list. Almost all other code should be able to be reused.Backward Compatibility
While we are alpha, we do have games that are in/close to production - so it's good to have a good deprecation policy here. Also FleetAllocation is such a core part of the system, changing this shouldn't be done lightly.
We can deprecate FleetAllocations, and keep the code, and everything there can maintain its functionality. I would suggest a removal of FleetAllocation in +2 releases from when the GameServerAllocation CRD is released, to give everyone 12 weeks to test and migrate to the new GameServerAllocation CRD.
Since the mutex for allocation is already shared amongst controllers in the system - there is also no reason users can't use both GameServerAllocation and FleetAllocation at the same time as they migrate.
As always - thoughts, feelings, questions and all feedback is appreciated.