Allocation on GameServers rather than Fleets

markmandel commented 5 years ago

Have had a couple of conversations about FleetAllocation being tied to Fleets specifically, and how it would be more flexible to decouple this.

Thinking back, I'm feeling that the original design decision to tie these two things together was less than ideal - so wanted to propose the below solution to rectify this, and a couple of other things (FleetAllocation.Status primarily) I would like to see improved as well.

Also, given that people are actively using the platform - we are rapidly approaching what I think will be a 1.0 (separate conversation 😊) - if we want to make big changes to solve foundational issues, we may want to make them now, before we settle on what becomes a "1.0", and things become harder to change (but check my backward compatibility section below on this for some more details on this).

Proposed Specification

The flow of allocation should not change (basically everything works that same as FleetAllocation does now), but I propose that game server allocations now occur using the following CRD instead:

apiVersion: "stable.agones.dev/v1alpha1"
kind: GameServerAllocation
metadata:
  # We recommend using the following to generate a unique name when creating Allocations
  # This will need to be created with `kubectl create` if using the command line tooling
  generateName: fleet-allocation-example-
spec:
  # GameServer selector from which to choose GameServers from.
  # GameServers still have the hard requirement to be `Ready` to be allocated from
  # however we can also make available `matchExpressions` for even greater
  # flexibility.
  # Below is an example of a GameServer allocated against a given fleet.
  required:
    matchLabels:
      stable.agones.dev/fleet: fleet-example
  # ordered list of preferred allocations out of the `required` set.
  # If the first selector is not matched,
  # the selection attempts the second selector, and so on.
  # This also support `matchExpressions`
  preferred:
    - matchLabels:
        stable.agones.dev/fleet: green-fleet
    - matchLabels:
        stable.agones.dev/fleet: blue-fleet
  # Allocation strategy. Packed or Distributed.
  # Since we aren't tied to a Fleet, we have to declare this here.
  # (Packed is default as per usual)
  strategy: Packed
  # Optional custom metadata that is added to the game server at allocation
  # You can use this to tell the server necessary session data
  metadata:
    labels:
      mode: deathmatch
    annotations:
      map:  garden22

The GameServerAllocation will then use both required selector to make a list of Ready GameServers that match the selector, and allocate based on that list (instead of listing the GameServers for a given specific Fleet).

GameServers that match the preferred selectors will be allocated first out of the required set. The list is a preferred priority list - i.e. an attempt will be made to find a GameServer that matches the first selector, and if that cannot be matched, and attempt will be made for the second, and so forth. If a GameServer cannot be found, allocation will occur from the wider required set.

This decouples Allocations from Fleets - giving users (and ourselves in the future) the ability to create their own GameServer management strategies -- while still being able to take advantage of Allocations -- while also maintaining the ability to do a more simple GameServerAllocation from a Fleet with the selector. This also puts us closer inline with Kubernetes strategies as a whole as well.

Having selectors makes GameServerAllocation far more flexible than its current incarnation. Also, this would also provide the ability to allocate on matchexpressions (set based expressions) - which is even more flexible.

For example, this makes it possible for a user to allocate across a subset of a Fleet (possibly in combination with SDK Label setting), or allocating across multiple Fleets inside a cluster for smoke tests with preferred selectors - depending on needs - or scaling of Fleets.

Downside here being that there is more complexity here - because most people will likely 1:1 allocate from a Fleet - but I think with appropriate documentation, this shouldn't be a large issue.

GameServerAllocation Status

In retrospect, I don't think it was a good idea to include the entire GameServer as part of the FleetAllocation state. It is far too much information (which also required extra processing to generate the json patch), and could be potentially confusing as it may be expected for that GameServer information to stay up to date with the FleetAllocation.

Also, it is currently difficult to determine if an GameServerAllocation has failed because of error, or because there weren't any GameServers available. To that end, we should include this information in the Allocation.Status value as State.

For example, this is the response you get when a FleetAllocation fails due to lack of GameServer inventory via curl:

< HTTP/1.1 500 Internal Server Error
< Content-Length: 415
< Content-Type: application/json
< Date: Sun, 02 Dec 2018 04:36:40 GMT
<
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {

  },
  "status": "Failure",
  "message": "Internal error occurred: admission webhook \"mutations.stable.agones.dev\" denied the request: NotFound",
  "reason": "InternalError",
  "details": {
    "causes": [
      {
        "message": "admission webhook \"mutations.stable.agones.dev\" denied the request: NotFound"
      }
    ]
  },
  "code": 500

You would need to hunt for the string "NotFound" in message values - it's not inherently clear.

Instead, I would recommend the following Status values:

const (
        // Allocation successful
        Allocated State = "Allocated"
        // Allocation unsuccessful
        UnAllocated State = "UnAllocated"
)

type State string

// GameServerAllocationStatus is the status for an GameServerAllocation resource
type GameServerAllocationStatuss struct {
        // State is the current state of an GameServerAllocation, e.g. Allocated, or UnAllocated
        State          State                           `json:"state"`
        GameServerName string                          `json:"gameServerName"`
        Ports          []v1alpha1.GameServerStatusPort `json:"ports,omitempty"`
        Address        string                          `json:"address,omitempty"`
        NodeName       string                          `json:"nodeName,omitempty"`
}

This was a value is always returned, and an easily understood and parsed value of State will define if an allocation occurred successfully or not.

When an GameServerAllocation is returned in an UnAllocated state, it should be marked for deletion / deleted in a timely manner after return, to ensure they don't end up remaining forever.

In this incarnation, if a HTTP 500 error occurs - this is a bug / server issue, and is correctly identified as such.

If a user needs more information about the allocated GameServer - the GameServer name is available, and details can be retrieved through the Kubernetes API.

Implementation

The good thing here, is that the implementation of this is not drastically different from the current FleetAllocation implementation (in fact, I believe that 99% of the code and tests can be copy pasted across into an GameServerAllocation controller -- and probably should be separate because of backward compatibility. See below).

To provide the described functionality, the only major change would be to change this line, from fleetallocation/controller.go's allocate() function from gsList, err := fleets.ListGameServersByFleetOwner(c.gameServerLister, f) to instead search for GameServers via the provided required selector, and filter appropriately via the preferred selector list. Almost all other code should be able to be reused.

Backward Compatibility

While we are alpha, we do have games that are in/close to production - so it's good to have a good deprecation policy here. Also FleetAllocation is such a core part of the system, changing this shouldn't be done lightly.

We can deprecate FleetAllocations, and keep the code, and everything there can maintain its functionality. I would suggest a removal of FleetAllocation in +2 releases from when the GameServerAllocation CRD is released, to give everyone 12 weeks to test and migrate to the new GameServerAllocation CRD.

Since the mutex for allocation is already shared amongst controllers in the system - there is also no reason users can't use both GameServerAllocation and FleetAllocation at the same time as they migrate.

As always - thoughts, feelings, questions and all feedback is appreciated.

reductor commented 5 years ago

Sounds like a great idea. This potentially benefits #297

EricFortin commented 5 years ago

I am totally in favor of the changes for Allocation status. I do have some questions on the allocation cross fleet though. As you said, most people will do 1:1 and it is adding complexity. I however agree that this is more the kubernetes way of doing.

I can totally see cross fleet allocation for multiple reasons but when I think about it, my next question is how can I define the order of allocation across those fleets. I feel like the current allocation strategy: packed and distributed won't be enough.

I guess I don't understand the need well enough to have a strong opinion. Could you define a concrete use case?

On the other hand, if someone is not using Fleet because they manage GameServers directly(which is a totally valid use case) I see your point. Maybe we can keep both and implement FleetAllocation in terms of GameServerAllocation by using a match expression with the Fleet name.

Oleksii-Terekhov commented 5 years ago

Easy cross-cluster allocation with this issue?

markmandel commented 5 years ago

I am totally in favor of the changes for Allocation status. I do have some questions on the allocation cross fleet though. As you said, most people will do 1:1 and it is adding complexity. I however agree that this is more the kubernetes way of doing.

:+1:

I can totally see cross fleet allocation for multiple reasons but when I think about it, my next question is how can I define the order of allocation across those fleets. I feel like the current allocation strategy: packed and distributed won't be enough.

I don't disagree with this. This sets us up to have the flexibility to do smoke tests between fleets as well. We could expand on this configuration as we decide how we want to do this.

I guess I don't understand the need well enough to have a strong opinion. Could you define a concrete use case?

Initial thought is for doing smoke testing across two fleets (so have a smaller fleet of the test server version). There are some issues around Stategy on this -- My thought is that in the next phase we do a preferred selector (much like pod affinity), so that we can have a preferred allocation to say a small smoke fleet, and then if they are full, allocate to the larger stable group (or maybe this is something we build into this design - I'm open to that too).

This also allows for more direct control over rollouts of new Fleets if you don't like Rolling Update for whatever reason. You can allocate across a red and green fleet, and manually scale one up and down as you feel comfortable with the new versions coming in - without having to change your allocation code.

The only other use case I have, is like #297 as requested above - maybe you want to allocate on a smaller set of a Fleet (or even a specific one, give a unique label to each GameServer) - this also gives users that flexibility, and more.

On the other hand, if someone is not using Fleet because they manage GameServers directly(which is a totally valid use case) I see your point. Maybe we can keep both and implement FleetAllocation in terms of GameServerAllocation by using a match expression with the Fleet name.

This is my exact example above - you can still do the most probably allocation by using a match expression - and my plan was to have our Quick Start docs do exactly that.

markmandel commented 5 years ago

I was thinking about the preferred idea - if we go down that path, we should probably change things a little, WDYT?

apiVersion: "stable.agones.dev/v1alpha1"
kind: Allocation
metadata:
  # We recommend using the following to generate a unique name when creating Allocations
  # This will need to be created with `kubectl create` if using the command line tooling
  generateName: fleet-allocation-example-
spec:
  # GameServer selector from which to choose GameServers from.
  # GameServers still have the hard requirement to be `Ready` to be allocated from
  # however we can also make available `matchExpressions` for even greater
  # flexibility.
  # Below is an example of a GameServer allocated against a given fleet.
  required:
    matchLabels:
      stable.agones.dev/fleet: fleet-example
  # preferred allocation, not required
  preferred:
    matchLabels:
      stable.agones.dev/fleet: green-fleet
  # Allocation strategy. Packed or Distributed.
  # Since we aren't tied to a Fleet, we have to declare this here.
  # (Packed is default as per usual)
  strategy: Packed
  # Optional custom metadata that is added to the game server at allocation
  # You can use this to tell the server necessary session data
  metadata:
    labels:
      mode: deathmatch
    annotations:
      map:  garden22

reductor commented 5 years ago

Is Allocation too generic a name? Should this be something like GameServerAllocation or is that redundant?

I think GameServerAllocation is potentially better, how well does kubernetes work if there is a conflict in name? Even though there is some namespacing of these based on the 'apiVersion' you don't specify this for all APIs (e.g. 'kubectl get')

EricFortin commented 5 years ago

Initial thought is for doing smoke testing across two fleets (so have a smaller fleet of the test server version). There are some issues around Stategy on this -- My thought is that in the next phase we do a preferred selector (much like pod affinity), so that we can have a preferred allocation to say a small smoke fleet, and then if they are full, allocate to the larger stable group (or maybe this is something we build into this design - I'm open to that too).

With some form of ordering, I can see it now. Should we make the design open to more than a single preferred though? Also, I am assuming that, like the pod affinity when we have ...IgnoredDuringExecution, the allocation will skip the preferred one if there is no capacity. Am I right?

I would go with GameServerAllocation too.

markmandel commented 5 years ago

Sounds like GameServerAllocation works best. I agree - let's do that, I'll update the design above. I don't think collisions are well handled, and GameServerAllocation does make sense!

Should we make the design open to more than a single preferred though?

Interesting question. I have some mild concerns around complexity and performance here - but they may be unfounded.

Were you thinking something like a list of ordered preferred selectors (which could be match selectors or set based expressions) - in order of priority?

Something like:

apiVersion: "stable.agones.dev/v1alpha1"
kind: GameServerAllocation
metadata:
  # We recommend using the following to generate a unique name when creating Allocations
  # This will need to be created with `kubectl create` if using the command line tooling
  generateName: fleet-allocation-example-
spec:
  # GameServer selector from which to choose GameServers from.
  # GameServers still have the hard requirement to be `Ready` to be allocated from
  # however we can also make available `matchExpressions` for even greater
  # flexibility.
  # Below is an example of a GameServer allocated against a given fleet.
  required:
    matchLabels:
      stable.agones.dev/fleet: fleet-example
  # preferred allocation, not required
  preferred:
    - matchLabels:
        stable.agones.dev/fleet: green-fleet
    - matchLabels:
        stable.agones.dev/fleet: blue-fleet
  # Allocation strategy. Packed or Distributed.
  # Since we aren't tied to a Fleet, we have to declare this here.
  # (Packed is default as per usual)
  strategy: Packed
  # Optional custom metadata that is added to the game server at allocation
  # You can use this to tell the server necessary session data
  metadata:
    labels:
      mode: deathmatch
    annotations:
      map:  garden22

I love the flexibility, but am wondering what your use case here would be?

Also, I am assuming that, like the pod affinity when we have ...IgnoredDuringExecution, the allocation will skip the preferred one if there is no capacity. Am I right?

Yes, I believe you are. First we would search for a preferred match, and if that can be found, we use that, there isn't capacity, we go back to searching the required selector and allocate from there.

EricFortin commented 5 years ago

Were you thinking something like a list of ordered preferred selectors (which could be match selectors or set based expressions) - in order of priority?

Yes

I love the flexibility, but am wondering what your use case here would be?

It is kind of a combination of yours and one I alluded earlier. Let's say I shipped a new version of my server but for backward compatibility I need to keep the old version for some time. So I have my allocation go to the new one and use the old one as a backup if there is no capacity in the new one. Then I want to do some A/B testing on the new version. So I can in theory allocate on 3 versions.

I am playing devil's advocate here. I just thought that if we opened the door, someone would ask for more and then we would have to say no or break the API again.

markmandel commented 5 years ago

I am playing devil's advocate here.

That's a fair advocate to play :smile:

That all sounds pretty reasonable. I'd say, let's run with it. Do we like the above configuration as a list? i couldn't think of anything better.

EricFortin commented 5 years ago

Do we like the above configuration as a list? i couldn't think of anything better.

I do like it. Nothing to change on my side.

markmandel commented 5 years ago

Updated design to match decisions made here in discussion.

One thing I did note that I'm not sure we covered - I believe it should be that the required selector is totally required.

What I mean by that is that for a GameServer to be matched against one of the preferred selectors, it needs to match both the required selector, as well as the preferred selector.

I think this is less confusing - as required means that it always required, rather than have them work separately.

(Did I say "required" enough times? :man_shrugging: :smile:

googleforgames / agones