googleforgames / agones

Dedicated Game Server Hosting and Scaling for Multiplayer Games on Kubernetes
https://agones.dev
Apache License 2.0

Fleet autoscaler with "List" policy throws an error if configured with a fleet with no replicas #3943

Closed: geopaul-nm closed this issue 1 week ago

geopaul-nm commented 3 months ago

What happened:

If we set up a fleet without defining "replicas" and set up an autoscaler with a "List" policy, the autoscaler throws the following error:

Error calculating desired fleet size on FleetAutoscaler simple-game-server-autoscaler. Error: cannot apply ListPolicy as List key MyCustomList does not exist in the Fleet Status
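Here is a minimal sketch of why this error appears. It uses hypothetical, simplified types (FleetStatus, AggregatedList) and a hypothetical helper checkListKey, not the real Agones code. Before a List policy can compute anything, it has to find the aggregated entry for its key in the Fleet status. A Fleet that has never had a GameServer has no such entry. Only the error text is copied from the message above.

```go
package main

import "fmt"

// AggregatedList is a hypothetical, simplified stand-in for the per-key
// List totals that a Fleet status aggregates from its GameServers.
type AggregatedList struct {
	Count    int64
	Capacity int64
}

// FleetStatus is a hypothetical stand-in holding only the Lists field.
type FleetStatus struct {
	Lists map[string]AggregatedList
}

// checkListKey mimics the lookup a List policy needs before it can do any
// math: the key must already exist in the Fleet status. Reading from a nil
// map is safe in Go and simply reports the key as missing.
func checkListKey(status FleetStatus, key string) (AggregatedList, error) {
	agg, ok := status.Lists[key]
	if !ok {
		return AggregatedList{}, fmt.Errorf(
			"cannot apply ListPolicy as List key %s does not exist in the Fleet Status", key)
	}
	return agg, nil
}

func main() {
	// A Fleet with no replicas has never aggregated anything, so Lists is nil.
	empty := FleetStatus{}
	if _, err := checkListKey(empty, "MyCustomList"); err != nil {
		fmt.Println("Error:", err)
	}

	// Once at least one GameServer reports the list, the lookup succeeds.
	populated := FleetStatus{Lists: map[string]AggregatedList{
		"MyCustomList": {Count: 0, Capacity: 50},
	}}
	agg, err := checkListKey(populated, "MyCustomList")
	fmt.Println(agg.Capacity, err)
}
```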

What you expected to happen:

The fleet autoscaler should be able to determine the desired number of replicas even if no replicas are present.

How to reproduce it (as minimally and precisely as possible):

Set up a fleet with no replicas defined and a List-based autoscaler for the fleet. For example:

---
apiVersion: agones.dev/v1
kind: Fleet
metadata:
  name: simple-game-server
spec:
  template:
    spec:
      lists:
        MyCustomList:
          capacity: 50
      ports:
        - name: default
          containerPort: 7654
      template:
        spec:
          containers:
            - name: simple-game-server
              image: us-docker.pkg.dev/agones-images/examples/simple-game-server:0.34
              resources:
                requests:
                  memory: 64Mi
                  cpu: 20m
                limits:
                  memory: 64Mi
                  cpu: 20m
---
apiVersion: autoscaling.agones.dev/v1
kind: FleetAutoscaler
metadata:
  name: simple-game-server-autoscaler
spec:
  fleetName: simple-game-server
  policy:
    type: List  # List based autoscaling.
    list:
      key: MyCustomList
      bufferSize: 10%
      minCapacity: 50
      maxCapacity: 800

Below is the kubectl describe output for the fleet autoscaler:

Name:         simple-game-server-autoscaler
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  autoscaling.agones.dev/v1
Kind:         FleetAutoscaler
Metadata:
  Creation Timestamp:  2024-08-12T14:35:58Z
  Generation:          1
  Resource Version:    28009
  UID:                 91ff6589-e3c4-4456-b5b4-e31767a8ac03
Spec:
  Fleet Name:  simple-game-server
  Policy:
    List:
      Buffer Size:   10%
      Key:           MyCustomList
      Max Capacity:  800
      Min Capacity:  50
    Type:            List
  Sync:
    Fixed Interval:
      Seconds:  30
    Type:       FixedInterval
Events:
  Type     Reason           Age                 From                        Message
  ----     ------           ----                ----                        -------
  Warning  FleetAutoscaler  12s (x25 over 76s)  fleetautoscaler-controller  Error calculating desired fleet size on FleetAutoscaler simple-game-server-autoscaler. Error: cannot apply ListPolicy as List key MyCustomList does not exist in the Fleet Status

Anything else we need to know?: This is crucial for using Agones with Flux CD (https://fluxcd.io/) and probably other GitOps-based tools.

Environment:

markmandel commented 2 months ago

@igooch this seems like it's in your wheelhouse?

I'm wondering: if there isn't a replica count, are there no Counter and/or List values set in the status?

igooch commented 2 months ago

Oh yep, when implementing this we assumed that the Fleet can't scale to 0 replicas, so it also can't scale from 0 replicas.

Currently the List / Counter Fleet status flows from GameServer status -> GameServerSet aggregation -> Fleet status aggregation. So: no game servers, no aggregation. We could possibly add a few lines somewhere in the GameServerSet controller that create empty Lists / Counters with 0 capacity when there are no game servers in that GameServerSet:

https://github.com/googleforgames/agones/blob/a348312bbe5d3dfb134a4c7b1fb856267c6858c2/pkg/gameserversets/controller.go#L645-L649
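This minimal sketch illustrates the "no game servers, no aggregation" point by collapsing the GameServer -> GameServerSet -> Fleet flow into a single summing step. The types and the aggregateLists function are hypothetical simplifications, not the Agones API. The point is only that map keys are created from GameServers that exist, so an empty set yields an empty map.

```go
package main

import "fmt"

// ListStatus is a hypothetical, simplified per-GameServer List status.
type ListStatus struct {
	Values   []string
	Capacity int64
}

// AggregatedList is a hypothetical, simplified aggregate over many GameServers.
type AggregatedList struct {
	Count    int64
	Capacity int64
}

// aggregateLists sums per-key List status across a set of GameServers.
// Keys are only ever created while iterating over GameServers that exist,
// so zero GameServers produce zero keys.
func aggregateLists(gameServers []map[string]ListStatus) map[string]AggregatedList {
	out := map[string]AggregatedList{}
	for _, gs := range gameServers {
		for key, l := range gs {
			agg := out[key]
			agg.Count += int64(len(l.Values))
			agg.Capacity += l.Capacity
			out[key] = agg
		}
	}
	return out
}

func main() {
	// Two GameServers, each exposing MyCustomList with capacity 50.
	two := []map[string]ListStatus{
		{"MyCustomList": {Values: []string{"a"}, Capacity: 50}},
		{"MyCustomList": {Capacity: 50}},
	}
	fmt.Println(aggregateLists(two)["MyCustomList"])

	// Zero GameServers: nothing to aggregate, so no key appears at all.
	fmt.Println(len(aggregateLists(nil)))
}
```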

igooch commented 2 months ago

@ashutosji can you take a look and see whether this would only require a change to the GameServerSet controller, or would also require a change to the autoscaler?

kamaljeeti commented 1 month ago

Hi @igooch, I think the list of GameServers will be empty when the Fleet has 0 replicas. We can check whether the list of GameServers is empty and the FeatureCountsAndLists feature is enabled; in that case we can initialize empty Lists or Counters with zero capacity. Something like this:

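    // Proposed fallback: if the GameServerSet has no GameServers, seed empty
    // Lists / Counters so the autoscaler has a key to look up.
    if len(list) == 0 {
        if runtime.FeatureEnabled(runtime.FeatureCountsAndLists) {
            status.Counters = make(map[string]agonesv1.AggregatedCounterStatus)
            status.Lists = make(map[string]agonesv1.AggregatedListStatus)

            // The key is hardcoded here; it should come from the Fleet spec.
            status.Lists["MyCustomList"] = agonesv1.AggregatedListStatus{
                Count:    0,
                Capacity: 0,
            }
        }
        return status
    }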

How can we get the list of keys from the Fleet spec? Does the above approach seem reasonable? WDYT?

igooch commented 4 weeks ago

@kamaljeeti The approach seems reasonable. Below are some suggested edits, where gsSpec is of type agonesv1.GameServerSpec, so computeStatus would be called as computeStatus(list, gsSet.Spec.Template.Spec).

    // If there are zero game servers in the GameServerSet, create empty AggregatedListStatus and
    // AggregatedCounterStatus entries to allow scaling up from zero when using a List or Counter FleetAutoscaler.
    if runtime.FeatureEnabled(runtime.FeatureCountsAndLists) && len(list) == 0 {
        status.Lists = make(map[string]agonesv1.AggregatedListStatus)
        for key := range gsSpec.Lists {
            status.Lists[key] = agonesv1.AggregatedListStatus{}
        }
        status.Counters = make(map[string]agonesv1.AggregatedCounterStatus)
        for key := range gsSpec.Counters {
            status.Counters[key] = agonesv1.AggregatedCounterStatus{}
        }
    }
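This self-contained sketch shows the suggested fallback. It assumes hypothetical simplified types in place of agonesv1.GameServerSpec and the aggregated status structs. It also uses a hypothetical seedEmptyStatus helper, with the feature gate passed in as a plain bool instead of runtime.FeatureEnabled. When the set has no GameServers, it seeds a zero-valued entry for every List and Counter key declared in the spec.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for agonesv1.GameServerSpec and the
// aggregated status types; field names are assumptions for illustration.
type ListSpec struct{ Capacity int64 }
type CounterSpec struct{ Capacity int64 }

type GameServerSpec struct {
	Lists    map[string]ListSpec
	Counters map[string]CounterSpec
}

type AggregatedList struct{ Count, Capacity int64 }
type AggregatedCounter struct{ Count, Capacity int64 }

type SetStatus struct {
	Lists    map[string]AggregatedList
	Counters map[string]AggregatedCounter
}

// seedEmptyStatus applies the proposed fallback: when the set has zero
// GameServers (and the feature is on), create a zero-valued entry for every
// List and Counter key declared in the template spec, so a List or Counter
// autoscaler can find its key.
func seedEmptyStatus(status SetStatus, numGameServers int, spec GameServerSpec, featureEnabled bool) SetStatus {
	if !featureEnabled || numGameServers != 0 {
		return status
	}
	status.Lists = make(map[string]AggregatedList, len(spec.Lists))
	for key := range spec.Lists {
		status.Lists[key] = AggregatedList{}
	}
	status.Counters = make(map[string]AggregatedCounter, len(spec.Counters))
	for key := range spec.Counters {
		status.Counters[key] = AggregatedCounter{}
	}
	return status
}

func main() {
	spec := GameServerSpec{Lists: map[string]ListSpec{"MyCustomList": {Capacity: 50}}}
	status := seedEmptyStatus(SetStatus{}, 0, spec, true)
	agg, ok := status.Lists["MyCustomList"]
	fmt.Println(ok, agg.Count, agg.Capacity)
}
```

The seeded entries have Capacity 0. For the outstanding autoscaler question, one thing worth checking is how the List and Counter policy math behaves with a total capacity of zero, for example any division by the current capacity.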

The open question is whether there are any "gotchas" with the fleet autoscaler. I believe this will work, but we'll want to test it thoroughly.