If a fleet has replicas specified, the fleet allocator will fail to scale the fleet to more than N replicas specified in the fleet spec.

dmitryrn commented 1 year ago

What happened: If a fleet has replicas specified, the fleet allocator will fail to scale the fleet to more than N replicas specified in the fleet spec.

What you expected to happen: The fleet autoscaler ignores the fleet's replicas value and scales the fleet up according to the autoscaler's configuration.

How to reproduce it (as minimally and precisely as possible):

---
apiVersion: "agones.dev/v1"
kind: Fleet
metadata:
  name: fleet1
  namespace: dev-server
  labels:
    fleetset: "1"
spec:
  replicas: 2
  scheduling: Distributed
  strategy:
    type: Recreate
  template:
    spec:
      ports:
      - name: default
        containerPort: 7777
        portPolicy: Dynamic
        protocol: UDP
      health:
        disabled: true
      sdkServer:
        logLevel: Debug
      template:
        metadata:
          labels:
            gameserver: gameserver
        spec:
          containers:
          - name: gameserver
            imagePullPolicy: Always
            image: yourimage
---
apiVersion: "autoscaling.agones.dev/v1"
kind: FleetAutoscaler
metadata:
  name: fleet1-autoscaler
  namespace: dev-server
spec:
  fleetName: fleet1
  policy:
    type: Buffer
    buffer:
      bufferSize: 2
      minReplicas: 2
      maxReplicas: 200

Game server source code:

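package main

import (
    "time"

    sdk "agones.dev/agones/sdks/go"
)

func main() {
    // Connect to the Agones SDK sidecar. The variable is named s so that it
    // does not shadow the imported sdk package.
    s, err := sdk.NewSDK()
    if err != nil {
        panic(err)
    }

    // Mark this game server as Ready so that it can be allocated.
    if err := s.Ready(); err != nil {
        panic(err)
    }

    // Keep the process alive. Health checking is disabled in the Fleet spec,
    // so no health pings are sent.
    time.Sleep(999 * time.Hour)
}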

Allocate a game server with an allocation:

apiVersion: "allocation.agones.dev/v1"
kind: GameServerAllocation
spec:
  selectors:
    - matchLabels:
        agones.dev/fleet: fleet1
    - matchLabels:
        fleetset: "1"
      gameServerState: Ready
  scheduling: Packed

Allocate twice, and notice that the third game server repeatedly shuts down right after it starts.

Anything else we need to know?: The solution is to remove replicas from the fleet spec, but it took some time to figure out why it wasn't working as expected.

Environment:

Agones version: 1.33.0
Kubernetes version (use kubectl version): Client Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.4", GitCommit:"fa3d7990104d7c1f16943a67f11b154b71f6a132", GitTreeState:"archive", BuildDate:"2023-07-20T07:37:53Z", GoVersion:"go1.20.6", Compiler:"gc", Platform:"linux/amd64"}
Cloud provider or hardware configuration: EKS
Install method (yaml/helm): helm
Troubleshooting guide log(s):
Others:

markmandel commented 1 year ago

I tried to reproduce this with our simple game server example, but couldn't:

❯ kubectl apply -f ./fleet.yaml
fleet.agones.dev/simple-game-server created

❯ kubectl get gs
NAME                             STATE   ADDRESS         PORT   NODE                                     AGE
simple-game-server-zspq6-fb787   Ready   34.83.221.149   7587   gke-test-cluster-default-3d7d5583-4t9m   40s
simple-game-server-zspq6-l5ktp   Ready   34.83.221.149   7261   gke-test-cluster-default-3d7d5583-4t9m   40s

❯ kubectl apply -f ./fleetautoscaler.yaml
fleetautoscaler.autoscaling.agones.dev/simple-game-server-autoscaler created

❯ kubectl create -f ./gameserverallocation.yaml
gameserverallocation.allocation.agones.dev/simple-game-server-zspq6-fb787 created

❯ kubectl get gs
NAME                             STATE       ADDRESS         PORT   NODE                                     AGE
simple-game-server-zspq6-fb787   Allocated   34.83.221.149   7587   gke-test-cluster-default-3d7d5583-4t9m   57s
simple-game-server-zspq6-l5ktp   Ready       34.83.221.149   7261   gke-test-cluster-default-3d7d5583-4t9m   57s

# waiting 30 seconds

❯ kubectl get gs
NAME                             STATE       ADDRESS         PORT   NODE                                     AGE
simple-game-server-zspq6-4p9hr   Ready       34.83.221.149   7898   gke-test-cluster-default-3d7d5583-4t9m   10s
simple-game-server-zspq6-fb787   Allocated   34.83.221.149   7587   gke-test-cluster-default-3d7d5583-4t9m   74s
simple-game-server-zspq6-l5ktp   Ready       34.83.221.149   7261   gke-test-cluster-default-3d7d5583-4t9m   74s

❯ kubectl describe fleetautoscalers.autoscaling.agones.dev
Name:         simple-game-server-autoscaler
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  autoscaling.agones.dev/v1
Kind:         FleetAutoscaler
Metadata:
  Creation Timestamp:  2023-09-06T22:31:25Z
  Generation:          1
  Resource Version:    38592131
  UID:                 e587728e-32b0-4f5f-a1ae-2a5478f3ca15
Spec:
  Fleet Name:  simple-game-server
  Policy:
    Buffer:
      Buffer Size:   2
      Max Replicas:  10
      Min Replicas:  0
    Type:            Buffer
  Sync:
    Fixed Interval:
      Seconds:  30
    Type:       FixedInterval
Status:
  Able To Scale:     true
  Current Replicas:  3
  Desired Replicas:  3
  Last Scale Time:   2023-09-06T22:31:55Z
  Scaling Limited:   false
Events:
  Type    Reason            Age    From                        Message
  ----    ------            ----   ----                        -------
  Normal  AutoScalingFleet  2m39s  fleetautoscaler-controller  Scaling fleet simple-game-server from 2 to 3

❯ kubectl describe fleet
Name:         simple-game-server
Namespace:    default
Labels:       <none>
Annotations:  agones.dev/sdk-version: 1.34.0
API Version:  agones.dev/v1
Kind:         Fleet
Metadata:
  Creation Timestamp:  2023-09-06T22:30:52Z
  Generation:          2
  Resource Version:    38591820
  UID:                 c3c75b22-121c-4940-ac5d-8a0370c15995
Spec:
  Replicas:    3
  Scheduling:  Packed
  Strategy:
    Rolling Update:
      Max Surge:        25%
      Max Unavailable:  25%
    Type:               RollingUpdate
  Template:
    Metadata:
      Creation Timestamp:  <nil>
    Spec:
      Health:
      Immutable Replicas:  1
      Ports:
        Container Port:  7654
        Name:            default
      Sdk Server:
      Template:
        Metadata:
          Creation Timestamp:  <nil>
        Spec:
          Containers:
            Image:  us-docker.pkg.dev/agones-images/examples/simple-game-server:0.17
            Name:   simple-game-server
            Resources:
              Limits:
                Cpu:     20m
                Memory:  64Mi
              Requests:
                Cpu:     20m
                Memory:  64Mi
Status:
  Allocated Replicas:  1
  Ready Replicas:      2
  Replicas:            3
  Reserved Replicas:   0
Events:
  Type    Reason                 Age   From              Message
  ----    ------                 ----  ----              -------
  Normal  CreatingGameServerSet  96s   fleet-controller  Created GameServerSet simple-game-server-zspq6
  Normal  ScalingGameServerSet   32s   fleet-controller  Scaling active GameServerSet simple-game-server-zspq6 from 2 to 3

Is it possible your test game server is crashing for another reason?

If you introspect with Kubernetes tooling (https://agones.dev/site/docs/guides/troubleshooting/#introspect-with-kubernetes-tooling), do you see anything in the event stream for either the Pod or the GameServer for the third GameServer?

github-actions[bot] commented 1 month ago

This issue is marked as Stale due to inactivity for more than 30 days. To avoid being marked as 'stale' please add 'awaiting-maintainer' label or add a comment. Thank you for your contributions

github-actions[bot] commented 3 weeks ago

This issue is marked as obsolete due to inactivity for last 60 days. To avoid issue getting closed in next 30 days, please add a comment or add 'awaiting-maintainer' label. Thank you for your contributions

If a fleet has replicas specified, the fleet allocator will fail to scale the fleet to more than N replicas specified in the fleet spec. #3356