googleforgames / agones

Dedicated Game Server Hosting and Scaling for Multiplayer Games on Kubernetes
https://agones.dev
Apache License 2.0

[help wanted] Agones sidecar health check failed #3724

Closed: Lahamc closed this issue 7 months ago.

Lahamc commented 7 months ago

I am running Kubernetes in minikube, with Agones installed via Helm.

Every time I start my game server, its state turns Unhealthy. When I traced the logs, I found that agones-gameserver-sidecar keeps reporting that the game server health check failed. I then disabled the game server health check, but after that my game server got stuck in the Scheduled state.

Can anyone tell me what is happening to my game server?
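The two symptoms above are consistent with the game server process never calling the Agones SDK, which the last comment in this thread confirms was the cause. With health checking enabled, the sidecar counts missed health pings and marks the GameServer Unhealthy once the failure threshold is reached. With it disabled, nothing moves the GameServer out of Scheduled, because only a Ready() call from the game server does that. The sketch below is a deliberately simplified, hypothetical model of that logic (the names `simulate`, `Period`, `HealthConfig` and `repeat` are made up for illustration; it ignores initialDelaySeconds and most GameServer states). It is not Agones' controller code.

```typescript
// Simplified, hypothetical model of the GameServer states discussed in this
// thread. NOT Agones' controller code: it ignores initialDelaySeconds and most
// states, and only shows what moves a GameServer out of Scheduled and when
// missed health pings mark it Unhealthy.
type GameServerState = "Scheduled" | "RequestReady" | "Ready" | "Unhealthy";

// What the game server process did during one health-check period.
interface Period {
  calledReady: boolean;    // did the process call SDK.Ready()?
  sentHealthPing: boolean; // did the process call SDK.Health()?
}

interface HealthConfig {
  disabled: boolean;        // mirrors spec.health.disabled
  failureThreshold: number; // mirrors spec.health.failureThreshold
}

function simulate(periods: Period[], health: HealthConfig): GameServerState {
  let state: GameServerState = "Scheduled";
  let consecutiveMisses = 0;
  for (const p of periods) {
    if (!health.disabled) {
      consecutiveMisses = p.sentHealthPing ? 0 : consecutiveMisses + 1;
      if (consecutiveMisses >= health.failureThreshold) {
        return "Unhealthy"; // treated as terminal in this model
      }
    }
    if (state === "Scheduled" && p.calledReady) {
      state = "RequestReady"; // SDK.Ready() only requests the Ready state
    } else if (state === "RequestReady") {
      state = "Ready"; // the controller then moves it to Ready
    }
  }
  return state;
}

// Helper that builds n identical periods.
function repeat(n: number, period: Period): Period[] {
  return Array.from({ length: n }, () => ({ ...period }));
}
```

For example, `simulate(repeat(5, { calledReady: false, sentHealthPing: false }), { disabled: true, failureThreshold: 3 })` returns `"Scheduled"`, matching the "stuck at Scheduled" symptom, while the same periods with health checking enabled return `"Unhealthy"`. Calling Ready() without health pings also ends in `"Unhealthy"` in this model, which is why both calls are needed.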

My GameServer YAML file:

apiVersion: "agones.dev/v1"
kind: GameServer
metadata:
  name: agones-dev-game-server
spec:
  # replicas: 1
  # template:
  #   spec:
      health:
        # initialDelaySeconds: 15
        # periodSeconds: 10
        # failureThreshold: 10
      sdkServer:
        logLevel: Debug
      # ports:
      #   - name: default
      #     containerPort: 7654
      template:
        spec:
          containers:
          - name: agones-dev-game-server
            image: {my_image}
            imagePullPolicy: IfNotPresent
            ports:
              - containerPort: 2567
                protocol: TCP
            resources:
              requests:
                memory: "1Gi"
                cpu: "500m"
              limits:
                memory: "2Gi"
                cpu: "500m"

I can also reproduce this issue using the image gcr.io/agones-images/xonotic-example:0.5 with the following YAML:

apiVersion: "agones.dev/v1"
kind: GameServer
metadata:
  name: agones-dev-game-server
spec:
  # replicas: 1
  # template:
  #   spec:
      health:
        # initialDelaySeconds: 15
        # periodSeconds: 10
        # failureThreshold: 10
      sdkServer:
        logLevel: Debug
      # ports:
      #   - name: default
      #     containerPort: 7654
      template:
        spec:
          containers:
          - name: agones-dev-game-server
            image: gcr.io/agones-images/xonotic-example:0.5
            imagePullPolicy: IfNotPresent
            # ports:
            #   - containerPort: 2567
            #     protocol: TCP
            resources:
              requests:
                memory: "1Gi"
                cpu: "500m"
              limits:
                memory: "2Gi"
                cpu: "500m"
PS C:\WINDOWS\system32> kubectl version
Client Version: v1.29.1
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.28.3
PS C:\WINDOWS\system32> helm ls -n agones-system
NAME            NAMESPACE       REVISION        UPDATED                                 STATUS          CHART           APP VERSION
my-release      agones-system   1               2024-03-22 09:44:00.6448432 +0900 JST   deployed        agones-1.39.0   1.39.0
PS C:\WINDOWS\system32> kubectl describe gs agones-dev-game-server
Name:         agones-dev-game-server
Namespace:    default
Labels:       <none>
Annotations:  agones.dev/sdk-version: 1.39.0
API Version:  agones.dev/v1
Kind:         GameServer
Metadata:
  Creation Timestamp:  2024-03-22T00:46:51Z
  Finalizers:
    agones.dev
  Generation:        4
  Resource Version:  1120
  UID:               9524b26c-0aea-42cf-890f-d84a86f49665
Spec:
  Container:  agones-dev-game-server
  Eviction:
    Safe:  Never
  Health:
    Failure Threshold:      3
    Initial Delay Seconds:  5
    Period Seconds:         5
  Immutable Replicas:       1
  Scheduling:               Packed
  Sdk Server:
    Grpc Port:  9357
    Http Port:  9358
    Log Level:  Debug
  Template:
    Metadata:
      Creation Timestamp:  <nil>
    Spec:
      Containers:
        Image:              {my_image}
        Image Pull Policy:  IfNotPresent
        Name:               agones-dev-game-server
        Ports:
          Container Port:  2567
          Protocol:        TCP
        Resources:
          Limits:
            Cpu:     500m
            Memory:  2Gi
          Requests:
            Cpu:     500m
            Memory:  1Gi
Status:
  Address:  192.168.212.84
  Addresses:
    Address:  192.168.212.84
    Type:     InternalIP
    Address:  minikube
    Type:     Hostname
  Eviction:
    Safe:              Never
  Immutable Replicas:  1
  Node Name:           minikube
  Players:             <nil>
  Ports:
  Reserved Until:  <nil>
  State:           Unhealthy
Events:
  Type     Reason     Age   From                   Message
  ----     ------     ----  ----                   -------
  Normal   Creating   16m   gameserver-controller  Pod agones-dev-game-server created
  Normal   Scheduled  16m   gameserver-controller  Address and port populated
  Warning  Unhealthy  15m   gameserver-sidecar     Health check failure
PS C:\WINDOWS\system32> kubectl describe pod agones-dev-game-server
Name:             agones-dev-game-server
Namespace:        default
Priority:         0
Service Account:  agones-sdk
Node:             minikube/192.168.212.84
Start Time:       Fri, 22 Mar 2024 09:46:51 +0900
Labels:           agones.dev/gameserver=agones-dev-game-server
                  agones.dev/role=gameserver
                  agones.dev/safe-to-evict=false
Annotations:      agones.dev/container: agones-dev-game-server
                  agones.dev/sdk-version: 1.39.0
                  cluster-autoscaler.kubernetes.io/safe-to-evict: false
Status:           Running
IP:               10.244.0.12
IPs:
  IP:           10.244.0.12
Controlled By:  GameServer/agones-dev-game-server
Containers:
  agones-gameserver-sidecar:
    Container ID:  docker://c32c7d1d48f93f113cf9d4aadad15c9aadf6a7d06b49a7a80d837f748adaffb9
    Image:         us-docker.pkg.dev/agones-images/release/agones-sdk:1.39.0
    Image ID:      docker-pullable://us-docker.pkg.dev/agones-images/release/agones-sdk@sha256:58c704a5c0265c46e507575cbc5597f4f9e4c30a2233395e7c9c2b606b3d7dd8
    Port:          <none>
    Host Port:     <none>
    Args:
      --grpc-port=9357
      --http-port=9358
    State:          Running
      Started:      Fri, 22 Mar 2024 09:47:04 +0900
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:     30m
    Liveness:  http-get http://:8080/healthz delay=3s timeout=1s period=3s #success=1 #failure=3
    Environment:
      GAMESERVER_NAME:  agones-dev-game-server
      POD_NAMESPACE:    default (v1:metadata.namespace)
      FEATURE_GATES:    CountsAndLists=false&DisableResyncOnSDKServer=false&Example=false&FleetAllocationOverflow=true&GKEAutopilotExtendedDurationPods=false&PlayerAllocationFilter=false&PlayerTracking=false
      LOG_LEVEL:        Debug
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-fgq7t (ro)
  agones-dev-game-server:
    Container ID:   docker://085f8374071647ee240c54655971432c7313297c2f4ab86210012692e0ed85ed
    Image:          {my_image}
    Image ID:       docker-pullable://{my_image}@sha256:626014529398a363ccd92537ba2693ed4c7868fa62d5f36f8444a85f3b05e185
    Port:           2567/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    137
      Started:      Fri, 22 Mar 2024 10:03:51 +0900
      Finished:     Fri, 22 Mar 2024 10:04:36 +0900
    Ready:          False
    Restart Count:  9
    Limits:
      cpu:     500m
      memory:  2Gi
    Requests:
      cpu:     500m
      memory:  1Gi
    Liveness:  http-get http://:8080/gshealthz delay=5s timeout=1s period=5s #success=1 #failure=3
    Environment:
      AGONES_SDK_GRPC_PORT:  9357
      AGONES_SDK_HTTP_PORT:  9358
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from empty (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  empty:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  kube-api-access-fgq7t:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Scheduled  18m                   default-scheduler  Successfully assigned default/agones-dev-game-server to minikube
  Normal   Pulling    18m                   kubelet            Pulling image "us-docker.pkg.dev/agones-images/release/agones-sdk:1.39.0"
  Normal   Pulled     18m                   kubelet            Successfully pulled image "us-docker.pkg.dev/agones-images/release/agones-sdk:1.39.0" in 12.401s (12.401s including waiting)
  Normal   Created    18m                   kubelet            Created container agones-gameserver-sidecar
  Normal   Started    18m                   kubelet            Started container agones-gameserver-sidecar
  Normal   Pulling    18m                   kubelet            Pulling image "{my_image}"
  Normal   Pulled     18m                   kubelet            Successfully pulled image "{my_image}" in 22.118s (22.118s including waiting)
  Normal   Killing    16m (x2 over 17m)     kubelet            Container agones-dev-game-server failed liveness probe, will be restarted
  Normal   Created    16m (x3 over 18m)     kubelet            Created container agones-dev-game-server
  Normal   Started    16m (x3 over 18m)     kubelet            Started container agones-dev-game-server
  Normal   Pulled     16m (x2 over 17m)     kubelet            Container image "{my_image}" already present on machine
  Warning  Unhealthy  16m (x9 over 17m)     kubelet            Liveness probe failed: HTTP probe failed with statuscode: 500
  Warning  BackOff    3m39s (x43 over 13m)  kubelet            Back-off restarting failed container agones-dev-game-server in pod agones-dev-game-server_default(70d69657-0050-46a1-b465-58eac5b80fce)
PS C:\WINDOWS\system32> kubectl logs agones-dev-game-server -c agones-gameserver-sidecar
{"error":"not a valid logrus Level: \"\"","message":"Invalid LOG_LEVEL value. Defaulting to 'info'.","severity":"warning","time":"2024-03-22T00:47:04.396382763Z"}
{"ctlConf":{"GameServerName":"agones-dev-game-server","PodNamespace":"default","Address":"localhost","IsLocal":false,"LocalFile":"","Delay":0,"Timeout":0,"Test":"","TestSdkName":"","KubeConfig":"","GracefulTermination":true,"GRPCPort":9357,"HTTPPort":9358,"LogLevel":""},"featureGates":"CountsAndLists=false\u0026DisableResyncOnSDKServer=false\u0026Example=false\u0026FleetAllocationOverflow=true\u0026GKEAutopilotExtendedDurationPods=false\u0026PlayerAllocationFilter=false\u0026PlayerTracking=false","message":"Starting sdk sidecar","severity":"info","source":"main","time":"2024-03-22T00:47:04.396461063Z","version":"1.39.0"}
{"gsKey":"default/agones-dev-game-server","message":"Created GameServer sidecar","severity":"info","source":"*sdkserver.SDKServer","time":"2024-03-22T00:47:04.396809264Z"}
{"gsKey":"default/agones-dev-game-server","message":"Connection to Kubernetes service established","severity":"info","source":"*sdkserver.SDKServer","time":"2024-03-22T00:47:04.402631976Z","try":0}
{"grpcEndpoint":"localhost:9357","message":"Starting SDKServer grpc service...","severity":"info","source":"main","time":"2024-03-22T00:47:04.402949477Z"}
{"httpEndpoint":"localhost:9358","message":"Starting SDKServer grpc-gateway...","severity":"info","source":"main","time":"2024-03-22T00:47:04.403340678Z"}
{"gsKey":"default/agones-dev-game-server","message":"Starting workers...","queue":"agones.dev.default.agones-dev-game-server","severity":"info","source":"*sdkserver.SDKServer","time":"2024-03-22T00:47:04.502806195Z","workers":1}
{"failureCount":1,"gsKey":"default/agones-dev-game-server","message":"GameServer Health Check failed","severity":"warning","source":"*sdkserver.SDKServer","time":"2024-03-22T00:47:31.607250873Z"}
{"failureCount":2,"gsKey":"default/agones-dev-game-server","message":"GameServer Health Check failed","severity":"warning","source":"*sdkserver.SDKServer","time":"2024-03-22T00:47:36.607576872Z"}
{"failureCount":3,"gsKey":"default/agones-dev-game-server","message":"GameServer Health Check failed","severity":"warning","source":"*sdkserver.SDKServer","time":"2024-03-22T00:47:41.608487671Z"}
{"gameServerName":"agones-dev-game-server","gsKey":"default/agones-dev-game-server","message":"GameServer has failed health check","severity":"warning","source":"*sdkserver.SDKServer","time":"2024-03-22T00:47:41.608522171Z"}
{"failureCount":4,"gsKey":"default/agones-dev-game-server","message":"GameServer Health Check failed","severity":"warning","source":"*sdkserver.SDKServer","time":"2024-03-22T00:47:46.60920877Z"}
{"gameServerName":"agones-dev-game-server","gsKey":"default/agones-dev-game-server","message":"GameServer has failed health check","severity":"warning","source":"*sdkserver.SDKServer","time":"2024-03-22T00:47:46.60923767Z"}
{"failureCount":5,"gsKey":"default/agones-dev-game-server","message":"GameServer Health Check failed","severity":"warning","source":"*sdkserver.SDKServer","time":"2024-03-22T00:47:51.609446067Z"}
{"gameServerName":"agones-dev-game-server","gsKey":"default/agones-dev-game-server","message":"GameServer has failed health check","severity":"warning","source":"*sdkserver.SDKServer","time":"2024-03-22T00:47:51.609475867Z"}
PS C:\WINDOWS\system32> kubectl logs agones-dev-game-server -c agones-dev-game-server

> my-app@1.0.0 start
> tsx watch src/index.ts

1:25:13 AM [tsx] rerunning
✅ .env.development loaded.
Seettings load completed eventId:  vmp202402003 space id:  3001
✅ Express initialized

       ___      _
      / __\___ | |_   _ ___  ___ _   _ ___
     / /  / _ \| | | | / __|/ _ \ | | / __|
    / /__| (_) | | |_| \__ \  __/ |_| \__ \
    \____/\___/|_|\__, |___/\___|\__,_|___/
                  |___/

Multiplayer Framework for Node.js · Open-source

💖 Sponsor Colyseus on GitHub → https://github.com/sponsors/endel
🌟 Give it a star on GitHub → https://github.com/colyseus/colyseus
☁️  Deploy and scale your project on Colyseus Cloud → https://cloud.colyseus.io

⚔️  Listening on http://localhost:2567
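The output above ties together: the GameServer spec uses the default health settings (Initial Delay Seconds 5, Period Seconds 5, Failure Threshold 3), the sidecar logs "GameServer Health Check failed" roughly every 5 seconds, and on the third failure it logs "GameServer has failed health check". The game server container's liveness probe points at the sidecar's /gshealthz endpoint, which then returns 500, so the kubelet restarts the container. Exit code 137 conventionally means the process was killed by SIGKILL (128 + 9), which is consistent with those restarts. As a hedged diagnostic aid, the sketch below parses sidecar log lines in the JSON format shown above and reports the failures, the gaps between them, and when the threshold was reached. The helper names (`healthFailures`, `gapsSeconds`, `thresholdReachedAt`) are hypothetical; only the field names and message text come from the logs above.

```typescript
// Hypothetical helpers for the sidecar's JSON log lines, as printed by
// `kubectl logs <pod> -c agones-gameserver-sidecar` in this thread.
interface SidecarLogLine {
  message: string;
  time: string;
  failureCount?: number;
}

interface HealthFailure {
  failureCount: number;
  time: Date;
}

function healthFailures(logText: string): HealthFailure[] {
  const failures: HealthFailure[] = [];
  for (const raw of logText.split("\n")) {
    const line = raw.trim();
    if (!line.startsWith("{")) continue; // skip non-JSON lines
    let entry: SidecarLogLine;
    try {
      entry = JSON.parse(line) as SidecarLogLine;
    } catch {
      continue; // ignore malformed lines
    }
    if (entry.message === "GameServer Health Check failed" && typeof entry.failureCount === "number") {
      // Trim nanoseconds to milliseconds so Date parsing is well defined.
      const iso = entry.time.replace(/(\.\d{3})\d+/, "$1");
      failures.push({ failureCount: entry.failureCount, time: new Date(iso) });
    }
  }
  return failures;
}

// Seconds between consecutive failures; with periodSeconds = 5 these should be about 5.
function gapsSeconds(failures: HealthFailure[]): number[] {
  const gaps: number[] = [];
  for (let i = 1; i < failures.length; i++) {
    gaps.push((failures[i].time.getTime() - failures[i - 1].time.getTime()) / 1000);
  }
  return gaps;
}

// Time of the first failure whose count reaches the configured threshold, if any.
function thresholdReachedAt(failures: HealthFailure[], failureThreshold: number): Date | undefined {
  return failures.find((f) => f.failureCount >= failureThreshold)?.time;
}
```

Applied to the log above with a failure threshold of 3, the threshold is reached at 00:47:41.608, the same moment the sidecar logs "GameServer has failed health check".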
ashutosji commented 7 months ago

I tested this on Debian rodete (amd64) with minikube v1.32.0. I installed Agones using Helm (https://agones.dev/site/docs/installation/install-agones/helm/#installing-the-chart) and created a GameServer with kubectl create -f https://raw.githubusercontent.com/googleforgames/agones/release-1.39.0/examples/xonotic/gameserver.yaml

Everything worked fine for me. See https://agones.dev/site/docs/reference/gameserver/ for the complete GameServer specification, and check out the Xonotic GameServer YAML (https://github.com/googleforgames/agones/blob/main/examples/xonotic/gameserver.yaml). Could you also try with the default portPolicy and protocol?

Lahamc commented 7 months ago

The xonotic:1.8 game server also worked fine for me. My problem is that I am using my own game server image, and the error I get is very similar to the one I see with the gcr.io/agones-images/xonotic-example:0.5 image, so I would like to understand what is probably happening to my game server. My game server is written in Node.js. Am I supposed to call SDK.health() and SDK.ready() in my game server, even though there is an agones-gameserver-sidecar in my Kubernetes pod?

Lahamc commented 7 months ago

It was my mistake: I had misunderstood how Agones works. The game server itself has to call SDK.health() and SDK.ready(); the sidecar only receives those calls. It is working now. Thank you so much for the reply.
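For a Node.js server like the one in this thread, the fix amounts to sending health pings continuously and calling ready() once the server is actually listening (here, after Colyseus logs "Listening on http://localhost:2567"). The sketch below is hedged: `AgonesLikeSdk` and `startAgonesLifecycle` are hypothetical names, and the interface is a stand-in so the example stays self-contained. The official Node.js client is published as `@google-cloud/agones-sdk` and provides `connect()`, `ready()` and `health()` methods, but check its documentation for exact signatures. The 2000 ms interval is an assumption, chosen to be comfortably shorter than the default Period Seconds of 5 shown in the describe output.

```typescript
// Hypothetical minimal interface covering only the two SDK calls used here.
// With the real client (@google-cloud/agones-sdk), call `await sdk.connect()`
// first; check its documentation for exact signatures.
interface AgonesLikeSdk {
  health(): void;
  ready(): Promise<unknown>;
}

// Sends a health ping right away and then every `healthIntervalMs`, then asks
// Agones to mark the GameServer Ready. Resolves to a function that stops the
// health loop (e.g. during shutdown).
async function startAgonesLifecycle(
  sdk: AgonesLikeSdk,
  healthIntervalMs: number,
): Promise<() => void> {
  // Start pinging before ready() so the sidecar never sees a gap in health pings.
  sdk.health();
  const timer = setInterval(() => sdk.health(), healthIntervalMs);
  try {
    // Moves the GameServer out of Scheduled (via RequestReady) to Ready.
    await sdk.ready();
  } catch (err) {
    clearInterval(timer); // do not leave the loop running if ready() fails
    throw err;
  }
  return () => clearInterval(timer);
}
```

A possible call site, assuming the real client: `const sdk = new AgonesSDK(); await sdk.connect(); const stopHealth = await startAgonesLifecycle(sdk, 2000);` placed after the game server starts listening. Pinging from a timer rather than from game-loop code keeps the health signal independent of whether any players are connected.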