googleforgames / agones

Dedicated Game Server Hosting and Scaling for Multiplayer Games on Kubernetes
https://agones.dev
Apache License 2.0

GameServer stuck on state Scheduled when Pod failed with reason OutOfpods #2683

Closed katsew closed 4 months ago

katsew commented 2 years ago

What happened:

Agones didn't create a new Pod when a Pod failed with reason OutOfpods, and the GameServer got stuck in the Scheduled state.

What you expected to happen:

The GameServer is expected to create a new Pod if its Pod fails with reason OutOfpods.

How to reproduce it (as minimally and precisely as possible):

  1. Put the following manifest in /etc/kubernetes/manifests/static-pod.manifest of the testing node.
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  namespace: kube-system
  labels:
    component: nginx
    tier: node
spec:
  hostNetwork: true
  containers:
  - name: nginx
    image: nginx:1.14.2
    imagePullPolicy: IfNotPresent
    ports:
    - containerPort: 80
    resources:
      requests:
        cpu: 100m
  priorityClassName: system-node-critical
  priority: 2000001000
  tolerations:
  - effect: NoExecute
    operator: Exists
  - effect: NoSchedule
    operator: Exists
  2. Set the Fleet replicas to the Pod capacity of the node.
  3. Confirm that some of the GameServer Pods are stuck in the Pending state.
  4. Forcibly delete the static Pod created in step 1 from kube-system.
    • kubectl delete pod --force --grace-period=0 <static-pod-name> -n kube-system

All GameServer Pods that were stuck in Pending then fail with reason OutOfpods.
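
To confirm the failure mode, the failed Pods and the GameServers left in Scheduled can be listed with something like the following (a verification sketch; it assumes the default agones.dev/role=gameserver label that Agones puts on GameServer Pods):

kubectl get pods -l agones.dev/role=gameserver --field-selector=status.phase=Failed
kubectl get gameservers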

Anything else we need to know?:

Here is the Pod status I reproduced:

status:
  message: 'Pod Node didn''t have enough resource: pods, requested: 1, used: 32, capacity:
    32'
  phase: Failed
  reason: OutOfpods

I created the Fleet from the official documentation.

Environment:

markmandel commented 2 years ago

The GameServer is expected to create a new Pod if its Pod fails with reason OutOfpods.

Sorry, but maybe I'm missing something - but how is Agones supposed to create a new Pod if there isn't room in the cluster?

Or do you mean that Agones doesn't recover when there should be room?

markmandel commented 2 years ago

I'm also concerned that if you delete all the pods in the kube-system namespace, you are also breaking Kubernetes.

katsew commented 2 years ago

The GameServer is expected to create a new Pod if its Pod fails with reason OutOfpods.

Sorry, but maybe I'm missing something - but how is Agones supposed to create a new Pod if there isn't room in the cluster?

Or do you mean that Agones doesn't recover when there should be room?

I mean that Agones doesn't recover when there should be room.

katsew commented 2 years ago

I'm also concerned that if you delete all the pods in the kube-system namespace, you are also breaking Kubernetes.

Usually, if a Node has exceeded its Pod capacity and there are no other schedulable Nodes, the Pod stays stuck in Pending. Since failure with OutOfpods is rare, it was probably necessary to put the cluster into a broken state in order to reproduce it. As far as I could tell, simply deleting the GameServer did not reproduce it.

katsew commented 2 years ago

I have updated the reproduction procedure to be more accurate. By forcibly deleting pods, it can be reproduced in one attempt.

markmandel commented 2 years ago

I'm curious - what does kubectl delete pod --force --grace-period=0 --all -n kube-system force to happen?

Is that a required step to replicate the issue?

markmandel commented 2 years ago

Also, can you please share a kubectl describe of the Pod that failed as well?

It sounds like we should move the GameServer to Unhealthy if the backing pod moves to an OutOfpods state, but I'm just trying to nail down exactly what is happening here.

katsew commented 2 years ago

I'm curious - what does kubectl delete pod --force --grace-period=0 --all -n kube-system force to happen?

Is that a required step to replicate the issue?

I tried forcibly deleting only the GameServer Pods, but still could not reproduce the problem. So I checked which kube-system component actually triggers the OutOfpods failure; it turned out to be kube-proxy.

I have updated the confirmed reproduction method.

katsew commented 2 years ago

Also, can you please share a kubectl describe of the Pod that failed as well?

Sure, here it is.

Name:           simple-game-server-qxtcq-wbs6p
Namespace:      default
Priority:       0
Node:           gke-friday-developme-gameserver-pool--f788a1c2-szxx/
Start Time:     Thu, 28 Jul 2022 03:15:22 +0000
Labels:         agones.dev/gameserver=simple-game-server-qxtcq-wbs6p
                agones.dev/role=gameserver
Annotations:    agones.dev/container: simple-game-server
                agones.dev/sdk-version: 1.20.0
                cluster-autoscaler.kubernetes.io/safe-to-evict: false
Status:         Failed
Reason:         OutOfpods
Message:        Pod Node didn't have enough resource: pods, requested: 1, used: 32, capacity: 32
IP:
IPs:            <none>
Controlled By:  GameServer/simple-game-server-qxtcq-wbs6p
Containers:
  agones-gameserver-sidecar:
    Image:      gcr.io/agones-images/agones-sdk:1.20.0
    Port:       <none>
    Host Port:  <none>
    Args:
      --grpc-port=9357
      --http-port=9358
    Requests:
      cpu:     30m
    Liveness:  http-get http://:8080/healthz delay=3s timeout=1s period=3s #success=1 #failure=3
    Environment:
      GAMESERVER_NAME:  simple-game-server-qxtcq-wbs6p
      POD_NAMESPACE:    default (v1:metadata.namespace)
      FEATURE_GATES:    CustomFasSyncInterval=false&Example=true&NodeExternalDNS=true&PlayerAllocationFilter=false&PlayerTracking=false&SDKGracefulTermination=false&StateAllocationFilter=false
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-s4rzz (ro)
  simple-game-server:
    Image:      gcr.io/agones-images/simple-game-server:0.13
    Port:       7654/UDP
    Host Port:  7258/UDP
    Requests:
      cpu:     0
      memory:  0
    Liveness:  http-get http://:8080/gshealthz delay=5s timeout=1s period=5s #success=1 #failure=3
    Environment:
      AGONES_SDK_GRPC_PORT:  9357
      AGONES_SDK_HTTP_PORT:  9358
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from empty (ro)
Volumes:
  empty:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  kube-api-access-s4rzz:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason             Age    From                Message
  ----     ------             ----   ----                -------
  Warning  FailedScheduling   2m44s  default-scheduler   0/1 nodes are available: 1 Too many pods.
  Warning  FailedScheduling   2m42s  default-scheduler   0/1 nodes are available: 1 Too many pods.
  Normal   Scheduled          43s    default-scheduler   Successfully assigned default/simple-game-server-qxtcq-wbs6p to gke-friday-developme-gameserver-pool--f788a1c2-szxx
  Normal   NotTriggerScaleUp  2m44s  cluster-autoscaler  pod didn't trigger scale-up:
  Warning  OutOfpods          44s    kubelet             Node didn't have enough resource: pods, requested: 1, used: 32, capacity: 32
markmandel commented 2 years ago

Can you reproduce the issue without actively deleting Kubernetes components?

katsew commented 2 years ago

Today I tried to reproduce without forcibly deleting kube-proxy, but I couldn't.

First, I deployed a static pod with a spec similar to the kube-proxy Pod and then deleted it forcibly, but that didn't work.

Here is the set of spec fields I tried to align with kube-proxy.

The actual Pod resource is this:

apiVersion: v1
kind: Pod
metadata:
  name: nginx
  namespace: kube-system
  labels:
    component: nginx
    tier: node
  ownerReferences:
  - apiVersion: v1
    controller: true
    kind: Node
    name: gke-friday-developme-gameserver-pool--f788a1c2-pdj1
    uid: 37d22a61-6e19-4729-bf3e-86a8823c9215
spec:
  containers:
  - name: nginx
    image: nginx:1.14.2
    imagePullPolicy: IfNotPresent
    ports:
    - containerPort: 80
    resources:
      requests:
        cpu: 100m
  nodeName: gke-friday-developme-gameserver-pool--f788a1c2-pdj1
  priorityClassName: system-node-critical
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    operator: Exists
  - effect: NoSchedule
    operator: Exists

Next, I tried setting pods to 0 in a ResourceQuota and forcibly deleting the GameServer, which also did not work.

# deploy to the same namespace as GameServers exist
apiVersion: v1
kind: ResourceQuota
metadata:
  name: pod-counts
  namespace: default
spec:
  hard:
    pods: "0"
katsew commented 2 years ago

Here is the log for kubelet.

Jul 29 01:42:42 gke-friday-developme-gameserver-pool--f788a1c2-pdj1 kubelet[14147]: I0729 01:42:42.860377   14147 kubelet.go:1950] "SyncLoop DELETE" source="api" pods=[kube-system/kube-proxy-gke-friday-developme-gameserver-pool--f788a1c2-pdj1]
Jul 29 01:42:42 gke-friday-developme-gameserver-pool--f788a1c2-pdj1 kubelet[14147]: I0729 01:42:42.860644   14147 kubelet_pods.go:1520] "Generating pod status" pod="kube-system/kube-proxy-gke-friday-developme-gameserver-pool--f788a1c2-pdj1"
Jul 29 01:42:42 gke-friday-developme-gameserver-pool--f788a1c2-pdj1 kubelet[14147]: I0729 01:42:42.861193   14147 kubelet.go:1668] "Trying to delete pod" pod="kube-system/kube-proxy-gke-friday-developme-gameserver-pool--f788a1c2-pdj1" podUID=9c095625-5113-476e-a638-bc8a78a9271b
Jul 29 01:42:42 gke-friday-developme-gameserver-pool--f788a1c2-pdj1 kubelet[14147]: I0729 01:42:42.861335   14147 mirror_client.go:125] "Deleting a mirror pod" pod="kube-system/kube-proxy-gke-friday-developme-gameserver-pool--f788a1c2-pdj1" podUID=0xc000b4b060
Jul 29 01:42:42 gke-friday-developme-gameserver-pool--f788a1c2-pdj1 kubelet[14147]: I0729 01:42:42.872105   14147 config.go:278] "Setting pods for source" source="api"
Jul 29 01:42:42 gke-friday-developme-gameserver-pool--f788a1c2-pdj1 kubelet[14147]: I0729 01:42:42.873388   14147 kubelet.go:1944] "SyncLoop REMOVE" source="api" pods=[kube-system/kube-proxy-gke-friday-developme-gameserver-pool--f788a1c2-pdj1]
Jul 29 01:42:42 gke-friday-developme-gameserver-pool--f788a1c2-pdj1 kubelet[14147]: I0729 01:42:42.883964   14147 config.go:278] "Setting pods for source" source="api"
Jul 29 01:42:42 gke-friday-developme-gameserver-pool--f788a1c2-pdj1 kubelet[14147]: I0729 01:42:42.884289   14147 config.go:383] "Receiving a new pod" pod="default/simple-game-server-psqqb-hnz28"
Jul 29 01:42:42 gke-friday-developme-gameserver-pool--f788a1c2-pdj1 kubelet[14147]: I0729 01:42:42.887060   14147 volume_manager.go:394] "Waiting for volumes to attach and mount for pod" pod="kube-system/kube-proxy-gke-friday-developme-gameserver-pool--f788a1c2-pdj1"
Jul 29 01:42:42 gke-friday-developme-gameserver-pool--f788a1c2-pdj1 kubelet[14147]: I0729 01:42:42.887324   14147 volume_manager.go:425] "All volumes are attached and mounted for pod" pod="kube-system/kube-proxy-gke-friday-developme-gameserver-pool--f788a1c2-pdj1"
Jul 29 01:42:42 gke-friday-developme-gameserver-pool--f788a1c2-pdj1 kubelet[14147]: I0729 01:42:42.887937   14147 kuberuntime_manager.go:711] "computePodActions got for pod" podActions={KillPod:false CreateSandbox:false SandboxID:2d3ba683c2accef9b8e33e0d44ff5602267cb2b4645f5cce78d4f04ff6c20c2a Attempt:0 NextInitContainerToStart:nil ContainersToStart:[] ContainersToKill:map[] EphemeralContainersToStart:[]} pod="kube-system/kube-proxy-gke-friday-developme-gameserver-pool--f788a1c2-pdj1"
Jul 29 01:42:42 gke-friday-developme-gameserver-pool--f788a1c2-pdj1 kubelet[14147]: I0729 01:42:42.888478   14147 kubelet.go:1934] "SyncLoop ADD" source="api" pods=[default/simple-game-server-psqqb-hnz28]
Jul 29 01:42:42 gke-friday-developme-gameserver-pool--f788a1c2-pdj1 kubelet[14147]: I0729 01:42:42.888680   14147 topology_manager.go:187] "Topology Admit Handler"
Jul 29 01:42:42 gke-friday-developme-gameserver-pool--f788a1c2-pdj1 kubelet[14147]: I0729 01:42:42.889513   14147 predicate.go:143] "Predicate failed on Pod" pod="simple-game-server-psqqb-hnz28_default(34520514-3a31-4fc2-a039-5727211e7f4b)" err="Node didn't have enough resource: pods, requested: 1, used: 32, capacity: 32"
Jul 29 01:42:42 gke-friday-developme-gameserver-pool--f788a1c2-pdj1 kubelet[14147]: I0729 01:42:42.890657   14147 event.go:291] "Event occurred" object="default/simple-game-server-psqqb-hnz28" kind="Pod" apiVersion="v1" type="Warning" reason="OutOfpods" message="Node didn't have enough resource: pods, requested: 1, used: 32, capacity: 32"
Jul 29 01:42:42 gke-friday-developme-gameserver-pool--f788a1c2-pdj1 kubelet[14147]: I0729 01:42:42.922069   14147 status_manager.go:586] "Patch status for pod" pod="default/simple-game-server-psqqb-hnz28" patchBytes="{\"metadata\":{\"uid\":\"34520514-3a31-4fc2-a039-5727211e7f4b\"},\"status\":{\"conditions\":null,\"message\":\"Pod Node didn't have enough resource: pods, requested: 1, used: 32, capacity: 32\",\"phase\":\"Failed\",\"qosClass\":null,\"reason\":\"OutOfpods\",\"startTime\":\"2022-07-29T01:42:42Z\"}}"
Jul 29 01:42:42 gke-friday-developme-gameserver-pool--f788a1c2-pdj1 kubelet[14147]: I0729 01:42:42.922330   14147 status_manager.go:595] "Status for pod updated successfully" pod="default/simple-game-server-psqqb-hnz28" statusVersion=1 status={Phase:Failed Conditions:[] Message:Pod Node didn't have enough resource: pods, requested: 1, used: 32, capacity: 32 Reason:OutOfpods NominatedNodeName: HostIP: PodIP: PodIPs:[] StartTime:2022-07-29 01:42:42 +0000 UTC InitContainerStatuses:[] ContainerStatuses:[] QOSClass: EphemeralContainerStatuses:[]}

Any ideas on how to reproduce this?

katsew commented 2 years ago

I have updated the reproduction steps. I said I deployed a static Pod in kube-system, but I had actually deployed a bare Pod, not a static Pod. When I created a real static Pod as described in the steps above, the issue was reproduced without deleting kube-proxy.

markmandel commented 2 years ago

Maybe a silly question, but why would someone add that manifest to a node? (I expect most people are on cloud providers and either (a) don't have access or (b) would have it overwritten pretty fast.)

katsew commented 2 years ago

According to the documentation, Static Pods are supposed to be used to deploy your own control plane components, but I'm not sure if there are other use cases where users actually use them. Just to be clear, I used Static Pods only to reproduce this issue; I do not use them in a real environment.

katsew commented 2 years ago

We currently work around this issue by running the descheduler to evict Pods that failed with OutOfpods. However, there is a delay, since the descheduler runs as a CronJob.

So, I would like to submit a PR that solves this issue, but I don't see how I can turn the reproduction method into a test case... 😒
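
For reference, a descheduler policy along these lines can evict Pods that failed with OutOfpods (a rough sketch, assuming a descheduler release that ships the RemoveFailedPods strategy; field names may differ between versions):

apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "RemoveFailedPods":
    enabled: true
    params:
      failedPods:
        reasons:
        - "OutOfpods"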

katsew commented 2 years ago

Today I encountered the same behavior when a Pod failed with OutOfcpu. It seems that when a Pod fails for certain reasons, the GameServer does not recover automatically πŸ€”

katsew commented 2 years ago

@markmandel

I've been working on this, and it seems the issue is caused by the insufficient-resource error in the kubelet. https://github.com/kubernetes/kubernetes/blob/v1.21.12/pkg/kubelet/lifecycle/predicate.go#L140

My suggestion for fixing this is to add a condition here that checks whether the Pod failed due to an insufficient-resource error. https://github.com/googleforgames/agones/blob/main/pkg/gameservers/health.go#L105

If it's ok to apply this fix, I'll submit a PR for this.

What do you think?

markmandel commented 2 years ago

I'll be honest, I'm still not understanding what the actual issue here is. It seems like you have to break Kubernetes to make it happen -- which doesn't sound like an actual bug, it sounds like an extreme edge case.

Also, a Pod in pending state is an indicator to the cluster autoscaler that it should expand the cluster - so changing that behaviour says to me that we should leave this alone.

If you can't replicate this issue without actually messing around with the underlying Kubernetes system, I'm not sure we should be considering a fix here?

katsew commented 2 years ago

I too am aware that I have hit an edge case rather than found a bug. We currently downsize the entire cluster after work hours and scale it back up before work hours start. I feel this makes it easier to hit edge cases.

Also, a Pod in pending state is an indicator to the cluster autoscaler that it should expand the cluster - so changing that behaviour says to me that we should leave this alone.

There may be some misunderstanding here. The problem is not a Pod in the Pending state, but one in the Failed state. The GameServer does not recreate a backing Pod that is in the Failed state, and the GameServer never transitions out of Scheduled, even if the cluster autoscaler scales out nodes.

This can prevent the FleetAutoscaler from working properly, since the GameServer is stuck in the Scheduled state and the FleetAutoscaler cannot scale the Fleet until the failed Pod is manually deleted.

If you can't replicate this issue without actually messing around with the underlying Kubernetes system, I'm not sure we should be considering a fix here?

Maybe I should ask the sig-node community for help replicating the issue without killing static Pods.

katsew commented 2 years ago

I don't know if it's worth mentioning, but a Pod created from a Deployment does not get stuck in this state; it is replaced and the replacement becomes Running.
So my thought is that it would be nice if Pods created by a GameServer could be recovered in the same way.

markmandel commented 2 years ago

Gotcha!

So ultimately it sounds like if a Pod status is Failed, we should handle that general case (less of an issue with OutOfpods specifically, but more generally whenever a Pod is in a Failed state).

I wonder if there is an easy way to just create a Failed Pod somehow, and use that as our test case. It does sound like if a Pod has failed for any reason, it should be moved to Unhealthy anyway.

I had a quick look to see if there was an easy way to make that happen though. Did you have any luck with sig-node?

katsew commented 2 years ago

Thank you for straightening out the issue.

I wonder if there is an easy way to just create a Failed Pod somehow, and use that as our test case. It does sound like if a Pod has failed for any reason, it should be moved to Unhealthy anyway. I had a quick look to see if there was an easy way to make that happen though. Did you have any luck with sig-node?

Sorry, I've been too busy to ask the community for help yet, but I will do so soon.

katsew commented 2 years ago

@markmandel

Sorry it took so long to ask the question. I posted it to the k8s community Slack channel, but couldn't get an answer. I've also tried to create a Failed Pod, but I have no idea how to change a Pod's status.phase to Failed.

Have you found any way to create a Failed Pod?

markmandel commented 1 year ago

Looking at: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/

Failed: All containers in the Pod have terminated, and at least one container has terminated in failure. That is, the container either exited with non-zero status or was terminated by the system.

If this happens (assuming there is a Fleet in use), this will kick the GameServer into Unhealthy, and it will then get replaced with a new Pod.

...also, if we can't replicate the issue, is it an issue? πŸ˜„

katsew commented 1 year ago

@markmandel

I found a manifest that replicates the issue. We have to set restartPolicy to Never, then make the containers exit with a non-zero status. To exit all containers with a non-zero status, I had to add hostPID: true so that all container process IDs are visible. I also found that containers exiting with a zero status leave the Pod stuck in the Succeeded phase. Should we handle that case, too? πŸ€”

apiVersion: "agones.dev/v1"
kind: Fleet
metadata:
  name: simple-game-server
  namespace: default
spec:
  replicas: 1
  template:
    spec:
      ports:
      - name: default
        containerPort: 7654
      template:
        spec:
          hostPID: true
          restartPolicy: Never
          containers:
          - name: simple-game-server
            image: gcr.io/agones-images/simple-game-server:0.13
            command:
              - sh
              - -cx
              - |
                pgrep -n sdk-server | xargs kill -9
                exit 1

Related issue: https://github.com/googleforgames/agones/issues/2361

unlightable commented 1 year ago

I would like to chime in here as it seems like it's the same issue.

There is a relatively fresh Kubernetes feature, Graceful Node Shutdown: https://kubernetes.io/docs/concepts/architecture/nodes/#graceful-node-shutdown It seems it can also lead to Pods being transitioned into the Failed state:

Status:           Failed
Reason:           Terminated
Message:          Pod was terminated in response to imminent node shutdown.

I think the controlling GameServer should indeed be moved to Unhealthy here as the correct reaction.

Unfortunately I don't have a concrete way to reproduce it, as we encountered the issue in production on a loaded cluster. But my guess is that this happens when the node's shutdownGracePeriod and shutdownGracePeriodCriticalPods are non-zero (which enables the feature) but not long enough to actually terminate the containers inside the Pod gracefully, because they have a larger terminationGracePeriodSeconds and actually use it up.
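
For context, the feature is driven by two kubelet configuration fields; a minimal KubeletConfiguration fragment would look something like this (the durations are purely illustrative, not a recommendation):

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
shutdownGracePeriod: 30s
shutdownGracePeriodCriticalPods: 10s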

markmandel commented 1 year ago

This is all kinds of fun, because of Pod restarts πŸ˜„ so I appreciate people digging in.

One thing I'm trying to work out from the documentation is: if the Pod is in a Failed state, is that the final state?

I.e. do we know if a Pod could restart its way out of a Failed state? @roberthbailey @zmerlynn @gongmax @igooch any of you know? The docs don't seem clear to me.

https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-phase

zmerlynn commented 1 year ago

status.phase is a synthetic view really meant for humans. As pointed out in the link: The phase of a Pod is a simple, high-level summary of where the Pod is in its lifecycle. The phase is not intended to be a comprehensive rollup of observations of container or Pod state, nor is it intended to be a comprehensive state machine. Really, you can think of status.phase as aggregating the state of each container and potentially other Pod conditions.

That said, with restartPolicy: Never, I would expect Failed to be terminal (except for possibly some nuance around the state of the SDK sidecar container). It would be useful to have the full kubectl get -oyaml for the Pod in question rather than describe view, just to see.

markmandel commented 1 year ago

Since we run (by default) with restartPolicy: Always on the Pod, we have to assume there is a restart.

@unlightable in your situation, I assume once the node was torn down, the GameServer was replaced on another node?

unlightable commented 1 year ago

@unlightable in your situation, I assume once the node was torn down, the GameServer was replaced on another node?

Nope. The node is not torn down, merely rebooted. I'm guessing that removing it from the cluster completely would actually destroy the Pod and "fix" everything, but I can't confirm that yet.

The Pods do look like this, though:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    agones.dev/container: gameserver
    agones.dev/ready-container-id: docker://6b40d60782e000a35405845e48d2daf842c634eff8a90c47171b8d7a114fe50d
    agones.dev/sdk-version: 1.27.0
    cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
  creationTimestamp: "2023-01-04T04:02:05Z"
  labels:
    agones.dev/gameserver: live-pgwc9-bjs2h
    agones.dev/role: gameserver
    app: DedicatedServer
  name: live-pgwc9-bjs2h
  namespace: live
  ownerReferences:
  - apiVersion: agones.dev/v1
    blockOwnerDeletion: true
    controller: true
    kind: GameServer
    name: live-pgwc9-bjs2h
    uid: d5ef5d46-e4d1-4fb5-99ca-7b41c04b06ca
  resourceVersion: "82992068"
  uid: 47922441-67f0-490c-946d-897ba9546ba4
spec:
  affinity:
    podAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - podAffinityTerm:
          labelSelector:
            matchLabels:
              agones.dev/role: gameserver
          topologyKey: kubernetes.io/hostname
        weight: 100
  containers:
  - args:
    - --grpc-port=9357
    - --http-port=9358
    env:
    - name: GAMESERVER_NAME
      value: live-pgwc9-bjs2h
    - name: POD_NAMESPACE
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.namespace
    - name: FEATURE_GATES
      value: CustomFasSyncInterval=true&Example=true&PlayerAllocationFilter=false&PlayerTracking=false&ResetMetricsOnDelete=false&SDKGracefulTermination=true&StateAllocationFilter=true
    image: gcr.io/agones-images/agones-sdk:1.27.0
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: /healthz
        port: 8080
        scheme: HTTP
      initialDelaySeconds: 3
      periodSeconds: 3
      successThreshold: 1
      timeoutSeconds: 1
    name: agones-gameserver-sidecar
    resources:
      requests:
        cpu: 30m
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-n95dr
      readOnly: true
  - command:
    - /app/DedicatedServer
    - --null-con
    - --enable-eac
    - --initial-nice
    - "10"
    - +exec
    - /var/xo-ds-config/config.cfg
    env:
    - name: AGONES_SDK_GRPC_PORT
      value: "9357"
    - name: AGONES_SDK_HTTP_PORT
      value: "9358"
    image: cr.yandex/crph7uvg1chcap6rvt9g/xo-gameserver:2.2.10.231067
    imagePullPolicy: IfNotPresent
    lifecycle:
      preStop:
        exec:
          command:
          - /bin/bash
          - /var/xo-ds-config/preStop.sh
    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: /gshealthz
        port: 8080
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 20
      successThreshold: 1
      timeoutSeconds: 1
    name: gameserver
    ports:
    - containerPort: 35000
      hostPort: 35063
      protocol: UDP
    resources:
      limits:
        cpu: "1"
        memory: 800Mi
      requests:
        cpu: 400m
        memory: 400Mi
    securityContext:
      capabilities:
        add:
        - SYS_NICE
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/xo-ds-config
      name: xo-ds-config
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-n95dr
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: cl1drlo5quu5uo06e12s-onul
  nodeSelector:
    yandex.cloud/preemptible: "true"
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: xo-ds
  serviceAccountName: xo-ds
  terminationGracePeriodSeconds: 65
  tolerations:
  - key: preemptible
    value: "true"
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - configMap:
      defaultMode: 420
      name: xo-ds-config
    name: xo-ds-config
  - name: kube-api-access-n95dr
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2023-01-04T04:02:05Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2023-01-04T04:02:07Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2023-01-04T04:02:07Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2023-01-04T04:02:05Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: docker://5a0473a7366838d11cc2acce68b2c2865b80b9a595bc349c9d80ff59c9f5c591
    image: gcr.io/agones-images/agones-sdk:1.27.0
    imageID: docker-pullable://gcr.io/agones-images/agones-sdk@sha256:9e31ebde2abd1410a6e94dcd119b653070a162a27e8056601c5bbbb4f2b3e3e4
    lastState: {}
    name: agones-gameserver-sidecar
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2023-01-04T04:02:06Z"
  - containerID: docker://6b40d60782e000a35405845e48d2daf842c634eff8a90c47171b8d7a114fe50d
    image: cr.yandex/crph7uvg1chcap6rvt9g/xo-gameserver:2.2.10.231067
    imageID: docker-pullable://cr.yandex/crph7uvg1chcap6rvt9g/xo-gameserver@sha256:386ec73f9a85fcb483ee3a6b8c8e5f16f0366af9894bce2eb7c7ba01e637f38b
    lastState: {}
    name: gameserver
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2023-01-04T04:02:07Z"
  hostIP: 172.28.132.38
  message: Pod was terminated in response to imminent node shutdown.
  phase: Failed
  podIP: 10.112.227.19
  podIPs:
  - ip: 10.112.227.19
  qosClass: Burstable
  reason: Terminated
  startTime: "2023-01-04T04:02:05Z"

and the associated GameServer:

apiVersion: agones.dev/v1
kind: GameServer
metadata:
  annotations:
    agones.dev/last-allocated: "2023-01-04T04:02:17.689865428Z"
    agones.dev/ready-container-id: docker://6b40d60782e000a35405845e48d2daf842c634eff8a90c47171b8d7a114fe50d
    agones.dev/sdk-version: 1.27.0
  creationTimestamp: "2023-01-04T04:02:05Z"
  finalizers:
  - agones.dev
  generateName: live-pgwc9-
  generation: 7
  labels:
    agones.dev/fleet: live
    agones.dev/gameserverset: live-pgwc9
    version: 2.2.10.231067
  name: live-pgwc9-bjs2h
  namespace: live
  ownerReferences:
  - apiVersion: agones.dev/v1
    blockOwnerDeletion: true
    controller: true
    kind: GameServerSet
    name: live-pgwc9
    uid: 5947013d-9da6-44fd-bc65-e05c11901453
  resourceVersion: "82974474"
  uid: d5ef5d46-e4d1-4fb5-99ca-7b41c04b06ca
spec:
  container: gameserver
  health:
    failureThreshold: 3
    initialDelaySeconds: 10
    periodSeconds: 20
  ports:
  - container: gameserver
    containerPort: 35000
    hostPort: 35063
    name: default
    portPolicy: Dynamic
    protocol: UDP
  scheduling: Packed
  sdkServer:
    grpcPort: 9357
    httpPort: 9358
    logLevel: Info
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: DedicatedServer
    spec:
      containers:
      - command:
        - /app/DedicatedServer
        - --null-con
        - --enable-eac
        - --initial-nice
        - "10"
        - +exec
        - /var/xo-ds-config/config.cfg
        image: cr.yandex/crph7uvg1chcap6rvt9g/xo-gameserver:2.2.10.231067
        lifecycle:
          preStop:
            exec:
              command:
              - /bin/bash
              - /var/xo-ds-config/preStop.sh
        name: gameserver
        resources:
          limits:
            cpu: "1"
            memory: 800Mi
          requests:
            cpu: 400m
            memory: 400Mi
        securityContext:
          capabilities:
            add:
            - SYS_NICE
        volumeMounts:
        - mountPath: /var/xo-ds-config
          name: xo-ds-config
      nodeSelector:
        yandex.cloud/preemptible: "true"
      serviceAccountName: xo-ds
      terminationGracePeriodSeconds: 65
      tolerations:
      - key: preemptible
        value: "true"
      volumes:
      - configMap:
          name: xo-ds-config
        name: xo-ds-config
status:
  address: 158.160.13.221
  nodeName: cl1drlo5quu5uo06e12s-onul
  players: null
  ports:
  - name: default
    port: 35063
  reservedUntil: null
  state: Allocated
markmandel commented 1 year ago

The node is not torn down, merely rebooted. I'm guessing that removing it from the cluster completely would actually destroy the Pod and "fix" everything, but I can't confirm that yet.

Maybe a silly question - but how is a node rebooted without deleting the Pods on it?

zmerlynn commented 1 year ago

Maybe a silly question - but how is a node rebooted without deleting the Pods on it?

Given "Pod was terminated in response to imminent node shutdown.", I would guess using something like the reboot command?

markmandel commented 1 year ago

No, I mean: from the docs, it reads to me that the Pods should all be shut down as part of the reboot - so they should all go away on that node, taking the GameServers with them in the process.

So why is that not happening in this instance?

unlightable commented 1 year ago

Maybe a silly question - but how is a node rebooted without deleting the Pods on it?

So why is that not happening in this instance?

I've assumed a few comments ago that it happens due to graceful shutdown: https://github.com/googleforgames/agones/issues/2683#issuecomment-1367438706

Assuming the reboot needs to happen within some deadline, Pods that were still in their shutdown hooks have to be moved into that special state?

Here is the logic doing that (I think): https://github.com/kubernetes/kubernetes/blob/37e73b419e455db34f5fe3e8d815418680ab23df/pkg/kubelet/nodeshutdown/nodeshutdown_manager_linux.go#L377

I even found a related issue while looking into the e2e tests for the graceful shutdown feature: https://github.com/kubernetes/kubernetes/issues/108594

markmandel commented 1 year ago

Assuming that reboot needs to happen respecting some deadline, pods that were in shutdown hooks have to be moved into that special state?

But doesn't the node eventually assume it can't gracefully shut everything down, do a force kill and delete all the Pods that way (and then reboot the node?)

I guess I'm leaning more towards "this is a bug in K8s" rather than an issue in Agones? πŸ€”

unlightable commented 1 year ago

But doesn't the node eventually assume it can't gracefully shut everything down, do a force kill and delete all the Pods that way (and then reboot the node?)

It kinda does, but it moves those Pods into that Failed state instead. Probably to indicate that there was some issue with the shutdown? IDK.

I guess I'm leaning more towards "this is a bug in K8s" rather than an issue in Agones? πŸ€”

Well, maybe? But even if it is, don't you think the GameServer should react somehow when its controlled Pod ends up in some weird state indicating failure? There could be more logic/bugs in k8s that put it there, as we found out in this issue and related ones.

markmandel commented 1 year ago

But doesn't the node eventually assume it can't gracefully shut everything down, do a force kill and delete all the Pods that way (and then reboot the node?)

It kinda does, but it moves those Pods into that Failed state instead. Probably to indicate that there was some issue with the shutdown? IDK.

I guess I'm leaning more towards "this is a bug in K8s" rather than an issue in Agones? πŸ€”

Well, maybe? But even if it is, don't you think the GameServer should react somehow when its controlled Pod ends up in some weird state indicating failure? There could be more logic/bugs in k8s that put it there, as we found out in this issue and related ones.

Oh I 100% hear you - but it can be super hard for us to actually know "hey this is a really bad state that is unrecoverable" vs "this is a transient state that will restart itself and then go away" - we do loads of hacks for this already because of how we respond to GameServers doing unhealthy things based on what state they are in (Ready etc).

So just to 100% check, did the node ever actually reboot, and if so, what happens to the Pod? Or did this state actually block the Node from restarting entirely?

unlightable commented 1 year ago

So just to 100% check, did the node ever actually reboot, and if so, what happens to the Pod? Or did this state actually block the Node from restarting entirely?

Node reboots, pods stay in Failed state, GameServer keeps being Allocated. No containers are alive afterwards (e.g. you can't kubectl exec ... or kubectl log ...).

We ended up rolling a job that reaps those pods eventually, but would be nice to not have to (:

markmandel commented 1 year ago

Node reboots, pods stay in Failed state

Oh weird!

GameServer keeps being Allocated. No containers are alive afterwards (e.g. you can't kubectl exec ... or kubectl log ...).

Eep, yeah, that's bad. I noticed that the Pod doesn't get a deletionTimestamp either, which.... sucks 😬

We ended up rolling a job that reaps those pods eventually, but would be nice to not have to (:

Yeah, that's fair enough. What criteria are you using specifically?

I wonder how Deployments and StatefulSets manage this πŸ€” if there is something we can steal from there. I assume they don't have this issue?

unlightable commented 1 year ago

Yeah, that's fair enough. What criteria are you using specifically?

We do nothing smart and just look for Failed Pods with our fleet labels, as no workload could legitimately be in that state for us.
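
For reference, that kind of reaping boils down to something like the following (a sketch; it assumes the standard agones.dev/role=gameserver label, with your Fleet's namespace substituted in):

kubectl delete pods -n <namespace> -l agones.dev/role=gameserver --field-selector=status.phase=Failed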

I wonder how Deployments and StatefulSets manage this πŸ€” if there is something we can steal from there. I assume they don't have this issue?

I'll try to do some experiments after vacation, but I'm afraid no revelation awaits there. Deployments rely on Pod counts and readiness, so they will probably suffer from a similar issue, although there are some cases handled by https://github.com/kubernetes/kubernetes/blob/1d2e8042877c4facd3a45e911857f92474d64797/staging/src/k8s.io/api/extensions/v1beta1/types.go#L1000

The StatefulSet docs even point out that sometimes human intervention is required to resolve the situation, so I'm not holding my breath there either.

markmandel commented 1 year ago

Sod. Well, there go those good ideas.

πŸ€” maybe another approach is better - something like "if the Pod has been in a failed state for <health periodSeconds * periodSeconds>, we consider the whole thing defunct and then move it to Unhealthy".

Which makes a lot of sense really, since that's what the health check would likely be doing anyway.

How we work out it's Failed, and for how long is a different matter (event stream? maybe we track it ourselves?)

it's not an immediate move to Unhealthy, but it does allow the system to eventually self heal. WDYT?

unlightable commented 1 year ago

it's not an immediate move to Unhealthy, but it does allow the system to eventually self heal. WDYT?

I like it! The k8s authors themselves praise level-triggering over edge-triggering, and Agones seems to echo that by requiring regular health pushes to the sidecar. So I would expect the GameServer to become Unhealthy within some timeframe after those pushes begin to fail, be it due to an actual game server crash or k8s deficiencies!

markmandel commented 1 year ago

Excellent! Now we just need to work out how to track this πŸ˜„ but I think we can do that. Oh yeah, and implement it πŸ˜„

unlightable commented 1 year ago

You mean track liveness through heartbeats? I would have designed it super straightforwardly: the sidecar regularly pings/touches the GameServer and stores the last ping timestamp. The issue here is that this creates noticeable write load on k8s object storage, as the GameServer changes frequently even while nothing happens. We could move that load onto the Agones controller by making it the recipient of those pings and tracking when they stop.

Another way to deal with it is a periodic query from Agones for Pods belonging to controlled Fleets, judging their liveness by that dreaded Failed state. That seems a bit more of a special case than a general solution, though.

markmandel commented 1 year ago

You mean track liveness through heartbeats? I would have designed it super straightforwardly: the sidecar regularly pings/touches the GameServer and stores the last ping timestamp.

Ah - but there is no sidecar, because the Pod has failed πŸ˜ƒ we can't guarantee the sidecar will be there.

I think we likely need to do this in the HealthController or create a new specific sub-controller for this.

Probably track Pod updates, look for the Failed state, and then cache it somewhere to be looked at again in <health periodSeconds * periodSeconds> to see if it's still the same.

alexey-pankratyev commented 1 year ago

@markmandel Can you tell me if it is possible to send the status to some matchmaking service when the game server crashes?

markmandel commented 1 year ago

@markmandel Can you tell me if it is possible to send the status to some matchmaking service when the game server crashes?

You would need to check through the k8s API: https://agones.dev/site/docs/guides/access-api/
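
For example, the GameServer state can be read or watched with kubectl as a quick sketch of what that API exposes (the same fields are available programmatically via the guide linked above):

kubectl get gameserver <gameserver-name> -o jsonpath='{.status.state}'
kubectl get gameservers -w -o custom-columns=NAME:.metadata.name,STATE:.status.state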

github-actions[bot] commented 10 months ago

This issue is marked as Stale due to inactivity for more than 30 days. To avoid being marked as 'stale', please add the 'awaiting-maintainer' label or add a comment. Thank you for your contributions.

github-actions[bot] commented 8 months ago

This issue is marked as obsolete due to inactivity for last 60 days. To avoid issue getting closed in next 30 days, please add a comment or add 'awaiting-maintainer' label. Thank you for your contributions