Closed markmandel closed 2 months ago
Just looking in the cluster, I do see:
markmandel@cloudshell:~ (agones-images)$ kubectl get ns
NAME STATUS AGE
1711579203 Terminating 8d
1711696994 Terminating 7d10h
1711795766 Terminating 6d7h
1712076562 Terminating 3d1h
1712340564 Active 6m34s
agones-system Active 17d
default Active 43d
gke-gmp-system Active 43d
gke-managed-cim Active 43d
gke-managed-filestorecsi Active 43d
gke-managed-system Active 43d
gmp-public Active 43d
kube-node-lease Active 43d
kube-public Active 43d
kube-system Active 43d
...which is weird.
markmandel@cloudshell:~ (agones-images)$ kubectl describe ns 1711579203
Name: 1711579203
Labels: kubernetes.io/metadata.name=1711579203
owner=e2e-test
Annotations: <none>
Status: Terminating
Conditions:
Type Status LastTransitionTime Reason Message
---- ------ ------------------ ------ -------
NamespaceDeletionDiscoveryFailure False Fri, 05 Apr 2024 16:43:44 +0000 ResourcesDiscovered All resources successfully discovered
NamespaceDeletionGroupVersionParsingFailure False Wed, 27 Mar 2024 23:07:22 +0000 ParsedGroupVersions All legacy kube types successfully parsed
NamespaceDeletionContentFailure True Wed, 27 Mar 2024 23:07:22 +0000 ContentDeletionFailed Failed to delete all resource types, 1 remaining: unexpected items still remain in namespace: 1711579203 for gvr: /v1, Resource=pods
NamespaceContentRemaining True Wed, 27 Mar 2024 23:07:22 +0000 SomeResourcesRemain Some resources are remaining: gameservers.agones.dev has 12 resource instances, pods. has 12 resource instances
NamespaceFinalizersRemaining True Wed, 27 Mar 2024 23:07:22 +0000 SomeFinalizersRemain Some content in the namespace has finalizers remaining: agones.dev in 12 resource instances
markmandel@cloudshell:~ (agones-images)$ kubectl get pods -n 1711579203
NAME READY STATUS RESTARTS AGE
preferred-w4gnw-dkjjx-5s6bs 0/2 Terminating 0 8d
preferred-w4gnw-dkjjx-8h7wx 0/2 Terminating 0 8d
preferred-w4gnw-dkjjx-bkd7m 0/2 Terminating 0 8d
preferred-w4gnw-dkjjx-c4bdk 0/2 Terminating 0 8d
preferred-w4gnw-dkjjx-dj84f 0/2 Terminating 0 8d
preferred-w4gnw-dkjjx-gjhw7 0/2 Terminating 0 8d
preferred-w4gnw-dkjjx-npfvd 0/2 Terminating 0 8d
preferred-w4gnw-dkjjx-qnp8n 0/2 Terminating 0 8d
preferred-w4gnw-dkjjx-sl5qm 0/2 Terminating 0 8d
preferred-w4gnw-dkjjx-tmdqt 0/2 Terminating 0 8d
preferred-w4gnw-dkjjx-tx5wf 0/2 Terminating 0 8d
preferred-w4gnw-dkjjx-wcs2m 0/2 Terminating 0 8d
markmandel@cloudshell:~ (agones-images)$ kubectl get pod -n 1711579203 preferred-w4gnw-dkjjx-5s6bs -o yaml
apiVersion: v1
kind: Pod
metadata:
annotations:
agones.dev/container: game-server
agones.dev/ready-container-id: containerd://f39362db1ec461c97ae03e1ca5d20df5d946c107d425ee1dd20bdebac57ffc78
agones.dev/sdk-version: 1.40.0-dev-5c2e964
autopilot.gke.io/host-port-assignment: '{"min":7000,"max":8000,"portsAssigned":{"1":7240}}'
autopilot.gke.io/primary-container: game-server
autopilot.gke.io/resource-adjustment: '{"input":{"containers":[{"requests":{"cpu":"30m"},"name":"agones-gameserver-sidecar"},{"limits":{"cpu":"30m","memory":"32Mi"},"requests":{"cpu":"30m","memory":"32Mi"},"name":"game-server"}]},"output":{"containers":[{"limits":{"ephemeral-storage":"1Gi"},"requests":{"cpu":"30m","ephemeral-storage":"1Gi","memory":"2Gi"},"name":"agones-gameserver-sidecar"},{"limits":{"cpu":"470m","ephemeral-storage":"1Gi","memory":"62Mi"},"requests":{"cpu":"470m","ephemeral-storage":"1Gi","memory":"62Mi"},"name":"game-server"}]},"modified":true}'
autopilot.gke.io/warden-version: 2.9.20
cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
creationTimestamp: "2024-03-27T22:54:37Z"
deletionGracePeriodSeconds: 30
deletionTimestamp: "2024-03-27T23:06:09Z"
labels:
agones.dev/gameserver: preferred-w4gnw-dkjjx-5s6bs
agones.dev/role: gameserver
agones.dev/safe-to-evict: "false"
name: preferred-w4gnw-dkjjx-5s6bs
namespace: "1711579203"
ownerReferences:
- apiVersion: agones.dev/v1
blockOwnerDeletion: true
controller: true
kind: GameServer
name: preferred-w4gnw-dkjjx-5s6bs
uid: ec61fed6-63c7-4c80-9a85-9e7dc70d5015
resourceVersion: "47363486"
uid: d608d658-758d-4a07-960b-7ada2526a6ad
spec:
affinity:
podAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- podAffinityTerm:
labelSelector:
matchLabels:
agones.dev/role: gameserver
topologyKey: kubernetes.io/hostname
weight: 100
containers:
- args:
- --grpc-port=9357
- --http-port=9358
env:
- name: GAMESERVER_NAME
value: preferred-w4gnw-dkjjx-5s6bs
- name: POD_NAMESPACE
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
- name: FEATURE_GATES
value: CountsAndLists=true&DisableResyncOnSDKServer=false&Example=true&FleetAllocationOverflow=false&GKEAutopilotExtendedDurationPods=true&PlayerAllocationFilter=true&PlayerTracking=true
- name: LOG_LEVEL
value: Info
image: us-docker.pkg.dev/agones-images/ci/agones-sdk:1.40.0-dev-5c2e964
imagePullPolicy: Always
livenessProbe:
failureThreshold: 3
httpGet:
path: /healthz
port: 8080
scheme: HTTP
initialDelaySeconds: 3
periodSeconds: 3
successThreshold: 1
timeoutSeconds: 1
name: agones-gameserver-sidecar
resources:
limits:
ephemeral-storage: 1Gi
requests:
cpu: 30m
ephemeral-storage: 1Gi
memory: 2Gi
securityContext:
capabilities:
drop:
- NET_RAW
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: kube-api-access-8jf5m
readOnly: true
- env:
- name: AGONES_SDK_GRPC_PORT
value: "9357"
- name: AGONES_SDK_HTTP_PORT
value: "9358"
image: us-docker.pkg.dev/agones-images/examples/simple-game-server:0.28
imagePullPolicy: IfNotPresent
livenessProbe:
failureThreshold: 3
httpGet:
path: /gshealthz
port: 8080
scheme: HTTP
initialDelaySeconds: 5
periodSeconds: 5
successThreshold: 1
timeoutSeconds: 1
name: game-server
ports:
- containerPort: 7654
hostPort: 7240
protocol: UDP
resources:
limits:
cpu: 470m
ephemeral-storage: 1Gi
memory: 62Mi
requests:
cpu: 470m
ephemeral-storage: 1Gi
memory: 62Mi
securityContext:
capabilities:
drop:
- NET_RAW
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: empty
readOnly: true
dnsPolicy: ClusterFirst
enableServiceLinks: true
hostname: preferred-w4gnw-dkjjx-5s6bs
nodeName: gk3-gke-autopilot-e2e-te-nap-qpzyqh5z-032f3139-lcj6
nodeSelector:
cloud.google.com/extended-duration-pods: "0"
preemptionPolicy: PreemptLowerPriority
priority: 0
restartPolicy: Always
schedulerName: gke.io/optimize-utilization-scheduler
securityContext:
seccompProfile:
type: Unconfined
serviceAccount: agones-sdk
serviceAccountName: agones-sdk
terminationGracePeriodSeconds: 30
tolerations:
- effect: NoExecute
key: node.kubernetes.io/not-ready
operator: Exists
tolerationSeconds: 300
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
tolerationSeconds: 300
- effect: NoSchedule
key: kubernetes.io/arch
operator: Equal
value: amd64
volumes:
- emptyDir: {}
name: empty
- name: kube-api-access-8jf5m
projected:
defaultMode: 420
sources:
- serviceAccountToken:
expirationSeconds: 3607
path: token
- configMap:
items:
- key: ca.crt
path: ca.crt
name: kube-root-ca.crt
- downwardAPI:
items:
- fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
path: namespace
status:
conditions:
- lastProbeTime: null
lastTransitionTime: "2024-03-27T23:10:10Z"
status: "False"
type: PodReadyToStartContainers
- lastProbeTime: null
lastTransitionTime: "2024-03-27T22:57:02Z"
status: "True"
type: Initialized
- lastProbeTime: null
lastTransitionTime: "2024-03-27T23:10:10Z"
message: 'containers with unready status: [agones-gameserver-sidecar game-server]'
reason: ContainersNotReady
status: "False"
type: Ready
- lastProbeTime: null
lastTransitionTime: "2024-03-27T23:10:10Z"
message: 'containers with unready status: [agones-gameserver-sidecar game-server]'
reason: ContainersNotReady
status: "False"
type: ContainersReady
- lastProbeTime: null
lastTransitionTime: "2024-03-27T22:57:02Z"
status: "True"
type: PodScheduled
containerStatuses:
- containerID: containerd://72f4fa8e6a90b882fe3be9eac808c0beed140feb526b22790f446fac46505c7a
image: us-docker.pkg.dev/agones-images/ci/agones-sdk:1.40.0-dev-5c2e964
imageID: us-docker.pkg.dev/agones-images/ci/agones-sdk@sha256:9aa78a749ccdf2b9b6c05484066e2d6f58e40843e87b218ec17cade8c04e267d
lastState: {}
name: agones-gameserver-sidecar
ready: false
restartCount: 0
started: false
state:
terminated:
containerID: containerd://72f4fa8e6a90b882fe3be9eac808c0beed140feb526b22790f446fac46505c7a
exitCode: 137
finishedAt: "2024-03-27T23:06:09Z"
reason: Error
startedAt: "2024-03-27T22:57:12Z"
- containerID: containerd://f39362db1ec461c97ae03e1ca5d20df5d946c107d425ee1dd20bdebac57ffc78
image: us-docker.pkg.dev/agones-images/examples/simple-game-server:0.28
imageID: us-docker.pkg.dev/agones-images/examples/simple-game-server@sha256:81d04419aa309fdb553a13be0614540da64a7df2ff9701f38a35c68398fe92e8
lastState: {}
name: game-server
ready: false
restartCount: 0
started: false
state:
terminated:
containerID: containerd://f39362db1ec461c97ae03e1ca5d20df5d946c107d425ee1dd20bdebac57ffc78
exitCode: 0
finishedAt: "2024-03-27T23:05:39Z"
reason: Completed
startedAt: "2024-03-27T22:57:17Z"
hostIP: 10.132.15.220
hostIPs:
- ip: 10.132.15.220
phase: Running
podIP: 10.64.134.69
podIPs:
- ip: 10.64.134.69
qosClass: Burstable
startTime: "2024-03-27T22:57:02Z"
So the pods are stuck for some reason?
kubectl delete pod --all --force --grace-period=0 -n <namespace>
did work to cleanup the Pods though.
I did recently notice:
markmandel@cloudshell:~ (agones-images)$ kubectl get crds | grep agones
fleetautoscalers.autoscaling.agones.dev 2024-04-03T07:47:43Z
fleets.agones.dev 2024-04-03T07:47:42Z
gameserverallocationpolicies.multicluster.agones.dev 2024-04-03T07:47:43Z
gameserversets.agones.dev 2024-04-03T07:47:44Z
But that seems to be after cleanup happened? GameServers
where definitely there before.
Can CRDs go into a Terminating state?
Tracking this in an internal bug, will report back.
Resolved. Will close.
Seems to be pretty consistent on these tests:
Also
Retry might be working in some cases, but this panic is definitely stopping retries -- and 30m is a long time for a 5 GameServer fleet not to spin up. It's very odd.
Some links to failed builds: