googleforgames / agones

Dedicated Game Server Hosting and Scaling for Multiplayer Games on Kubernetes
https://agones.dev
Apache License 2.0

Game servers are having some delays until getting external IPs from agones SDK #3960

Open ytkang opened 3 months ago

ytkang commented 3 months ago

What happened: Hi, I recently tried to upgrade Agones from 1.38.0 to 1.42.0 and ran into an issue. Normally my game server gets its external IP address from the object returned by gameServer, err := sdk.GameServer(). In 1.38.0, this gameServer object had the IP address in the gameServer.Status.Address array as soon as I got the object. In 1.42.0, gameServer.Status.Address is an empty array when the server comes up, so I have to poll for it by calling sdk.GameServer() at some interval. After a few minutes, the address array is finally populated.

What you expected to happen: The IP address should exist in the gameServer.Status.Address array right after the SDK is initialized.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

Here are the game server Pod and Agones GameServer YAMLs from before I got the IP address:

#### game server pod yaml ####
apiVersion: v1
kind: Pod
metadata:
  annotations:
    agones.dev/container: battle-server
    agones.dev/sdk-version: 1.42.0
    cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
  creationTimestamp: "2024-08-21T00:12:47Z"
  labels:
    agones.dev/gameserver: battle-server-fleet-nttcl-gtt66
    agones.dev/role: gameserver
    agones.dev/safe-to-evict: "false"
  name: battle-server-fleet-nttcl-gtt66
  namespace: default
  ownerReferences:
  - apiVersion: agones.dev/v1
    blockOwnerDeletion: true
    controller: true
    kind: GameServer
    name: battle-server-fleet-nttcl-gtt66
    uid: d21aedb3-ccbc-47fb-b356-801326ba7b24
  resourceVersion: "90054055"
  uid: 3192e5fe-6ba3-41c4-a6c0-c05c0590cde1
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: servicetype
            operator: In
            values:
            - battle
    podAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - podAffinityTerm:
          labelSelector:
            matchLabels:
              agones.dev/role: gameserver
          topologyKey: kubernetes.io/hostname
        weight: 100
  containers:
  - args:
    - --grpc-port=9357
    - --http-port=9358
    env:
    - name: GAMESERVER_NAME
      value: battle-server-fleet-nttcl-gtt66
    - name: POD_NAMESPACE
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.namespace
    - name: FEATURE_GATES
      value: AutopilotPassthroughPort=false&CountsAndLists=true&DisableResyncOnSDKServer=true&Example=false&GKEAutopilotExtendedDurationPods=false&PlayerAllocationFilter=false&PlayerTracking=false&PortPolicyNone=false&PortRanges=false&RollingUpdateFix=false&ScheduledAutoscaler=false
    - name: LOG_LEVEL
      value: Info
    image: us-docker.pkg.dev/agones-images/release/agones-sdk:1.42.0
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: /healthz
        port: 8080
        scheme: HTTP
      initialDelaySeconds: 3
      periodSeconds: 3
      successThreshold: 1
      timeoutSeconds: 1
    name: agones-gameserver-sidecar
    resources:
      requests:
        cpu: 30m
    securityContext:
      allowPrivilegeEscalation: false
      runAsNonRoot: true
      runAsUser: 1000
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-ckb9v
      readOnly: true
  - env:
    - name: UDP
      value: "FALSE"
    - name: TCP
      value: "TRUE"
    - name: AGONES_SDK_GRPC_PORT
      value: "9357"
    - name: AGONES_SDK_HTTP_PORT
      value: "9358"
    envFrom:
    - configMapRef:
        name: battle-server-configs
    image: -----.dkr.ecr.us-west-2.amazonaws.com/battle-server:agones-1.42.0
    imagePullPolicy: Always
    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: /gshealthz
        port: 8080
        scheme: HTTP
      initialDelaySeconds: 5
      periodSeconds: 5
      successThreshold: 1
      timeoutSeconds: 1
    name: battle-server
    ports:
    - containerPort: 8037
      hostPort: 7080
      protocol: TCP
    - containerPort: 8034
      hostPort: 7283
      protocol: TCP
    resources:
      limits:
        cpu: 500m
        memory: 500Mi
      requests:
        cpu: 200m
        memory: 256Mi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /etc/myapp/cert
      name: dev-secrets-volume
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: empty
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  hostname: battle-server-fleet-nttcl-gtt66
  nodeName: ip-172-16-27-227.us-west-2.compute.internal
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: agones-sdk
  serviceAccountName: agones-sdk
  terminationGracePeriodSeconds: 300
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: dev-secrets-volume
    secret:
      defaultMode: 420
      secretName: tls-secret
  - emptyDir: {}
    name: empty
  - name: kube-api-access-ckb9v
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2024-08-21T00:12:47Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2024-08-21T00:12:48Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2024-08-21T00:12:48Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2024-08-21T00:12:47Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: containerd://406676bcea13c77431060b3dab4215c2896d065aea634ea7fb1b3cd681973ffe
    image: us-docker.pkg.dev/agones-images/release/agones-sdk:1.42.0
    imageID: us-docker.pkg.dev/agones-images/release/agones-sdk@sha256:eebd165dd0c2696f260306b99b79d65bd6253095076b6a9b91a8beadfaee1307
    lastState: {}
    name: agones-gameserver-sidecar
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2024-08-21T00:12:48Z"
  - containerID: containerd://9370fcd3802c00171cf1b3250260bcfe7c553595db782ff6e21dddbff81021d6
    image: -----.dkr.ecr.us-west-2.amazonaws.com/battle-server:agones-1.42.0
    imageID: -----.dkr.ecr.us-west-2.amazonaws.com/battle-server@sha256:3efc2306f2feacabc23c5f67550bbede8608fb8c4bd981edd5f297c4347377ea
    lastState: {}
    name: battle-server
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2024-08-21T00:12:48Z"
  hostIP: 172.16.27.227
  phase: Running
  podIP: 172.16.26.76
  podIPs:
  - ip: 172.16.26.76
  qosClass: Burstable
  startTime: "2024-08-21T00:12:47Z"

#### game server yaml ####
apiVersion: agones.dev/v1
kind: GameServer
metadata:
  annotations:
    agones.dev/sdk-version: 1.42.0
  creationTimestamp: "2024-08-21T00:12:47Z"
  finalizers:
  - agones.dev/controller
  generateName: battle-server-fleet-nttcl-
  generation: 4
  labels:
    agones.dev/fleet: battle-server-fleet
    agones.dev/gameserverset: battle-server-fleet-nttcl
    app: battle-server
  name: battle-server-fleet-nttcl-gtt66
  namespace: default
  ownerReferences:
  - apiVersion: agones.dev/v1
    blockOwnerDeletion: true
    controller: true
    kind: GameServerSet
    name: battle-server-fleet-nttcl
    uid: f57889d5-1ca0-4aef-a016-d4f096d67bbf
  resourceVersion: "90054049"
  uid: d21aedb3-ccbc-47fb-b356-801326ba7b24
spec:
  container: battle-server
  counters:
    players:
      capacity: 160
      count: 1
  eviction:
    safe: Never
  health:
    failureThreshold: 3
    initialDelaySeconds: 5
    periodSeconds: 5
  immutableReplicas: 1
  ports:
  - container: battle-server
    containerPort: 8037
    hostPort: 7080
    name: app-port
    portPolicy: Dynamic
    protocol: TCP
    range: default
  - container: battle-server
    containerPort: 8034
    hostPort: 7283
    name: web-port
    portPolicy: Dynamic
    protocol: TCP
    range: default
  scheduling: Packed
  sdkServer:
    grpcPort: 9357
    httpPort: 9358
    logLevel: Info
  template:
    metadata:
      creationTimestamp: null
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: servicetype
                operator: In
                values:
                - battle
      containers:
      - env:
        - name: UDP
          value: "FALSE"
        - name: TCP
          value: "TRUE"
        envFrom:
        - configMapRef:
            name: battle-server-configs
        image: -----.dkr.ecr.us-west-2.amazonaws.com/battle-server:agones-1.42.0
        imagePullPolicy: Always
        name: battle-server
        resources:
          limits:
            cpu: 500m
            memory: 500Mi
          requests:
            cpu: 200m
            memory: 256Mi
        volumeMounts:
        - mountPath: /etc/myapp/cert
          name: dev-secrets-volume
      terminationGracePeriodSeconds: 300
      volumes:
      - name: dev-secrets-volume
        secret:
          secretName: tls-secret
status:
  address: ""
  addresses: null
  counters:
    players:
      capacity: 160
      count: 1
  eviction:
    safe: Never
  immutableReplicas: 1
  nodeName: ""
  players: null
  ports: null
  reservedUntil: "2024-08-21T00:13:48Z"
  state: Reserved

After I got the IP address:

#### game server pod yaml ####
apiVersion: v1
kind: Pod
metadata:
  annotations:
    agones.dev/container: battle-server
    agones.dev/ready-container-id: containerd://9370fcd3802c00171cf1b3250260bcfe7c553595db782ff6e21dddbff81021d6
    agones.dev/sdk-version: 1.42.0
    cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
  creationTimestamp: "2024-08-21T00:12:47Z"
  labels:
    agones.dev/gameserver: battle-server-fleet-nttcl-gtt66
    agones.dev/role: gameserver
    agones.dev/safe-to-evict: "false"
  name: battle-server-fleet-nttcl-gtt66
  namespace: default
  ownerReferences:
  - apiVersion: agones.dev/v1
    blockOwnerDeletion: true
    controller: true
    kind: GameServer
    name: battle-server-fleet-nttcl-gtt66
    uid: d21aedb3-ccbc-47fb-b356-801326ba7b24
  resourceVersion: "90055102"
  uid: 3192e5fe-6ba3-41c4-a6c0-c05c0590cde1
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: servicetype
            operator: In
            values:
            - battle
    podAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - podAffinityTerm:
          labelSelector:
            matchLabels:
              agones.dev/role: gameserver
          topologyKey: kubernetes.io/hostname
        weight: 100
  containers:
  - args:
    - --grpc-port=9357
    - --http-port=9358
    env:
    - name: GAMESERVER_NAME
      value: battle-server-fleet-nttcl-gtt66
    - name: POD_NAMESPACE
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.namespace
    - name: FEATURE_GATES
      value: AutopilotPassthroughPort=false&CountsAndLists=true&DisableResyncOnSDKServer=true&Example=false&GKEAutopilotExtendedDurationPods=false&PlayerAllocationFilter=false&PlayerTracking=false&PortPolicyNone=false&PortRanges=false&RollingUpdateFix=false&ScheduledAutoscaler=false
    - name: LOG_LEVEL
      value: Info
    image: us-docker.pkg.dev/agones-images/release/agones-sdk:1.42.0
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: /healthz
        port: 8080
        scheme: HTTP
      initialDelaySeconds: 3
      periodSeconds: 3
      successThreshold: 1
      timeoutSeconds: 1
    name: agones-gameserver-sidecar
    resources:
      requests:
        cpu: 30m
    securityContext:
      allowPrivilegeEscalation: false
      runAsNonRoot: true
      runAsUser: 1000
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-ckb9v
      readOnly: true
  - env:
    - name: UDP
      value: "FALSE"
    - name: TCP
      value: "TRUE"
    - name: AGONES_SDK_GRPC_PORT
      value: "9357"
    - name: AGONES_SDK_HTTP_PORT
      value: "9358"
    envFrom:
    - configMapRef:
        name: battle-server-configs
    image: -----.dkr.ecr.us-west-2.amazonaws.com/battle-server:agones-1.42.0
    imagePullPolicy: Always
    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: /gshealthz
        port: 8080
        scheme: HTTP
      initialDelaySeconds: 5
      periodSeconds: 5
      successThreshold: 1
      timeoutSeconds: 1
    name: battle-server
    ports:
    - containerPort: 8037
      hostPort: 7080
      protocol: TCP
    - containerPort: 8034
      hostPort: 7283
      protocol: TCP
    resources:
      limits:
        cpu: 500m
        memory: 500Mi
      requests:
        cpu: 200m
        memory: 256Mi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /etc/myapp/cert
      name: dev-secrets-volume
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: empty
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  hostname: battle-server-fleet-nttcl-gtt66
  nodeName: ip-172-16-27-227.us-west-2.compute.internal
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: agones-sdk
  serviceAccountName: agones-sdk
  terminationGracePeriodSeconds: 300
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: dev-secrets-volume
    secret:
      defaultMode: 420
      secretName: tls-secret
  - emptyDir: {}
    name: empty
  - name: kube-api-access-ckb9v
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2024-08-21T00:12:47Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2024-08-21T00:12:48Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2024-08-21T00:12:48Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2024-08-21T00:12:47Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: containerd://406676bcea13c77431060b3dab4215c2896d065aea634ea7fb1b3cd681973ffe
    image: us-docker.pkg.dev/agones-images/release/agones-sdk:1.42.0
    imageID: us-docker.pkg.dev/agones-images/release/agones-sdk@sha256:eebd165dd0c2696f260306b99b79d65bd6253095076b6a9b91a8beadfaee1307
    lastState: {}
    name: agones-gameserver-sidecar
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2024-08-21T00:12:48Z"
  - containerID: containerd://9370fcd3802c00171cf1b3250260bcfe7c553595db782ff6e21dddbff81021d6
    image: -----.dkr.ecr.us-west-2.amazonaws.com/battle-server:agones-1.42.0
    imageID: -----.dkr.ecr.us-west-2.amazonaws.com/battle-server@sha256:3efc2306f2feacabc23c5f67550bbede8608fb8c4bd981edd5f297c4347377ea
    lastState: {}
    name: battle-server
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2024-08-21T00:12:48Z"
  hostIP: 172.16.27.227
  phase: Running
  podIP: 172.16.26.76
  podIPs:
  - ip: 172.16.26.76
  qosClass: Burstable
  startTime: "2024-08-21T00:12:47Z"

#### game server yaml ####
apiVersion: agones.dev/v1
kind: GameServer
metadata:
  annotations:
    agones.dev/ready-container-id: containerd://9370fcd3802c00171cf1b3250260bcfe7c553595db782ff6e21dddbff81021d6
    agones.dev/sdk-version: 1.42.0
  creationTimestamp: "2024-08-21T00:12:47Z"
  finalizers:
  - agones.dev/controller
  generateName: battle-server-fleet-nttcl-
  generation: 8
  labels:
    agones.dev/fleet: battle-server-fleet
    agones.dev/gameserverset: battle-server-fleet-nttcl
    app: battle-server
  name: battle-server-fleet-nttcl-gtt66
  namespace: default
  ownerReferences:
  - apiVersion: agones.dev/v1
    blockOwnerDeletion: true
    controller: true
    kind: GameServerSet
    name: battle-server-fleet-nttcl
    uid: f57889d5-1ca0-4aef-a016-d4f096d67bbf
  resourceVersion: "90055143"
  uid: d21aedb3-ccbc-47fb-b356-801326ba7b24
spec:
  container: battle-server
  counters:
    players:
      capacity: 160
      count: 1
  eviction:
    safe: Never
  health:
    failureThreshold: 3
    initialDelaySeconds: 5
    periodSeconds: 5
  immutableReplicas: 1
  ports:
  - container: battle-server
    containerPort: 8037
    hostPort: 7080
    name: app-port
    portPolicy: Dynamic
    protocol: TCP
    range: default
  - container: battle-server
    containerPort: 8034
    hostPort: 7283
    name: web-port
    portPolicy: Dynamic
    protocol: TCP
    range: default
  scheduling: Packed
  sdkServer:
    grpcPort: 9357
    httpPort: 9358
    logLevel: Info
  template:
    metadata:
      creationTimestamp: null
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: servicetype
                operator: In
                values:
                - battle
      containers:
      - env:
        - name: UDP
          value: "FALSE"
        - name: TCP
          value: "TRUE"
        envFrom:
        - configMapRef:
            name: battle-server-configs
        image: -----.dkr.ecr.us-west-2.amazonaws.com/battle-server:agones-1.42.0
        imagePullPolicy: Always
        name: battle-server
        resources:
          limits:
            cpu: 500m
            memory: 500Mi
          requests:
            cpu: 200m
            memory: 256Mi
        volumeMounts:
        - mountPath: /etc/myapp/cert
          name: dev-secrets-volume
      terminationGracePeriodSeconds: 300
      volumes:
      - name: dev-secrets-volume
        secret:
          secretName: tls-secret
status:
  address: ec2-39-204-210-11.us-west-2.compute.amazonaws.com
  addresses:
  - address: 172.16.27.227
    type: InternalIP
  - address: 39.204.210.11
    type: ExternalIP
  - address: ip-172-16-27-227.us-west-2.compute.internal
    type: InternalDNS
  - address: ip-172-16-27-227.us-west-2.compute.internal
    type: Hostname
  - address: ec2-39-204-210-11.us-west-2.compute.amazonaws.com
    type: ExternalDNS
  - address: 172.16.26.76
    type: PodIP
  counters:
    players:
      capacity: 160
      count: 1
  eviction:
    safe: Never
  immutableReplicas: 1
  nodeName: ip-172-16-27-227.us-west-2.compute.internal
  players: null
  ports:
  - name: app-port
    port: 7080
  - name: web-port
    port: 7283
  reservedUntil: null
  state: Ready

markmandel commented 1 month ago

What I expect we should do is populate the address immediately with the Node details (since we have those); if we don't have the PodIP yet, move forward and only populate it once we do.