googleforgames / agones

Dedicated Game Server Hosting and Scaling for Multiplayer Games on Kubernetes
https://agones.dev
Apache License 2.0
6.08k stars · 807 forks

GameServers are always scheduled to one node even when `scheduling: Distributed` is set #2218

Closed: renewboy closed this issue 11 months ago

renewboy commented 3 years ago

What happened: My cluster has two nodes and both are schedulable (the master node is tainted), but the GameServers are always scheduled to one node even when `scheduling: Distributed` is set.

What you expected to happen: Gameservers are evenly distributed.

How to reproduce it (as minimally and precisely as possible): Use agones/examples/fleet.yaml to create a fleet with replicas=20; all GameServers are scheduled to one node.

Anything else we need to know?: As a workaround, I use topologySpreadConstraints to constrain the distribution, with matchLabels set to agones.dev/role: gameserver.
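A sketch of that workaround, assuming the constraint is placed in the Fleet's pod template and using the standard Kubernetes topologySpreadConstraints API (the exact values here are illustrative, not taken from the reporter's config):

```yaml
# Hypothetical sketch: spread GameServer pods evenly across nodes.
# This would go in the Fleet's pod template spec
# (spec.template.spec.template.spec in the Fleet YAML below).
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname   # spread across individual nodes
    whenUnsatisfiable: ScheduleAnyway     # soft constraint; DoNotSchedule makes it hard
    labelSelector:
      matchLabels:
        agones.dev/role: gameserver       # label Agones applies to GameServer pods
```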

Environment:

markmandel commented 3 years ago

I just attempted this on GKE, on the supported version of Kubernetes, and couldn't replicate it.

root@bf1cb3c68f4b:/go/src/agones.dev/agones# kubectl version
Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.9", GitCommit:"7a576bc3935a6b555e33346fd73ad77c925e9e4a", GitTreeState:"clean", BuildDate:"2021-07-15T21:01:38Z", GoVersion:"go1.15.14", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"20+", GitVersion:"v1.20.8-gke.900", GitCommit:"28ab8501be88ea42e897ca8514d7cd0b436253d9", GitTreeState:"clean", BuildDate:"2021-06-30T09:23:36Z", GoVersion:"go1.15.13b5", Compiler:"gc", Platform:"linux/amd64"}

Fleet specification: https://github.com/googleforgames/agones/blob/main/examples/simple-game-server/fleet-distributed.yaml

root@bf1cb3c68f4b:/go/src/agones.dev/agones# kubectl get gs
NAME                                         STATE   ADDRESS          PORT   NODE                                     AGE
simple-game-server-distributed-rqgth-8g9fs   Ready   34.105.117.127   7151   gke-test-cluster-default-9f93287d-fj4z   29s
simple-game-server-distributed-rqgth-dd2j7   Ready   34.105.117.127   7114   gke-test-cluster-default-9f93287d-fj4z   29s
simple-game-server-distributed-rqgth-dpk6v   Ready   35.247.118.127   7177   gke-test-cluster-default-e14a49f2-f2bz   29s
simple-game-server-distributed-rqgth-fswln   Ready   34.127.101.239   7233   gke-test-cluster-default-9f93287d-4kf2   29s
simple-game-server-distributed-rqgth-kw2zp   Ready   35.197.30.237    7311   gke-test-cluster-default-1bb0f716-6v0p   29s
simple-game-server-distributed-rqgth-lsbx5   Ready   35.197.72.243    7250   gke-test-cluster-default-1bb0f716-vgcc   29s
simple-game-server-distributed-rqgth-njrbn   Ready   35.197.30.237    7938   gke-test-cluster-default-1bb0f716-6v0p   29s
simple-game-server-distributed-rqgth-sddcl   Ready   35.197.72.243    7387   gke-test-cluster-default-1bb0f716-vgcc   29s
simple-game-server-distributed-rqgth-tqlsb   Ready   34.83.66.5       7108   gke-test-cluster-default-1bb0f716-r76x   29s
simple-game-server-distributed-rqgth-wkrjb   Ready   34.83.245.89     7881   gke-test-cluster-default-9f93287d-z0tb   29s

Is it possible you have customised your default scheduler on the cluster somehow?

Can you also provide us with your Fleet specification?
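A quick way to check the spread is to count GameServers per node. This is a sketch that assumes the default `kubectl get gs` column layout shown above (NAME STATE ADDRESS PORT NODE AGE), where NODE is the fifth column:

```shell
# Count GameServers per node, busiest node first.
# NODE is the 5th column of the default `kubectl get gs` output.
kubectl get gs --no-headers | awk '{print $5}' | sort | uniq -c | sort -rn
```

An even spread shows roughly equal counts per node; a single dominating node reproduces the reported behaviour.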

markmandel commented 3 years ago

Actually, I lied 🤦🏻 - I used 1.20.0 - (we're upgrading in the next release) - but I expect 1.19 should produce the same effect.

renewboy commented 3 years ago

@markmandel, I use kubeadm and have not customized the scheduler.

Here is my Fleet spec.

# Copyright 2020 Google LLC All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

apiVersion: "agones.dev/v1"
kind: Fleet
metadata:
  name: my-server
spec:
  replicas: 20
  scheduling: Distributed
  template:
    spec:
      sdkServer:
        logLevel: Info
        grpcPort: 9357
        httpPort: 9358
      players:
        # set this GameServer's initial player capacity
        initialCapacity: 6
      ports:
      - name: default
        portPolicy: Passthrough
        protocol: TCP
      - name: agnent1
        portPolicy: Passthrough
        protocol: TCP
      - name: agnent2
        portPolicy: Passthrough
        protocol: TCP
      - name: agnent3
        portPolicy: Passthrough
        protocol: TCP
      - name: agnent4
        portPolicy: Passthrough
        protocol: TCP
      - name: agnent5
        portPolicy: Passthrough
        protocol: TCP
      - name: agnent6
        portPolicy: Passthrough
        protocol: TCP
      template:
        spec:
          containers:
          - name: simple-game-server
            image: my_custom_image
            resources:
              requests:
                memory: "64Mi"
                #cpu: "20m"
              limits:
                memory: "128Mi"
                #cpu: "20m"
            volumeMounts:
              - name: fleet-config
                mountPath: /tmp/fleet-config
          volumes:
          - name: fleet-config
            configMap: 
              name: fleet-config

Hope this helps, thanks~

markmandel commented 3 years ago

I don't see scheduling: Distributed anywhere?

To note, 1.21 isn't supported by Agones yet - so I don't know if there are scheduler changes in that release.

renewboy commented 3 years ago

> I don't see scheduling: Distributed anywhere?
>
> To note, 1.21 isn't supported by Agones yet - so I don't know if there are scheduler changes in that release.

Sorry, my mistake. I have updated the YAML above to include it; I had removed it because it was not working in my environment. I will try this on 1.19, thanks.

roberthbailey commented 3 years ago

@zouyxdut - did you manage to reproduce the error?

github-actions[bot] commented 1 year ago

This issue is marked as Stale due to inactivity for more than 30 days. To avoid being marked as 'stale', please add the 'awaiting-maintainer' label or add a comment. Thank you for your contributions.

github-actions[bot] commented 11 months ago

This issue is marked as obsolete due to inactivity for the last 60 days. To avoid the issue being closed in the next 30 days, please add a comment or add the 'awaiting-maintainer' label. Thank you for your contributions.