itzg / mc-router

Routes Minecraft client connections to backend servers based upon the requested server address
MIT License

mc-router Fails to Route Requests and Auto-Scale Minecraft Server in kubernetes #311

Closed csabca83 closed 1 month ago

csabca83 commented 1 month ago

Description

Describe the bug

I am facing an issue with mc-router where requests routed through the mc-router service fail to reach the backend Minecraft server. Additionally, the auto-scaling functionality does not seem to be triggered, and clients trying to connect to the Minecraft server time out.

To Reproduce

Steps to reproduce the behavior:

1. Deploy mc-router and Minecraft server using the provided Kubernetes manifests.
2. Expose mc-router on a NodePort for HTTP (30013) and TCP (30012) for Minecraft.
3. Attempt to connect to the Minecraft server via the mc-router service on port 30013.

Expected behavior

The mc-router should route the Minecraft traffic to the backend Minecraft server, and auto-scaling should be triggered if necessary.

Environment:

Kubernetes Version: v1.30.2
mc-router Version: 1.21.0
mcserver Version: 2024.7.0

Logs:

Whenever I try to connect through 30013, there are no logs and I only see a timeout exception on the client side. However, if I connect through 30012, which is the Minecraft port that reaches the mcserver via mc-router, it works as long as the mcserver is scaled to 1. If the scale is 0, mc-router does not scale it up; it just returns an error saying that the backend server isn't reachable. Either port would be fine for me as long as 0->1 scaling happens.

apiVersion: v1
kind: ServiceAccount
metadata:
  name: mc-router
  namespace: mcserver
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: mc-router-services-watcher-crole
rules:
- apiGroups: [""]
  resources: ["services"]
  verbs: ["watch", "list"]
- apiGroups: ["apps"]
  resources: ["statefulsets", "statefulsets/scale"]
  verbs: ["watch", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: mc-router-services-watcher
  namespace: mcserver
subjects:
- kind: ServiceAccount
  name: mc-router
  namespace: mcserver
roleRef:
  kind: ClusterRole
  name: mc-router-services-watcher-crole
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: mc-router-stsscaler-role
  namespace: mcserver
rules:
- apiGroups: ["apps"]
  resources: ["statefulsets/scale"]
  verbs: ["get", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: mc-router-stsscaler-role
  namespace: mcserver
subjects:
- kind: ServiceAccount
  name: mc-router
  namespace: mcserver
roleRef:
  kind: Role
  name: mc-router-stsscaler-role
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: mc-router
  name: mc-router
  namespace: mcserver
spec:
  selector:
    matchLabels:
      app: mc-router
  strategy:
    type: Recreate
  template:
    metadata:
      namespace: mcserver
      labels:
        app: mc-router
    spec:
      serviceAccountName: mc-router
      containers:
      - image: itzg/mc-router:1.21.0
        imagePullPolicy: Always
        name: mc-router
        args: ["--api-binding", ":8080", "--in-kube-cluster", "--auto-scale-up", "--debug"]
        env:
        - name: AUTO_SCALE_UP
          value: "true"
        ports:
        - name: proxy
          containerPort: 25565
        - name: web
          containerPort: 8080
        resources:
          requests:
            memory: 50Mi
            cpu: "100m"
          limits:
            memory: 100Mi
            cpu: "250m"
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mcserver-statefulset
  namespace: mcserver
spec:
  serviceName: mcserver-service
  replicas: 1
  selector:
    matchLabels:
      app: mcserver
  template:
    metadata:
      labels:
        app: mcserver
    spec:
      restartPolicy: Always
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/hostname
                operator: In
                values:
                - server
      containers:
      - name: mcserver
        image: itzg/minecraft-server:2024.7.0@sha256:ea3cdf39d2d7616946fe6d6c90c4c9cca2aa8e6b354ab37e557a8b0eec01b679
        livenessProbe:
          tcpSocket:
            port: 25565
          initialDelaySeconds: 120
          periodSeconds: 60
          failureThreshold: 2
        readinessProbe:
          exec:
            command:
            - mc-monitor
            - status
            - --host
            - localhost
            - --port
            - "25565"
          initialDelaySeconds: 20
          periodSeconds: 5
          failureThreshold: 20
        startupProbe:
          tcpSocket:
            port: 25565
          failureThreshold: 30
          periodSeconds: 5
        env:
        - name: EULA
          value: "TRUE"
        - name: MEMORY
          value: ""
        - name: JVM_XX_OPTS
          value: "-XX:MaxRAMPercentage=90"
        - name: TZ
          value: "Europe/Budapest"
        - name: SERVER_NAME
          value: "Jager Kubernetes Minecraft"
        - name: PLAYER_IDLE_TIMEOUT
          value: "10"
        - name: MOTD
          value: "Happy Holidays"
        volumeMounts:
        - name: local-pv-medium-2-pvc
          mountPath: /data
        ports:
        - containerPort: 25565
        resources:
          limits:
            cpu: "2500m"
            memory: "2000Mi"
          requests:
            cpu: "2000m"
            memory: "1500Mi"
  volumeClaimTemplates:
  - metadata:
      name: local-pv-medium-2-pvc
      namespace: mcserver
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: local-storage
      resources:
        requests:
          storage: 20Gi
---
apiVersion: v1
kind: Service
metadata:
  name: mc-router
spec:
  type: NodePort
  ports:
  - targetPort: web
    name: web
    port: 8080
    nodePort: 30013
  - protocol: TCP
    targetPort: proxy
    name: proxy
    port: 25565
    nodePort: 30012
  selector:
    app: mc-router
---
apiVersion: v1
kind: Service
metadata:
  name: minecraft
  namespace: mcserver
  labels:
    app: mcserver
  annotations:
    "mc-router.itzg.me/defaultServer": "true"
spec:
  #clusterIP: None
  ports:
  - protocol: TCP
    name: minecraft-server-port
    port: 25565
    targetPort: 25565
  type: ClusterIP
  selector:
    app: mcserver

I looked for alternatives, but this one is still the easiest given the docs it has. It could be that I'm just messing something up. Any help would be highly appreciated.

itzg commented 1 month ago

> Expose mc-router on a NodePort for HTTP (30013) and TCP (30012) for Minecraft. Attempt to connect to the Minecraft server via the mc-router service on port 30013

You're attempting the connection to HTTP (30013) and not Minecraft port (30012). You probably don't even need to expose HTTP.
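
As a sketch of that suggestion, the mc-router Service from the original manifests could be reduced to the Minecraft proxy port alone. The names and port numbers are copied from the manifests above; the `namespace` field is an assumption added here (the original Service omits it) so that the selector can match the mc-router pods in `mcserver`.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: mc-router
  # Assumption: same namespace as the mc-router Deployment.
  # The original Service manifest does not set this field.
  namespace: mcserver
spec:
  type: NodePort
  ports:
  # Only the Minecraft proxy port is exposed; the HTTP API port
  # (8080 / nodePort 30013) from the original manifest is dropped.
  - protocol: TCP
    targetPort: proxy
    name: proxy
    port: 25565
    nodePort: 30012
  selector:
    app: mc-router
```

With this layout, Minecraft clients would connect to a node address on port 30012.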

csabca83 commented 1 month ago

> > Expose mc-router on a NodePort for HTTP (30013) and TCP (30012) for Minecraft. Attempt to connect to the Minecraft server via the mc-router service on port 30013
>
> You're attempting the connection to HTTP (30013) and not Minecraft port (30012). You probably don't even need to expose HTTP.

Thanks for confirming this, I'll have that removed. Does this mean that scaling should happen even through 30012? (That's the main reason why I was playing with both ports, because I wasn't sure if sts scaling is available directly from 30012)

itzg commented 1 month ago

The HTTP API access isn't involved in auto-scaling.

That feature was contributed over 2 years ago in https://github.com/itzg/mc-router/pull/29. Maybe it doesn't work anymore. If it works without it enabled, then just don't enable it.

csabca83 commented 1 month ago

> The HTTP API access isn't involved in auto-scaling.
>
> That feature was contributed over 2 years ago https://github.com/itzg/mc-router/pull/29. Maybe it doesn't work anymore. If it works without it enabled, then just don't enable it.

Well the thing is, autoscaling from 0->1 is the only thing that I need from this project 🙃. I can live my life happily without having additional hops (from ClusterIP services) and proxies, unless it's necessary for the scaling to work...

itzg commented 1 month ago

That's fair enough. There are other benefits of mc-router, but agreed that it adds extra hops if you don't need it.

With that said, I'll leave the issue open to possibly look at later. I just can't guarantee when, or if ever, I will be able to look at it.

@vorburger perhaps you could help take a look?

csabca83 commented 1 month ago

@itzg I did some research regarding the issue and found out that StatefulSet.metadata.name and StatefulSet.spec.serviceName must be the same; otherwise, autoscaling won't work. Another important aspect I was missing is the annotation for the StatefulSet service. The annotation "mc-router.itzg.me/externalServerName" must be present. If you connect directly from an external IP address through a NodePort, the IP address of the server must be specified there.

apiVersion: v1
kind: Service
metadata:
  name: mcserver-statefulset
  namespace: mcserver
  labels:
    app: mcserver
  annotations:
    "mc-router.itzg.me/defaultServer": "true"
    "mc-router.itzg.me/externalServerName": "1.2.3.4"
spec:
  # clusterIP: None
  ports:
  - protocol: TCP
    name: minecraft
    port: 25565
    targetPort: 25565
  type: ClusterIP
  selector:
    app: mcserver
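
For reference, a StatefulSet consistent with that Service might look like the sketch below. It is based on the finding described above rather than on a reading of the mc-router source: `metadata.name` and `spec.serviceName` are both set to `mcserver-statefulset`, the same name as the annotated Service. The pod template is trimmed to a minimum for illustration; the probes, resources, affinity, remaining environment variables and volumeClaimTemplates from the original manifest would stay as they were.

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  # Same value as spec.serviceName and as the Service's metadata.name.
  name: mcserver-statefulset
  namespace: mcserver
spec:
  # Same value as metadata.name, per the requirement reported above.
  serviceName: mcserver-statefulset
  replicas: 1
  selector:
    matchLabels:
      app: mcserver
  template:
    metadata:
      labels:
        app: mcserver
    spec:
      containers:
      - name: mcserver
        # Image tag taken from the original manifest (digest omitted for brevity).
        image: itzg/minecraft-server:2024.7.0
        env:
        - name: EULA
          value: "TRUE"
        ports:
        - containerPort: 25565
```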

I tried using wildcards like "mc-router.itzg.me/externalServerName": "*", but that doesn't work. Having this work with only headless services and removing these strict requirements (or providing more options specifically for autoscaling to be configured as a flag) would be amazing. Additionally, more detailed logging at the debug level would be helpful.

At a minimum, I can update the documentation to bring awareness to these requirements, as it wasn't clear to me what was needed.

itzg commented 1 month ago

Great research! Yes, a PR for the docs improvement for now would be great.

The other changes sound good too, but I would need to check where the code imposes those requirements and which of them can be relaxed.

vorburger commented 1 month ago

> @vorburger perhaps you could help take a look?

@itzg Thank you for remembering me from back in https://github.com/itzg/mc-router/pull/29; hope you're well. I haven't been actively using this for a while, and unfortunately I don't currently have the bandwidth to help out here. Best, M.