openkruise / kruise-game

Game Servers Management on Kubernetes
https://openkruise.io/kruisegame/introduction
Apache License 2.0
311 stars 42 forks

controller restart when GameServerSet redeploy to cluster #65

Closed alvin-7 closed 1 year ago

alvin-7 commented 1 year ago

Background

We encountered an issue while configuring the HostPort network mode with OKG's GameServerSet.

The GameServerSet had already been deployed to the cluster, and its pod was in the Running state. I then deleted the GameServerSet with "kubectl delete", which put the pod into the Terminating state. Before the pod had finished exiting, I re-applied the GameServerSet. The newly created pod, however, failed to obtain a HostPort.

Later, I deleted the GameServerSet again, waited for the pod to exit completely, and then re-applied it. This time the pod obtained its HostPort information correctly.

Upon investigation, I found that the kruise-game controller panicked during this sequence and restarted. I suspect that the delete-and-reapply sequence as a whole caused the failure.

Deployment file

apiVersion: game.kruise.io/v1alpha1
kind: GameServerSet
metadata:
  name: trunk
spec:
  replicas: 1
  updateStrategy:
    rollingUpdate:
      podUpdatePolicy: InPlaceIfPossible
  network:
    networkType: Kubernetes-HostPort
    networkConf:
    - name: ContainerPorts
      value: "container1:5000/TCP"
  gameServerTemplate:
    spec:
      containers:
        - image: container1-image
          imagePullPolicy: IfNotPresent
          name: container1
          env:
          - name: KRUISE_CONTAINER_PRIORITY
            value: "2"
          volumeMounts:
            - name: network
              mountPath: /opt/network
        - image: container2-image
          imagePullPolicy: IfNotPresent
          name: container2
          env:
          - name: KRUISE_CONTAINER_PRIORITY
            value: "1"
      volumes:
      - name: network
        downwardAPI:
          items:
          - path: "annotations"
            fieldRef:
              fieldPath: metadata.annotations['game.kruise.io/network-status']

    volumeClaimTemplates:
      - metadata:
          name: db-storage
        spec:
          accessModes: ["ReadWriteOnce"]
          storageClassName: "cfs"
          resources:
            requests:
              storage: 10Gi

Controller logs

1.686739246445438e+09   DEBUG   controller-runtime.webhook.webhooks received request    {"webhook": "/validate-v1alpha1-gss", "UID": "e5c1cf00-f3d3-46ae-b86a-9b5b8ad537a6", "kind": "game.kruise.io/v1alpha1, Kind=GameServerSet", "resource": {"group":"game.kruise.io","version":"v1alpha1","resource":"gameserversets"}}
1.686739246445752e+09   DEBUG   controller-runtime.webhook.webhooks wrote response  {"webhook": "/validate-v1alpha1-gss", "code": 200, "reason": "pass validating", "UID": "e5c1cf00-f3d3-46ae-b86a-9b5b8ad537a6", "allowed": true}
1.6867392465119815e+09  DEBUG   events  Normal  {"object": {"kind":"GameServerSet","namespace":"default","name":"a4","uid":"587e3fb3-cc3a-4b12-aa3b-6254ff0d7875","apiVersion":"game.kruise.io/v1alpha1","resourceVersion":"33261501457"}, "reason": "CreateWorkload", "message": "created Advanced StatefulSet"}
1.6867392465414512e+09  DEBUG   controller-runtime.webhook.webhooks received request    {"webhook": "/mutate-v1-pod", "UID": "a8cf9c48-0ab7-4366-978a-885777aee37f", "kind": "/v1, Kind=Pod", "resource": {"group":"","version":"v1","resource":"pods"}}
1.6867392465423882e+09  DEBUG   controller-runtime.webhook.webhooks wrote response  {"webhook": "/mutate-v1-pod", "code": 200, "reason": "", "UID": "a8cf9c48-0ab7-4366-978a-885777aee37f", "allowed": true}
1.6867392466240625e+09  DEBUG   controller-runtime.webhook.webhooks received request    {"webhook": "/mutate-v1-pod", "UID": "3d27baa4-34d7-4681-b3c3-42ac7bee3ea2", "kind": "/v1, Kind=Pod", "resource": {"group":"","version":"v1","resource":"pods"}}
1.6867392466250088e+09  DEBUG   controller-runtime.webhook.webhooks wrote response  {"webhook": "/mutate-v1-pod", "code": 200, "reason": "", "UID": "3d27baa4-34d7-4681-b3c3-42ac7bee3ea2", "allowed": true}
1.686739246684605e+09   DEBUG   controller-runtime.webhook.webhooks received request    {"webhook": "/mutate-v1-pod", "UID": "82bb9c68-6182-44ad-9874-d5845f78af91", "kind": "/v1, Kind=Pod", "resource": {"group":"","version":"v1","resource":"pods"}}
1.686739246685527e+09   DEBUG   controller-runtime.webhook.webhooks wrote response  {"webhook": "/mutate-v1-pod", "code": 200, "reason": "", "UID": "82bb9c68-6182-44ad-9874-d5845f78af91", "allowed": true}
1.6867392467381902e+09  DEBUG   controller-runtime.webhook.webhooks received request    {"webhook": "/mutate-v1-pod", "UID": "413f0e56-ed28-46b2-a6c6-b7b183adbec2", "kind": "/v1, Kind=Pod", "resource": {"group":"","version":"v1","resource":"pods"}}
1.686739246739107e+09   DEBUG   controller-runtime.webhook.webhooks wrote response  {"webhook": "/mutate-v1-pod", "code": 200, "reason": "", "UID": "413f0e56-ed28-46b2-a6c6-b7b183adbec2", "allowed": true}
1.686739246776154e+09   DEBUG   controller-runtime.webhook.webhooks received request    {"webhook": "/mutate-v1-pod", "UID": "2a41ab6a-97d3-4c68-847b-450fcebd1752", "kind": "/v1, Kind=Pod", "resource": {"group":"","version":"v1","resource":"pods"}}
1.6867392467770834e+09  DEBUG   controller-runtime.webhook.webhooks wrote response  {"webhook": "/mutate-v1-pod", "code": 200, "reason": "", "UID": "2a41ab6a-97d3-4c68-847b-450fcebd1752", "allowed": true}
1.6867392468166428e+09  DEBUG   controller-runtime.webhook.webhooks received request    {"webhook": "/mutate-v1-pod", "UID": "1da9233f-f440-4845-84b4-b807ea56279e", "kind": "/v1, Kind=Pod", "resource": {"group":"","version":"v1","resource":"pods"}}
1.6867392468175259e+09  DEBUG   controller-runtime.webhook.webhooks wrote response  {"webhook": "/mutate-v1-pod", "code": 200, "reason": "", "UID": "1da9233f-f440-4845-84b4-b807ea56279e", "allowed": true}
1.6867392470155354e+09  DEBUG   controller-runtime.webhook.webhooks received request    {"webhook": "/mutate-v1-pod", "UID": "a23451b2-dd25-4628-995f-56ca2470e466", "kind": "/v1, Kind=Pod", "resource": {"group":"","version":"v1","resource":"pods"}}
1.6867392470164003e+09  DEBUG   controller-runtime.webhook.webhooks wrote response  {"webhook": "/mutate-v1-pod", "code": 200, "reason": "", "UID": "a23451b2-dd25-4628-995f-56ca2470e466", "allowed": true}
1.6867392473775764e+09  DEBUG   controller-runtime.webhook.webhooks received request    {"webhook": "/mutate-v1-pod", "UID": "964ff6a1-5588-4aca-a1a5-1959576b29eb", "kind": "/v1, Kind=Pod", "resource": {"group":"","version":"v1","resource":"pods"}}
1.6867392473784983e+09  DEBUG   controller-runtime.webhook.webhooks wrote response  {"webhook": "/mutate-v1-pod", "code": 200, "reason": "", "UID": "964ff6a1-5588-4aca-a1a5-1959576b29eb", "allowed": true}
1.6867392481780772e+09  DEBUG   controller-runtime.webhook.webhooks received request    {"webhook": "/mutate-v1-pod", "UID": "54e1e9c6-5769-48e3-86ec-b06390cb19f8", "kind": "/v1, Kind=Pod", "resource": {"group":"","version":"v1","resource":"pods"}}
1.6867392481789427e+09  DEBUG   controller-runtime.webhook.webhooks wrote response  {"webhook": "/mutate-v1-pod", "code": 200, "reason": "", "UID": "54e1e9c6-5769-48e3-86ec-b06390cb19f8", "allowed": true}
1.6867392495441675e+09  DEBUG   controller-runtime.webhook.webhooks received request    {"webhook": "/mutate-v1-pod", "UID": "96ac2a82-7933-4894-8c27-0764d4069dd1", "kind": "/v1, Kind=Pod", "resource": {"group":"","version":"v1","resource":"pods"}}
1.6867392495450299e+09  DEBUG   controller-runtime.webhook.webhooks wrote response  {"webhook": "/mutate-v1-pod", "code": 200, "reason": "", "UID": "96ac2a82-7933-4894-8c27-0764d4069dd1", "allowed": true}
1.6867392521424189e+09  DEBUG   controller-runtime.webhook.webhooks received request    {"webhook": "/mutate-v1-pod", "UID": "0d92f93c-0e2e-4a55-ad51-e8425ce19de4", "kind": "/v1, Kind=Pod", "resource": {"group":"","version":"v1","resource":"pods"}}
1.6867392521432865e+09  DEBUG   controller-runtime.webhook.webhooks wrote response  {"webhook": "/mutate-v1-pod", "code": 200, "reason": "", "UID": "0d92f93c-0e2e-4a55-ad51-e8425ce19de4", "allowed": true}
1.68673925730318e+09    DEBUG   controller-runtime.webhook.webhooks received request    {"webhook": "/mutate-v1-pod", "UID": "16658c6b-f5e2-4976-b9c7-eb08e8d4bdb1", "kind": "/v1, Kind=Pod", "resource": {"group":"","version":"v1","resource":"pods"}}
1.6867392573040788e+09  DEBUG   controller-runtime.webhook.webhooks wrote response  {"webhook": "/mutate-v1-pod", "code": 200, "reason": "", "UID": "16658c6b-f5e2-4976-b9c7-eb08e8d4bdb1", "allowed": true}
1.6867392675877454e+09  DEBUG   controller-runtime.webhook.webhooks received request    {"webhook": "/mutate-v1-pod", "UID": "f7c72efe-8657-4122-97c4-7ebc8c530a3c", "kind": "/v1, Kind=Pod", "resource": {"group":"","version":"v1","resource":"pods"}}
1.6867392675886924e+09  DEBUG   controller-runtime.webhook.webhooks wrote response  {"webhook": "/mutate-v1-pod", "code": 200, "reason": "", "UID": "f7c72efe-8657-4122-97c4-7ebc8c530a3c", "allowed": true}
1.6867392744261086e+09  DEBUG   controller-runtime.webhook.webhooks received request    {"webhook": "/mutate-v1-pod", "UID": "dedc8191-1383-4fd4-a151-ada8b4a73caf", "kind": "/v1, Kind=Pod", "resource": {"group":"","version":"v1","resource":"pods"}}
panic: runtime error: index out of range [-1]

goroutine 1790 [running]:
github.com/openkruise/kruise-game/cloudprovider/kubernetes.(*HostPortPlugin).deAllocate(0xc0003cf8b0, {0xc0008025d0, 0x1, 0x1a1172e?}, {0xc00099acd0, 0xc})
    /workspace/cloudprovider/kubernetes/hostPort.go:251 +0x169
github.com/openkruise/kruise-game/cloudprovider/kubernetes.(*HostPortPlugin).OnPodDeleted(0xc0003cf8b0, {0x7000000000000?, 0xc00077a060?}, 0xc000700800, {0x0?, 0x0?})
    /workspace/cloudprovider/kubernetes/hostPort.go:175 +0x14a
github.com/openkruise/kruise-game/pkg/webhook.(*PodMutatingHandler).Handle.func1()
    /workspace/pkg/webhook/mutating_pod.go:81 +0xdf
created by github.com/openkruise/kruise-game/pkg/webhook.(*PodMutatingHandler).Handle
    /workspace/pkg/webhook/mutating_pod.go:72 +0x37d

chrisliu1995 commented 1 year ago

Here's what is happening. The HostPort plugin allocates and deallocates host ports for pods through the webhook mechanism. When a pod add event occurs, the plugin allocates; when a pod delete event occurs, the plugin deallocates.

However, a terminating pod generates multiple delete events, so the plugin uses a map called isAllocated to record whether the pod's host port has been allocated, which avoids repeated deallocation.
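The guard described above can be sketched roughly as follows. This is only an illustration, not the plugin's actual code: the `tracker` type, its fields, and the pod key `default/trunk-0` are hypothetical names chosen for the example.

```go
package main

import "fmt"

// tracker is a hypothetical stand-in for the plugin's bookkeeping.
type tracker struct {
	isAllocated map[string]bool // key: "{pod namespace}/{pod name}"
	released    int             // how many times ports were actually released
}

// allocate marks the pod identified by key as holding host ports.
func (tr *tracker) allocate(key string) {
	tr.isAllocated[key] = true
}

// deallocate releases ports only on the first delete event for a key.
func (tr *tracker) deallocate(key string) {
	if !tr.isAllocated[key] {
		return // repeated delete event: nothing left to release
	}
	tr.isAllocated[key] = false
	tr.released++
}

func main() {
	tr := &tracker{isAllocated: map[string]bool{}}
	tr.allocate("default/trunk-0")

	// A terminating pod may trigger several delete events.
	for i := 0; i < 3; i++ {
		tr.deallocate("default/trunk-0")
	}
	fmt.Println(tr.released) // prints 1
}
```

As long as only one pod ever uses a given key, the guard makes deallocation idempotent.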

The problem is that the key of the isAllocated map is {pod namespace}/{pod name}, which the old pod and the newly created pod share. That causes the situation you described. The whole process goes like this:

  1. The old pod is deallocated, and isAllocated is set to false.
  2. The new pod is allocated, and isAllocated changes from false to true.
  3. The old pod is deallocated again because isAllocated is now true.

As a result, the old pod is deallocated more than once.

chrisliu1995 commented 1 year ago

This problem will be fixed in the next version.