opea-project / GenAIInfra

Containerization and cloud native suite for OPEA
Apache License 2.0

GMC: replace the service and deployment name if GMC has defined #98

Closed KfreeZ closed 1 week ago

KfreeZ commented 2 weeks ago

Description

  1. If the GMC spec defines a service name, replace the service name in the manifests with the GMC-defined name, and rename the deployment to the GMC-defined service name plus a "-deployment" suffix.
  2. Remove dynamicClient from the code, which also helps with the unit-test issue.
  3. Set a 1-minute timeout for the error path when provisioning resources to Kubernetes.
  4. Avoid double reconciling while GMC is updating a graph's status.
  5. Reconcile the router when the GMC spec changes; previously an existing router was never updated.
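For illustration, the naming convention in item 1 (visible in the kubectl output below, e.g. tgi-svc-llama / tgi-svc-llama-deployment) can be sketched as a tiny Go helper; deriveNames is a hypothetical name, not the controller's actual function:

```go
package main

import "fmt"

// deriveNames applies the convention from item 1: the Service keeps the
// name defined in the GMC spec, and the Deployment gets that name plus a
// "-deployment" suffix. (Illustrative helper, not the controller's real code.)
func deriveNames(gmcServiceName string) (svcName, deployName string) {
	return gmcServiceName, gmcServiceName + "-deployment"
}

func main() {
	svc, dep := deriveNames("tgi-svc-llama")
	fmt.Println(svc, dep) // tgi-svc-llama tgi-svc-llama-deployment
}
```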

Issues

n/a.

Type of change

List the type of change like below. Please delete options that are not relevant.

Dependencies

n/a

Tests

sdp@satg-opea-7:/home/kefei$ kubectl get svc -n mi6
NAME                  TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
router-service        ClusterIP   10.96.248.37    <none>        8080/TCP   11h
tgi-svc-llama         ClusterIP   10.96.111.254   <none>        9009/TCP   52m
tgi-svc-neural-chat   ClusterIP   10.96.62.248    <none>        9009/TCP   52m

sdp@satg-opea-7:/home/kefei$ kubectl get deployment -n mi6
NAME                             READY   UP-TO-DATE   AVAILABLE   AGE
router-server                    1/1     1            1           11h
tgi-svc-llama-deployment         0/1     1            0           19m
tgi-svc-neural-chat-deployment   1/1     1            1           19m

sdp@satg-opea-7:~/iris$ kubectl describe pods -n mi6 tgi-svc-neural-chat-deployment-7b865f96d9-6cxpd
Name:             tgi-svc-neural-chat-deployment-7b865f96d9-6cxpd
Namespace:        mi6
Priority:         0
Service Account:  default
Node:             kind-control-plane/172.18.0.2
Start Time:       Fri, 14 Jun 2024 08:02:22 +0000
Labels:           app=tgi-service-deploy
                  pod-template-hash=7b865f96d9
Annotations:      sidecar.istio.io/rewriteAppHTTPProbers: true
Status:           Running
IP:               10.244.0.164
IPs:
  IP:           10.244.0.164
Controlled By:  ReplicaSet/tgi-svc-neural-chat-deployment-7b865f96d9
Containers:
  tgi-service-deploy-demo:
    Container ID:  containerd://769c8a2beb26b916a7fb4d08372781bddfc3969883264b4ed7f22cd055f34971
    Image:         ghcr.io/huggingface/text-generation-inference:1.4
    Image ID:      ghcr.io/huggingface/text-generation-inference@sha256:94b9758c28e583ff66dffa53305eb98474ff9bb3d59b0b723aad51788efda299
    Port:          80/TCP
    Host Port:     0/TCP
    Args:
      --model-id
      $(LLM_MODEL_ID)
    State:          Running
      Started:      Fri, 14 Jun 2024 08:02:23 +0000
    Ready:          True
    Restart Count:  0
    Environment Variables from:
      qna-config  ConfigMap  Optional: false
    Environment:
      LLM_MODEL_ID:  Intel/neural-chat-7b-v3-3
    Mounts:
      /data from model-volume (rw)
      /dev/shm from shm (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-lctbn (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True
  Initialized                 True
  Ready                       True
  ContainersReady             True
  PodScheduled                True
Volumes:
  model-volume:
    Type:          HostPath (bare host directory volume)
    Path:          /mnt/models
    HostPathType:  Directory
  shm:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:  1Gi
  kube-api-access-lctbn:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  77s   default-scheduler  Successfully assigned mi6/tgi-svc-neural-chat-deployment-7b865f96d9-6cxpd to kind-control-plane
  Normal  Pulled     77s   kubelet            Container image "ghcr.io/huggingface/text-generation-inference:1.4" already present on machine
  Normal  Created    77s   kubelet            Created container tgi-service-deploy-demo
  Normal  Started    76s   kubelet            Started container tgi-service-deploy-demo
sdp@satg-opea-7:~/iris$ kubectl get pods -n mi6
NAME                                              READY   STATUS    RESTARTS      AGE
router-server-6cf7bcd586-wp4ht                    1/1     Running   0             11h
tgi-svc-llama-deployment-76b9c8489d-klfkn         0/1     Error     4 (65s ago)   118s
tgi-svc-neural-chat-deployment-7b865f96d9-6cxpd   1/1     Running   0             118s
sdp@satg-opea-7:~/iris$ kubectl describe pods -n mi6 tgi-svc-llama-deployment-76b9c8489d-klfkn
Name:             tgi-svc-llama-deployment-76b9c8489d-klfkn
Namespace:        mi6
Priority:         0
Service Account:  default
Node:             kind-control-plane/172.18.0.2
Start Time:       Fri, 14 Jun 2024 08:02:22 +0000
Labels:           app=tgi-service-deploy
                  pod-template-hash=76b9c8489d
Annotations:      sidecar.istio.io/rewriteAppHTTPProbers: true
Status:           Running
IP:               10.244.0.166
IPs:
  IP:           10.244.0.166
Controlled By:  ReplicaSet/tgi-svc-llama-deployment-76b9c8489d
Containers:
  tgi-service-deploy-demo:
    Container ID:  containerd://5661b882bb5984e31139dd8437e136902f58edf10bfc7457f96bb537c171b199
    Image:         ghcr.io/huggingface/text-generation-inference:1.4
    Image ID:      ghcr.io/huggingface/text-generation-inference@sha256:94b9758c28e583ff66dffa53305eb98474ff9bb3d59b0b723aad51788efda299
    Port:          80/TCP
    Host Port:     0/TCP
    Args:
      --model-id
      $(LLM_MODEL_ID)
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Fri, 14 Jun 2024 08:04:08 +0000
      Finished:     Fri, 14 Jun 2024 08:04:12 +0000
    Ready:          False
    Restart Count:  4
    Environment Variables from:
      qna-config  ConfigMap  Optional: false
    Environment:
      LLM_MODEL_ID:  meta-llama/Llama-2-7b-chat-hf
    Mounts:
      /data from model-volume (rw)
      /dev/shm from shm (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-dl6sl (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True
  Initialized                 True
  Ready                       False
  ContainersReady             False
  PodScheduled                True
Volumes:
  model-volume:
    Type:          HostPath (bare host directory volume)
    Path:          /mnt/models
    HostPathType:  Directory
  shm:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:  1Gi
  kube-api-access-dl6sl:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                  From               Message
  ----     ------     ----                 ----               -------
  Normal   Scheduled  2m10s                default-scheduler  Successfully assigned mi6/tgi-svc-llama-deployment-76b9c8489d-klfkn to kind-control-plane
  Normal   Pulled     24s (x5 over 2m10s)  kubelet            Container image "ghcr.io/huggingface/text-generation-inference:1.4" already present on machine
  Normal   Created    24s (x5 over 2m10s)  kubelet            Created container tgi-service-deploy-demo
  Normal   Started    24s (x5 over 2m9s)   kubelet            Started container tgi-service-deploy-demo
  Warning  BackOff    8s (x9 over 2m1s)    kubelet            Back-off restarting failed container tgi-service-deploy-demo in pod tgi-svc-llama-deployment-7

controller logs

reconcile resource for node: Tgi
trying to reconcile internal service [ tgi-svc-neural-chat ] in namespace  mi6
get step Tgi config for tgi-svc-neural-chat@mi6: &map[LLM_MODEL_ID:Intel/neural-chat-7b-v3-3 endpoint:/generate]
The raw yaml file has been split into 3 yaml files
Success to reconcile Deployment: tgi-svc-neural-chat-deployment
Success to reconcile Service: tgi-svc-neural-chat
the service URL is: http://tgi-svc-neural-chat.mi6.svc.cluster.local:9009/generate

reconcile resource for node: Tgi
trying to reconcile internal service [ tgi-svc-llama ] in namespace  mi6
get step Tgi config for tgi-svc-llama@mi6: &map[LLM_MODEL_ID:meta-llama/Llama-2-7b-chat-hf endpoint:/generate]
The raw yaml file has been split into 3 yaml files
Success to reconcile Deployment: tgi-svc-llama-deployment
Success to reconcile Service: tgi-svc-llama
the service URL is: http://tgi-svc-llama.mi6.svc.cluster.local:9009/generate
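The service URLs in the logs above follow the standard in-cluster DNS pattern; as a sketch, they can be composed like this (serviceURL is an illustrative helper, the controller may build the string differently):

```go
package main

import "fmt"

// serviceURL composes the in-cluster URL seen in the controller logs from
// the service name, namespace, port, and endpoint path.
func serviceURL(name, namespace string, port int, endpoint string) string {
	return fmt.Sprintf("http://%s.%s.svc.cluster.local:%d%s", name, namespace, port, endpoint)
}

func main() {
	fmt.Println(serviceURL("tgi-svc-llama", "mi6", 9009, "/generate"))
	// http://tgi-svc-llama.mi6.svc.cluster.local:9009/generate
}
```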
KfreeZ commented 2 weeks ago

Thanks to @irisdingbj for contributing to this PR. However, I didn't include the router changes, Iris; they seem to have some problems and also failed the router unit tests, so I removed that part for further discussion.

@zhlsunshine I also changed part of the environment-patching code: since I removed dynamicClient, that code became simpler. You may want to review that part. We should also check whether environment patching still behaves correctly, since applyResourceToK8s now takes more time than before.

zhlsunshine commented 1 week ago

@zhlsunshine I also changed part of the environment-patching code: since I removed dynamicClient, that code became simpler. You may want to review that part. We should also check whether environment patching still behaves correctly, since applyResourceToK8s now takes more time than before.

Hi @KfreeZ, sure, we can validate it together for this.

KfreeZ commented 1 week ago

@zhlsunshine @irisdingbj @mkbhanda After checking with Huailong, I think this PR is ready. Please review again. If you merge it, please use squash and merge; there are many commits in it, and squashing will keep the history cleaner. The squashed commit message would be:

1. If the GMC spec defines a service name, replace the service name in the manifests with the GMC-defined name, and rename the deployment to the GMC-defined service name plus a "-deployment" suffix.
2. Remove dynamicClient from the code, which also helps with the unit-test issue.
3. Set a 1-minute timeout for the error path when provisioning resources to Kubernetes.
4. Avoid double reconciling while GMC is updating a graph's status.
5. Reconcile the router when the GMC spec changes; previously an existing router was never updated.