kubectl get svc -n switch
NAME                          TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE
embedding-svc                 ClusterIP   10.96.212.67    <none>        6000/TCP            11m
embedding-svc-large           ClusterIP   10.96.214.151   <none>        6000/TCP            11m
embedding-svc-small           ClusterIP   10.96.39.68     <none>        6000/TCP            11m
llm-svc                       ClusterIP   10.96.236.0     <none>        9000/TCP            11m
llm-svc-intel                 ClusterIP   10.96.10.72     <none>        9000/TCP            11m
llm-svc-llama                 ClusterIP   10.96.176.105   <none>        9000/TCP            11m
redis-vector-db               ClusterIP   10.96.190.159   <none>        6379/TCP,8001/TCP   11m
reranking-svc                 ClusterIP   10.96.224.112   <none>        8000/TCP            11m
retriever-svc                 ClusterIP   10.96.173.137   <none>        7000/TCP            11m
router-service                ClusterIP   10.96.185.113   <none>        8080/TCP            11m
tei-embedding-svc-bge-small   ClusterIP   10.96.236.40    <none>        6006/TCP            11m
tei-embedding-svc-bge15       ClusterIP   10.96.57.9      <none>        6006/TCP            11m
tei-reranking-svc             ClusterIP   10.96.34.209    <none>        8808/TCP            11m
tgi-service-intel             ClusterIP   10.96.187.199   <none>        9009/TCP            11m
tgi-service-llama             ClusterIP   10.96.41.154    <none>        9009/TCP            11m
Current environment variables:
EMBEDDING_MODEL_ID: BAAI/bge-base-en-v1.5
EMBEDDING_SERVICE_HOST_IP: embedding-svc
HUGGINGFACEHUB_API_TOKEN: <your-huggingface-token>
INDEX_NAME: rag-redis
LLM_MODEL_ID: Intel/neural-chat-7b-v3-3
LLM_SERVICE_HOST_IP: llm-svc
REDIS_URL: redis://redis-vector-db.switch.svc.cluster.local:6379
RERANK_MODEL_ID: BAAI/bge-reranker-large
RERANK_SERVICE_HOST_IP: reranking-svc
RETRIEVER_SERVICE_HOST_IP: retriever-svc
TEI_EMBEDDING_ENDPOINT: http://tei-embedding-svc-bge-small.switch.svc.cluster.local:6006
TEI_RERANKING_ENDPOINT: http://tei-reranking-svc.switch.svc.cluster.local:8808
TGI_LLM_ENDPOINT: http://tgi-service-llama.switch.svc.cluster.local:9009
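For reference, the env vars injected into a given deployment can be listed with kubectl; a minimal sketch, assuming the deployment name matches the service name above:

# List env vars currently set on one deployment (name is an assumption)
kubectl set env deployment/retriever-svc -n switch --list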
The solution is to override and inject the env vars into every deployment according to its specific config.
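As a sketch of that injection (the deployment names are assumptions derived from the service names above, not verified against the actual manifests), each consumer can be pointed at its model-specific backend:

# Hypothetical per-deployment overrides of the endpoint env vars
kubectl set env deployment/embedding-svc-small -n switch \
  TEI_EMBEDDING_ENDPOINT=http://tei-embedding-svc-bge-small.switch.svc.cluster.local:6006
kubectl set env deployment/llm-svc-llama -n switch \
  TGI_LLM_ENDPOINT=http://tgi-service-llama.switch.svc.cluster.local:9009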
Hi @irisdingbj, it's unnecessary to set a condition for some components, such as TeiEmbedding and Tgi, because they are decided by Embedding and Llm. Does that make sense? If so, the YAML would look like this:
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

apiVersion: gmc.opea.io/v1alpha3
kind: GMConnector
metadata:
  labels:
    app.kubernetes.io/name: gmconnector
    app.kubernetes.io/managed-by: kustomize
    gmc/platform: xeon
  name: switch
  namespace: switch
spec:
  routerConfig:
    name: router
    serviceName: router-service
  nodes:
    root:
      routerType: Sequence
      steps:
      - name: Embedding
        nodeName: node1
      - name: Retriever
        data: $response
        internalService:
          serviceName: retriever-svc
          config:
            endpoint: /v1/retrieval
      - name: VectorDB
        internalService:
          serviceName: redis-vector-db
          isDownstreamService: true
      - name: Reranking
        data: $response
        internalService:
          serviceName: reranking-svc
          config:
            endpoint: /v1/reranking
      - name: TeiReranking
        internalService:
          serviceName: tei-reranking-svc
          config:
            endpoint: /rerank
          isDownstreamService: true
      - name: Llm
        nodeName: node2
    node1:
      routerType: Switch
      steps:
      - name: Embedding
        condition: embedding-model-id==large
        internalService:
          serviceName: embedding-svc-large
          config:
            endpoint: /v1/embeddings
      - name: TeiEmbedding
        internalService:
          serviceName: tei-embedding-svc-bge15
          config:
            EMBEDDING_MODEL_ID: BAAI/bge-base-en-v1.5
          isDownstreamService: true
      - name: Embedding
        condition: embedding-model-id==small
        internalService:
          serviceName: embedding-svc-small
          config:
            endpoint: /v1/embeddings
      - name: TeiEmbedding
        internalService:
          serviceName: tei-embedding-svc-bge-small
          config:
            EMBEDDING_MODEL_ID: BAAI/bge-small-en-v1.5
          isDownstreamService: true
    node2:
      routerType: Switch
      steps:
      - name: Llm
        condition: model_id==intel
        data: $response
        internalService:
          serviceName: llm-svc-intel
          config:
            endpoint: /v1/chat/completions
      - name: Tgi
        internalService:
          serviceName: tgi-service-intel
          config:
            endpoint: /generate
            LLM_MODEL_ID: Intel/neural-chat-7b-v3-3
          isDownstreamService: true
      - name: Llm
        condition: model_id==llama
        data: $response
        internalService:
          serviceName: llm-svc-llama
          config:
            endpoint: /v1/chat/completions
      - name: Tgi
        internalService:
          serviceName: tgi-service-llama
          config:
            endpoint: /generate
            LLM_MODEL_ID: openlm-research/open_llama_3b
          isDownstreamService: true
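To exercise the two Switch nodes, the router would be called with the condition keys in the request body; a hedged sketch, assuming the router matches the keys embedding-model-id and model_id exactly as written in the conditions above:

# Route to the small embedding model and the llama LLM branch
curl http://router-service.switch.svc.cluster.local:8080 \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"text": "What is OPEA?", "embedding-model-id": "small", "model_id": "llama"}'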
Hi all, please refer to https://github.com/opea-project/GenAIInfra/pull/206 for the final YAML files covering support for multiple env-var endpoints (TEI_EMBEDDING_ENDPOINT, TEI_RERANKING_ENDPOINT, and TGI_LLM_ENDPOINT) and for the switch feature.