kubectl get svc -n switch
NAME                          TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE
embedding-svc                 ClusterIP   10.96.212.67    <none>        6000/TCP            11m
embedding-svc-large           ClusterIP   10.96.214.151   <none>        6000/TCP            11m
embedding-svc-small           ClusterIP   10.96.39.68     <none>        6000/TCP            11m
llm-svc                       ClusterIP   10.96.236.0     <none>        9000/TCP            11m
llm-svc-intel                 ClusterIP   10.96.10.72     <none>        9000/TCP            11m
llm-svc-llama                 ClusterIP   10.96.176.105   <none>        9000/TCP            11m
redis-vector-db               ClusterIP   10.96.190.159   <none>        6379/TCP,8001/TCP   11m
reranking-svc                 ClusterIP   10.96.224.112   <none>        8000/TCP            11m
retriever-svc                 ClusterIP   10.96.173.137   <none>        7000/TCP            11m
router-service                ClusterIP   10.96.185.113   <none>        8080/TCP            11m
tei-embedding-svc-bge-small   ClusterIP   10.96.236.40    <none>        6006/TCP            11m
tei-embedding-svc-bge15       ClusterIP   10.96.57.9      <none>        6006/TCP            11m
tei-reranking-svc             ClusterIP   10.96.34.209    <none>        8808/TCP            11m
tgi-service-intel             ClusterIP   10.96.187.199   <none>        9009/TCP            11m
tgi-service-llama             ClusterIP   10.96.41.154    <none>        9009/TCP            11m
Current environment variables:
EMBEDDING_MODEL_ID: BAAI/bge-base-en-v1.5
EMBEDDING_SERVICE_HOST_IP: embedding-svc
HUGGINGFACEHUB_API_TOKEN: <your-huggingface-token>
INDEX_NAME: rag-redis
LLM_MODEL_ID: Intel/neural-chat-7b-v3-3
LLM_SERVICE_HOST_IP: llm-svc
REDIS_URL: redis://redis-vector-db.switch.svc.cluster.local:6379
RERANK_MODEL_ID: BAAI/bge-reranker-large
RERANK_SERVICE_HOST_IP: reranking-svc
RETRIEVER_SERVICE_HOST_IP: retriever-svc
TEI_EMBEDDING_ENDPOINT: http://tei-embedding-svc-bge-small.switch.svc.cluster.local:6006
TEI_RERANKING_ENDPOINT: http://tei-reranking-svc.switch.svc.cluster.local:8808
TGI_LLM_ENDPOINT: http://tgi-service-llama.switch.svc.cluster.local:9009
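For reference, the env vars injected into a given deployment can be listed with kubectl; a minimal sketch, assuming the deployment name matches the service name above:

# List env vars currently set on one deployment (name is an assumption)
kubectl set env deployment/retriever-svc -n switch --list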
The solution is to override and inject the env vars into every deployment according to its specific config.
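As a sketch of that injection (the deployment names are assumptions derived from the service names above, not verified against the actual manifests), each consumer can be pointed at its model-specific backend:

# Hypothetical per-deployment overrides of the endpoint env vars
kubectl set env deployment/embedding-svc-small -n switch \
  TEI_EMBEDDING_ENDPOINT=http://tei-embedding-svc-bge-small.switch.svc.cluster.local:6006
kubectl set env deployment/llm-svc-llama -n switch \
  TGI_LLM_ENDPOINT=http://tgi-service-llama.switch.svc.cluster.local:9009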
Hi @irisdingbj, it's unnecessary to set a condition for some components, such as TeiEmbedding and Tgi, because they are decided by Embedding and Llm. Does that make sense? If so, the YAML would look like this:
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

apiVersion: gmc.opea.io/v1alpha3
kind: GMConnector
metadata:
  labels:
    app.kubernetes.io/name: gmconnector
    app.kubernetes.io/managed-by: kustomize
    gmc/platform: xeon
  name: switch
  namespace: switch
spec:
  routerConfig:
    name: router
    serviceName: router-service
  nodes:
    root:
      routerType: Sequence
      steps:
      - name: Embedding
        nodeName: node1
      - name: Retriever
        data: $response
        internalService:
          serviceName: retriever-svc
          config:
            endpoint: /v1/retrieval
      - name: VectorDB
        internalService:
          serviceName: redis-vector-db
          isDownstreamService: true
      - name: Reranking
        data: $response
        internalService:
          serviceName: reranking-svc
          config:
            endpoint: /v1/reranking
      - name: TeiReranking
        internalService:
          serviceName: tei-reranking-svc
          config:
            endpoint: /rerank
          isDownstreamService: true
      - name: Llm
        nodeName: node2
    node1:
      routerType: Switch
      steps:
      - name: Embedding
        condition: embedding-model-id==large
        internalService:
          serviceName: embedding-svc-large
          config:
            endpoint: /v1/embeddings
      - name: TeiEmbedding
        internalService:
          serviceName: tei-embedding-svc-bge15
          config:
            EMBEDDING_MODEL_ID: BAAI/bge-base-en-v1.5
          isDownstreamService: true
      - name: Embedding
        condition: embedding-model-id==small
        internalService:
          serviceName: embedding-svc-small
          config:
            endpoint: /v1/embeddings
      - name: TeiEmbedding
        internalService:
          serviceName: tei-embedding-svc-bge-small
          config:
            EMBEDDING_MODEL_ID: BAAI/bge-small-en-v1.5
          isDownstreamService: true
    node2:
      routerType: Switch
      steps:
      - name: Llm
        condition: model_id==intel
        data: $response
        internalService:
          serviceName: llm-svc-intel
          config:
            endpoint: /v1/chat/completions
      - name: Tgi
        internalService:
          serviceName: tgi-service-intel
          config:
            endpoint: /generate
            LLM_MODEL_ID: Intel/neural-chat-7b-v3-3
          isDownstreamService: true
      - name: Llm
        condition: model_id==llama
        data: $response
        internalService:
          serviceName: llm-svc-llama
          config:
            endpoint: /v1/chat/completions
      - name: Tgi
        internalService:
          serviceName: tgi-service-llama
          config:
            endpoint: /generate
            LLM_MODEL_ID: openlm-research/open_llama_3b
          isDownstreamService: true
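To exercise the two Switch nodes, the router would be called with the condition keys in the request body; a hedged sketch, assuming the router matches the keys embedding-model-id and model_id exactly as written in the conditions above:

# Route to the small embedding model and the llama LLM branch
curl http://router-service.switch.svc.cluster.local:8080 \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"text": "What is OPEA?", "embedding-model-id": "small", "model_id": "llama"}'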
Hi all, please refer to https://github.com/opea-project/GenAIInfra/pull/206 for the final YAML files covering support for multiple env-var endpoints (TEI_EMBEDDING_ENDPOINT, TEI_RERANKING_ENDPOINT, and TGI_LLM_ENDPOINT) and for the switch feature.