opea-project / GenAIInfra

Containerization and cloud native suite for OPEA
Apache License 2.0

GMC: add Codetrans example #115

Closed · KfreeZ closed this 6 days ago

KfreeZ commented 1 week ago

Description

Add a CodeTrans (code translation) example for GMC (GenAI Microservices Connector).
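For orientation, the sketch below shows roughly how such an example is deployed, assuming a `GMConnector` custom resource shaped like the other GMC samples. The service names mirror the deployments visible in the test output below, but the exact fields and API version come from the manifest added in this PR, so treat this as illustrative rather than authoritative:

```sh
# Illustrative sketch only: a Sequence pipeline with a codetrans microservice
# routed by GMC and backed by a TGI service. Field names and the API version
# (gmc.opea.io/v1alpha3) are assumptions based on other GMC samples.
kubectl create namespace codetrans
kubectl apply -f - <<'EOF'
apiVersion: gmc.opea.io/v1alpha3
kind: GMConnector
metadata:
  name: codetrans
  namespace: codetrans
spec:
  routerConfig:
    name: router
    serviceName: router-service
  nodes:
    root:
      routerType: Sequence
      steps:
      - name: CodeTrans
        internalService:
          serviceName: codetrans-service
          config:
            endpoint: /v1/chat/completions
      - name: Tgi
        internalService:
          serviceName: tgi-service
          config:
            endpoint: /generate
          isDownstreamService: true
EOF
```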

Issues

n/a

Type of change

New feature (non-breaking change which adds new functionality: a GMC CodeTrans example).

Dependencies

n/a

Tests
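The transcript below exercises the example end to end on a kind cluster: the codetrans pods come up alongside the existing codegen and mi6 pipelines, the microservice log shows a healthy HTTP server, a streamed translation request succeeds through the router, and the TGI backend log confirms the generate call.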

```console
sdp@satg-opea-7:~/iris$ kubectl get pods -A
NAMESPACE            NAME                                              READY   STATUS    RESTARTS   AGE
codegen              llm-service-deployment-58c7c49798-sdcsv           1/1     Running   0          5d5h
codegen              router-service-deployment-5f5cd7495b-dkzhr        1/1     Running   0          5m56s
codegen              tgi-service-deployment-75987cc646-s5sgq           1/1     Running   0          5m59s
codetrans            codetrans-service-deployment-584bcbc67b-rzgbh     1/1     Running   0          73s
codetrans            router-service-deployment-86bcb4586f-b9xx5        1/1     Running   0          6m11s
codetrans            tgi-service-deployment-5467dc5c87-bwg2p           1/1     Running   0          30s
default              client-test-7b7f97ddd9-zq9f7                      1/1     Running   0          9d
kube-system          coredns-7db6d8ff4d-78fsv                          1/1     Running   0          12d
kube-system          coredns-7db6d8ff4d-wsrc8                          1/1     Running   0          12d
kube-system          etcd-kind-control-plane                           1/1     Running   0          12d
kube-system          kindnet-cvzsg                                     1/1     Running   0          12d
kube-system          kube-apiserver-kind-control-plane                 1/1     Running   0          12d
kube-system          kube-controller-manager-kind-control-plane        1/1     Running   0          12d
kube-system          kube-proxy-9pxfz                                  1/1     Running   0          12d
kube-system          kube-scheduler-kind-control-plane                 1/1     Running   0          12d
local-path-storage   local-path-provisioner-988d74bc-8djrt             1/1     Running   0          12d
mi6                  router-service-deployment-5f69fcb765-rv8sm        1/1     Running   0          6m3s
mi6                  tgi-svc-llama-deployment-9498f496f-mm2fz          1/1     Running   0          5d18h
mi6                  tgi-svc-neural-chat-deployment-7b44cdcf7c-kwzpb   1/1     Running   0          5d19h
opea-system          gmc-controller-7db7dbbd48-9psxq                   1/1     Running   0          6m24s
```
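All three codetrans pods (router, codetrans microservice, TGI) are Running. Beyond listing pods, pipeline health can also be read from the custom resource itself; a hedged example follows, where the plural resource name `gmconnectors.gmc.opea.io` is an assumption that may differ by CRD version:

```sh
# Assumption: the GMC CRD is registered as gmconnectors.gmc.opea.io.
# Adjust the resource name if `kubectl api-resources | grep gmc` says otherwise.
kubectl get gmconnectors.gmc.opea.io -n codetrans
kubectl describe gmconnectors.gmc.opea.io codetrans -n codetrans
```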
```console
sdp@satg-opea-7:~/iris$ kubectl logs -n codetrans            codetrans-service-deployment-584bcbc67b-rzgbh
/usr/local/lib/python3.11/site-packages/pydantic/_internal/_fields.py:149: UserWarning: Field "model_name_or_path" has conflict with protected namespace "model_".

You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
  warnings.warn(
[2024-06-24 02:31:37,338] [    INFO] - CORS is enabled.
[2024-06-24 02:31:37,338] [    INFO] - Setting up HTTP server
[2024-06-24 02:31:37,339] [    INFO] - Uvicorn server setup on port 9000
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:9000 (Press CTRL+C to quit)
[2024-06-24 02:31:37,356] [    INFO] - HTTP server setup successful
```
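With the codetrans microservice listening on port 9000, the pipeline is exercised from the in-cluster client-test pod: a POST to the router's cluster-local Service on port 8080 carrying a Golang-to-Python translation prompt. The response is streamed back as server-sent events, one token fragment per `data:` line.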
```console
sdp@satg-opea-7:~/iris$ kubectl exec -it client-test-7b7f97ddd9-zq9f7 -- curl http://router-service.codetrans.svc.cluster.local:8080/ \
  -X POST \
  -d '{"query":"    ### System: Please translate the following Golang codes into  Python codes.    ### Original codes:    '\'''\'''\''Golang    \npackage main\n\nimport \"fmt\"\nfunc main() {\n    fmt.Println(\"Hello, World!\");\n    '\'''\'''\''    ### Translated codes:"}' \
  -H 'Content-Type: application/json'
data: b'   '

data: b" '''"

data: b'Py'

data: b'thon'

data: b'\n'

data: b'   '

data: b' import'

data: b' sys'

data: b'\n'

data: b'   '

data: b' print'

data: b'("'

data: b'Hello'

data: b','

data: b' World'

data: b'!'

data: b'")'

data: b'\n'

data: b'   '

data: b" '''"

data: b'</s>'

data: [DONE]
```

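Concatenating the streamed chunk payloads reconstructs the translated snippet (followed by the model's `</s>` end-of-sequence token and the `[DONE]` sentinel):

```
    '''Python
    import sys
    print("Hello, World!")
    '''
```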
```console
sdp@satg-opea-7:~/iris$ kubectl logs -n codetrans            codetrans-service-deployment-584bcbc67b-rzgbh
/usr/local/lib/python3.11/site-packages/pydantic/_internal/_fields.py:149: UserWarning: Field "model_name_or_path" has conflict with protected namespace "model_".

You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
  warnings.warn(
[2024-06-24 02:31:37,338] [    INFO] - CORS is enabled.
[2024-06-24 02:31:37,338] [    INFO] - Setting up HTTP server
[2024-06-24 02:31:37,339] [    INFO] - Uvicorn server setup on port 9000
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:9000 (Press CTRL+C to quit)
[2024-06-24 02:31:37,356] [    INFO] - HTTP server setup successful
The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: fineGrained).
Your token has been saved to /home/user/.cache/huggingface/token
Login successful
INFO:     10.244.0.5:59694 - "POST /v1/chat/completions HTTP/1.1" 200 OK
```
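The startup lines repeat because this is the same pod log fetched again after the request; the new tail shows the Hugging Face token being validated and the routed request completing with a 200 OK on `/v1/chat/completions`, confirming traffic flowed from the router through the codetrans microservice.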
```console
sdp@satg-opea-7:~/iris$ kubectl logs -n codetrans tgi-service-deployment-5467dc5c87-bwg2p
2024-06-24T02:32:19.267897Z  INFO text_generation_launcher: Args { model_id: "HuggingFaceH4/mistral-7b-grok", revision: None, validation_workers: 2, sharded: None, num_shard: None, quantize: None, speculate: None, dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_top_n_tokens: 5, max_input_length: 1024, max_total_tokens: 2048, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 4096, max_batch_total_tokens: None, max_waiting_tokens: 20, max_batch_size: None, enable_cuda_graphs: false, hostname: "tgi-service-deployment-5467dc5c87-bwg2p", port: 80, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, cuda_memory_fraction: 1.0, rope_scaling: None, rope_factor: None, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_edge: None, tokenizer_config_path: None, disable_grammar_support: false, env: false }
2024-06-24T02:32:19.268175Z  INFO download: text_generation_launcher: Starting download process.
2024-06-24T02:32:21.589110Z  INFO text_generation_launcher: Files are already present on the host. Skipping download.

2024-06-24T02:32:22.173592Z  INFO download: text_generation_launcher: Successfully downloaded weights.
2024-06-24T02:32:22.174154Z  INFO shard-manager: text_generation_launcher: Starting shard rank=0
2024-06-24T02:32:25.027453Z  WARN text_generation_launcher: We're not using custom kernels.

2024-06-24T02:32:25.058843Z  WARN text_generation_launcher: Could not import Flash Attention enabled models: CUDA is not available

2024-06-24T02:32:29.679305Z  INFO text_generation_launcher: Server started at unix:///tmp/text-generation-server-0

2024-06-24T02:32:29.685891Z  INFO shard-manager: text_generation_launcher: Shard ready in 7.509462076s rank=0
2024-06-24T02:32:29.782100Z  INFO text_generation_launcher: Starting Webserver
2024-06-24T02:32:29.834279Z  INFO text_generation_router: router/src/main.rs:181: Using the Hugging Face API
2024-06-24T02:32:29.834317Z  INFO hf_hub: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/hf-hub-0.3.2/src/lib.rs:55: Token file not found "/root/.cache/huggingface/token"
2024-06-24T02:32:30.347694Z  INFO text_generation_router: router/src/main.rs:443: Serving revision 038a70da219335747827bc58464bc95dbdbdd623 of model HuggingFaceH4/mistral-7b-grok
2024-06-24T02:32:30.347736Z  INFO text_generation_router: router/src/main.rs:242: Using the Hugging Face API to retrieve tokenizer config
2024-06-24T02:32:30.358589Z  INFO text_generation_router: router/src/main.rs:291: Warming up model
2024-06-24T02:32:58.136300Z  WARN text_generation_router: router/src/main.rs:306: Model does not support automatic max batch total tokens
2024-06-24T02:32:58.136324Z  INFO text_generation_router: router/src/main.rs:328: Setting max batch total tokens to 16000
2024-06-24T02:32:58.136327Z  INFO text_generation_router: router/src/main.rs:329: Connected
2024-06-24T02:32:58.136331Z  WARN text_generation_router: router/src/main.rs:343: Invalid hostname, defaulting to 0.0.0.0
2024-06-24T02:33:22.303447Z  INFO compat_generate{default_return_full_text=true compute_type=Extension(ComputeType("gpu+optimized"))}:generate_stream{parameters=GenerateParameters { best_of: None, temperature: Some(0.01), repetition_penalty: Some(1.03), frequency_penalty: None, top_k: Some(10), top_p: Some(0.95), typical_p: Some(0.95), do_sample: false, max_new_tokens: Some(1024), return_full_text: Some(false), stop: [], truncate: None, watermark: false, details: false, decoder_input_details: false, seed: None, top_n_tokens: None, grammar: None } total_time="5.654043113s" validation_time="1.621198ms" queue_time="127.61µs" inference_time="5.652294653s" time_per_token="269.156888ms" seed="Some(1541799164742613152)"}: text_generation_router::server: router/src/server.rs:497: Success
```
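The TGI log closes the loop: the backend serves `HuggingFaceH4/mistral-7b-grok` without CUDA (the Flash Attention models could not be imported), warms up with a 1024-token input / 2048-token total budget, and completes the streamed generation in about 5.65 s (~269 ms per token).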