Closed: thechaos16 closed this issue 3 months ago
It was because of the resources. The YAML file initially had an empty resources: {} block; after I added explicit specifications like

resources:
  limits:
    cpu: '6'
    memory: 48Gi
    nvidia.com/gpu: '1'
  requests:
    cpu: '3'
    memory: 48Gi
    nvidia.com/gpu: '1'

it works.
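For anyone hitting the same thing, here is a minimal sketch of where such a resources block sits inside a KServe InferenceService spec. The metadata name, runtime, and storageUri below are placeholders for illustration, not values from this setup:

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: vllm-llama3                          # placeholder name
spec:
  predictor:
    model:
      modelFormat:
        name: triton                         # assumption: Triton serving runtime
      runtime: kserve-tritonserver           # placeholder runtime name
      storageUri: s3://my-bucket/models/vllm # placeholder model repository
      resources:                             # explicit requests/limits instead of an empty {}
        limits:
          cpu: '6'
          memory: 48Gi
          nvidia.com/gpu: '1'
        requests:
          cpu: '3'
          memory: 48Gi
          nvidia.com/gpu: '1'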
Description
I tried to launch Triton Server with vLLM (with a Llama 3 8B model on an H100). When I deploy a pod by myself (with Argo CD), it works well, but somehow it shows

Stub process is not healthy

when I try to deploy a pod within KServe (with exactly the same setup).

Triton Information
Are you using the Triton container or did you build it yourself?
RUN pip3 install --upgrade vllm transformers && \
    wget https://github.com/flashinfer-ai/flashinfer/releases/download/v0.1.2/flashinfer-0.1.2+cu121torch2.3-cp310-cp310-linux_x86_64.whl#sha256=5303ea4ca718521e167e5a4c5379f39fd3bc3cf7be16bed52e302476c1d12fa7 && \
    pip3 install flashinfer-0.1.2+cu121torch2.3-cp310-cp310-linux_x86_64.whl && \
    rm flashinfer-0.1.2+cu121torch2.3-cp310-cp310-linux_x86_64.whl
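One way to sanity-check the built image outside of KServe (the image name and model repository path below are placeholders). Note that Triton's Python backend talks to its stub process over shared memory, so an undersized /dev/shm is one common cause of "Stub process is not healthy":

docker run --rm --gpus all --shm-size=2g \
  -v /path/to/model_repository:/models \
  my-triton-vllm-image \
  tritonserver --model-repository=/models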
YAML file

Error message:
INFO 08-20 06:19:34 model_runner.py:692] Loading model weights took 14.9595 GB
INFO 08-20 06:19:34 gpu_executor.py:102] # GPU blocks: 11432, # CPU blocks: 2048
I0820 06:19:43.371814 2568 python_be.cc:2050] "TRITONBACKEND_ModelInstanceFinalize: delete instance state"
E0820 06:19:43.372212 2568 backend_model.cc:692] "ERROR: Failed to create instance: Stub process 'vllm_0_0' is not healthy."
I0820 06:19:43.372254 2568 python_be.cc:1891] "TRITONBACKEND_ModelFinalize: delete model state"
E0820 06:19:43.372303 2568 model_lifecycle.cc:641] "failed to load 'vllm' version 1: Internal: Stub process 'vllm_0_0' is not healthy."
I0820 06:19:43.372311 2568 model_lifecycle.cc:695] "OnLoadComplete() 'vllm' version 1"
I0820 06:19:43.372321 2568 model_lifecycle.cc:733] "OnLoadFinal() 'vllm' for all version(s)"
I0820 06:19:43.372327 2568 model_lifecycle.cc:776] "failed to load 'vllm'"
I0820 06:19:43.372469 2568 model_lifecycle.cc:297] "VersionStates() 'vllm'"
I0820 06:19:43.372504 2568 model_lifecycle.cc:297] "VersionStates() 'vllm'"
I0820 06:19:43.372565 2568 server.cc:604]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0820 06:19:43.372595 2568 server.cc:631]
+---------+-------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Backend | Path                                                  | Config                                                                                                                                                      |
+---------+-------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
| python  | /opt/tritonserver/backends/python/libtriton_python.so | {"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}} |
| vllm    | /opt/tritonserver/backends/vllm/model.py              | {"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}} |
+---------+-------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0820 06:19:43.372623 2568 model_lifecycle.cc:276] "ModelStates()"
I0820 06:19:43.372635 2568 server.cc:674]
+-------+---------+----------------------------------------------------------------+
| Model | Version | Status                                                         |
+-------+---------+----------------------------------------------------------------+
| vllm  | 1       | UNAVAILABLE: Internal: Stub process 'vllm_0_0' is not healthy. |
+-------+---------+----------------------------------------------------------------+