Open TrafalgarZZZ opened 2 months ago
Arena adds sh -c before the command. You can prefix your command with exec, for example:
arena serve kserve \
--name=qwen \
--image=vllm/vllm-openai:0.4.1 \
--gpus=1 \
--cpu=4 \
--memory=20Gi \
--min-replicas 0 \
--data="llm-model:/mnt/" \
"exec python3 -m vllm.entrypoints.openai.api_server --port 8080 --trust-remote-code --served-model-name qwen --model /mnt/models/Qwen-7B-Chat --gpu-memory-utilization 0.95"
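To illustrate why the exec prefix helps, here is a minimal Python sketch. It is not part of Arena, and the helper name shell_and_child_pids is hypothetical. It runs a small Python child through sh -c and compares the shell's PID with the child's PID. With exec, the shell is replaced by the Python process, so the two PIDs should match and a signal sent to that PID reaches Python directly instead of stopping at the shell.

```python
import shlex
import subprocess
import sys
from typing import Tuple


def shell_and_child_pids(prefix: str) -> Tuple[int, int]:
    """Run a tiny Python child through `sh -c`; return (shell PID, child PID).

    `prefix` is placed before the child command, e.g. "exec " or "".
    """
    # The child only prints its own PID.
    child = f"{shlex.quote(sys.executable)} -c 'import os; print(os.getpid())'"
    # `$$` expands to the PID of the shell started by `sh -c`.
    script = f"echo $$; {prefix}{child}"
    result = subprocess.run(
        ["sh", "-c", script], capture_output=True, text=True, check=True
    )
    shell_pid, child_pid = (int(line) for line in result.stdout.split())
    return shell_pid, child_pid


if __name__ == "__main__":
    for prefix in ("exec ", ""):
        shell_pid, child_pid = shell_and_child_pids(prefix)
        print(f"prefix={prefix!r}: shell={shell_pid} child={child_pid}")
```

Without exec, the result depends on the shell: some shells exec the last command of a -c string on their own, so the PIDs may or may not differ. Writing exec explicitly removes that dependence.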
I'm running KServe serving with Arena, with the following command:
When I try to delete the InferenceService created by Arena, the pod gets stuck in the Terminating state for a very long time. It seems that my python3 process does not receive the SIGTERM signal, so the Pod keeps terminating until it reaches TerminationGracePeriod, which is set to 300s by default.

IMO, the ability to handle the SIGTERM signal is necessary for serving Pods because they may rely on such signals to stop gracefully (e.g. refuse new incoming requests and wait for running requests to finish).
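As a rough illustration of the graceful stop described above, here is a minimal Python sketch of a process that reacts to SIGTERM by refusing new requests and waiting for running ones to finish. It does not reflect how vLLM or KServe actually implement shutdown; all names (install_sigterm_handler, try_accept_request, finish_request, wait_for_drain) are hypothetical.

```python
import signal
import threading
import time

shutting_down = threading.Event()  # set once SIGTERM has been received
_in_flight = 0                     # number of requests currently running
_in_flight_lock = threading.Lock()


def _handle_sigterm(signum, frame):
    # Only flip a flag here; the real work happens outside the handler.
    shutting_down.set()


def install_sigterm_handler():
    """Register the handler. Must be called from the main thread."""
    signal.signal(signal.SIGTERM, _handle_sigterm)


def try_accept_request():
    """Return True and count the request, or False once shutdown has begun."""
    global _in_flight
    if shutting_down.is_set():
        return False
    with _in_flight_lock:
        _in_flight += 1
    return True


def finish_request():
    """Mark one previously accepted request as done."""
    global _in_flight
    with _in_flight_lock:
        _in_flight -= 1


def wait_for_drain(timeout_s, poll_s=0.05):
    """Poll until no requests are running; return False on timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        with _in_flight_lock:
            if _in_flight == 0:
                return True
        time.sleep(poll_s)
    with _in_flight_lock:
        return _in_flight == 0
```

A server would call install_sigterm_handler() at startup and, once shutting_down is set, call wait_for_drain() with a timeout shorter than the Pod's grace period before exiting. None of this helps if the signal never reaches the Python process, which is the problem reported here.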