kubeflow / arena

A CLI for Kubeflow.
Apache License 2.0

[BUG] serving pod launched by Arena is not handling SIGTERM signal #1077

Open TrafalgarZZZ opened 2 months ago

TrafalgarZZZ commented 2 months ago

I'm running KServe serving with Arena, using the following command:

arena serve kserve \
    --name=qwen \
    --image=vllm/vllm-openai:0.4.1 \
    --gpus=1 \
    --cpu=4 \
    --memory=20Gi \
    --min-replicas 0 \
    --data="llm-model:/mnt/" \
    "python3 -m vllm.entrypoints.openai.api_server --port 8080 --trust-remote-code --served-model-name qwen --model /mnt/models/Qwen-7B-Chat --gpu-memory-utilization 0.95"

When I try to delete the InferenceService created by Arena, the pod gets stuck in the Terminating state for a very long time. It seems that my python3 process does not receive the SIGTERM signal, so the Pod keeps terminating until it reaches the termination grace period, which is set to 300s by default.

IMO, the ability to handle the SIGTERM signal is necessary for serving Pods, because they may rely on such signals to stop gracefully (e.g., refuse new incoming requests and wait for running requests to finish).
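For illustration, graceful shutdown on SIGTERM usually looks like the sketch below: a handler flips a flag that the serving loop checks so it can drain in-flight requests. This is a hypothetical minimal example, not vLLM's or Arena's actual shutdown logic.

```python
import os
import signal

# Flag the (hypothetical) serving loop would check before accepting new requests.
shutting_down = False

def handle_sigterm(signum, frame):
    """On SIGTERM, stop accepting new requests and let in-flight ones drain."""
    global shutting_down
    shutting_down = True

signal.signal(signal.SIGTERM, handle_sigterm)

# Simulate Kubernetes sending SIGTERM to the container's main process:
os.kill(os.getpid(), signal.SIGTERM)
print(shutting_down)  # True: the handler ran, so shutdown can proceed gracefully
```

But none of this helps if the signal never reaches the python3 process in the first place, which is what happens here.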

Syulin7 commented 1 week ago

Arena prepends `sh -c` to the command, so the shell becomes the container's main process and the SIGTERM sent on pod deletion is delivered to `sh`, not to python3. You can prefix your command with `exec` so that python3 replaces the shell and receives the signal directly, for example:

arena serve kserve \
    --name=qwen \
    --image=vllm/vllm-openai:0.4.1 \
    --gpus=1 \
    --cpu=4 \
    --memory=20Gi \
    --min-replicas 0 \
    --data="llm-model:/mnt/" \
    "exec python3 -m vllm.entrypoints.openai.api_server --port 8080 --trust-remote-code --served-model-name qwen --model /mnt/models/Qwen-7B-Chat --gpu-memory-utilization 0.95"
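The effect of `exec` can be illustrated outside Kubernetes by comparing PIDs: without `exec` the shell forks a child, so a signal aimed at the shell's PID never reaches the command, while with `exec` the command inherits the shell's PID. A small sketch (the trailing `; :` in the first case forces the shell to fork rather than optimize a single command into an implicit exec, as some shells do):

```python
import subprocess

def spawned_pid(shell_cmd):
    """Run shell_cmd via `sh -c` and return (shell's PID, command's own PID)."""
    p = subprocess.Popen(["sh", "-c", shell_cmd],
                         stdout=subprocess.PIPE, text=True)
    out = p.stdout.read().strip()
    p.wait()
    return p.pid, int(out)

# A command that prints its own PID.
PRINT_PID = "python3 -c 'import os; print(os.getpid())'"

# Without exec (fork forced by the trailing command): python3 runs as a
# child of the shell, so its PID differs from the shell's PID.
shell_pid, cmd_pid = spawned_pid(PRINT_PID + "; :")
print(shell_pid == cmd_pid)  # False: a SIGTERM sent to the shell's PID misses python3

# With exec: python3 replaces the shell and inherits its PID, so a SIGTERM
# aimed at the container's main process reaches python3 directly.
shell_pid, cmd_pid = spawned_pid("exec " + PRINT_PID)
print(shell_pid == cmd_pid)  # True
```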