kubeflow / arena

A CLI for Kubeflow.
Apache License 2.0

[BUG] serving pod launched by Arena is not handling SIGTERM signal #1077

Open TrafalgarZZZ opened 2 months ago

TrafalgarZZZ commented 2 months ago

I'm running KServe serving with Arena, using the following command:

arena serve kserve \
    --name=qwen \
    --image=vllm/vllm-openai:0.4.1 \
    --gpus=1 \
    --cpu=4 \
    --memory=20Gi \
    --min-replicas 0 \
    --data="llm-model:/mnt/" \
    "python3 -m vllm.entrypoints.openai.api_server --port 8080 --trust-remote-code --served-model-name qwen --model /mnt/models/Qwen-7B-Chat --gpu-memory-utilization 0.95"

When I try to delete the InferenceService created by Arena, the pod gets stuck in the Terminating state for a very long time. It seems that my python3 process does not receive the SIGTERM signal, so the Pod keeps terminating until it reaches the termination grace period, which is set to 300s by default.

IMO, the ability to handle the SIGTERM signal is necessary for serving Pods, because they may rely on such signals to stop gracefully (e.g., refuse new incoming requests and wait for running requests to finish).
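For illustration, graceful shutdown on SIGTERM usually looks like the sketch below: a handler flips a flag that the serving loop checks so it can drain in-flight requests. This is a hypothetical minimal example, not vLLM's or Arena's actual shutdown logic.

```python
import os
import signal

# Flag the (hypothetical) serving loop would check before accepting new requests.
shutting_down = False

def handle_sigterm(signum, frame):
    """On SIGTERM, stop accepting new requests and let in-flight ones drain."""
    global shutting_down
    shutting_down = True

signal.signal(signal.SIGTERM, handle_sigterm)

# Simulate Kubernetes sending SIGTERM to the container's main process:
os.kill(os.getpid(), signal.SIGTERM)
print(shutting_down)  # True: the handler ran, so shutdown can proceed gracefully
```

But none of this helps if the signal never reaches the python3 process in the first place, which is what happens here.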

Syulin7 commented 1 week ago

Arena prepends `sh -c` to the command, so the shell becomes the container's main process and the SIGTERM sent on pod deletion is delivered to `sh`, not to python3. You can prefix your command with `exec` so that python3 replaces the shell and receives the signal directly, for example:

arena serve kserve \
    --name=qwen \
    --image=vllm/vllm-openai:0.4.1 \
    --gpus=1 \
    --cpu=4 \
    --memory=20Gi \
    --min-replicas 0 \
    --data="llm-model:/mnt/" \
    "exec python3 -m vllm.entrypoints.openai.api_server --port 8080 --trust-remote-code --served-model-name qwen --model /mnt/models/Qwen-7B-Chat --gpu-memory-utilization 0.95"
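The effect of `exec` can be illustrated outside Kubernetes by comparing PIDs: without `exec` the shell forks a child, so a signal aimed at the shell's PID never reaches the command, while with `exec` the command inherits the shell's PID. A small sketch (the trailing `; :` in the first case forces the shell to fork rather than optimize a single command into an implicit exec, as some shells do):

```python
import subprocess

def spawned_pid(shell_cmd):
    """Run shell_cmd via `sh -c` and return (shell's PID, command's own PID)."""
    p = subprocess.Popen(["sh", "-c", shell_cmd],
                         stdout=subprocess.PIPE, text=True)
    out = p.stdout.read().strip()
    p.wait()
    return p.pid, int(out)

# A command that prints its own PID.
PRINT_PID = "python3 -c 'import os; print(os.getpid())'"

# Without exec (fork forced by the trailing command): python3 runs as a
# child of the shell, so its PID differs from the shell's PID.
shell_pid, cmd_pid = spawned_pid(PRINT_PID + "; :")
print(shell_pid == cmd_pid)  # False: a SIGTERM sent to the shell's PID misses python3

# With exec: python3 replaces the shell and inherits its PID, so a SIGTERM
# aimed at the container's main process reaches python3 directly.
shell_pid, cmd_pid = spawned_pid("exec " + PRINT_PID)
print(shell_pid == cmd_pid)  # True
```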