huggingface / text-generation-inference

Large Language Model Text Generation Inference
http://hf.co/docs/text-generation-inference
Apache License 2.0

Unable to stop TGI after serving models #1842

Status: Closed (by ponshane, 5 months ago)

ponshane commented 6 months ago

System Info

I use the official docker image: ghcr.io/huggingface/text-generation-inference:2.0.1

Reproduction

I used the following command to serve the model. After TGI finishes model sharding/loading and starts serving, I cannot use Ctrl+C to terminate the server.

model=mistralai/Mixtral-8x7B-Instruct-v0.1
volume=/my_path_for_hf_cache
token="myhftokens"

docker run --gpus '"device=4,5"' \
    --shm-size 20g \
    -e HUGGING_FACE_HUB_TOKEN=$token \
    -p 8080:80 \
    -v $volume:/data ghcr.io/huggingface/text-generation-inference:2.0.1 \
    --model-id $model \
    --sharded true \
    --quantize eetq \
    --max-input-length 10240 \
    --max-batch-prefill-tokens 10240 \
    --max-total-tokens 32768 \
    --port 80

Expected behavior

In previous versions 1.3.0 and 1.4.0, I could use Ctrl+C to terminate the server, but this no longer works in 2.0.1. My current workaround is to kill the container with a docker command. I am not sure whether this is a good approach.
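For reference, the workaround described above can be sketched with standard Docker CLI commands. This is only an illustration of how to stop the container from outside when Ctrl+C is not propagated; the container ID placeholder must be filled in from `docker ps`:

```shell
# List running containers started from the TGI image to find its ID
docker ps --filter "ancestor=ghcr.io/huggingface/text-generation-inference:2.0.1"

# Graceful stop: sends SIGTERM, then SIGKILL after the grace period
docker stop <container-id>

# Forceful stop: sends SIGKILL immediately (the "kill the container" workaround)
docker kill <container-id>
```

`docker stop` gives TGI a chance to shut down cleanly before being killed, so it is preferable to `docker kill` when the signal handling inside the container is the only thing that is broken.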

regisss commented 6 months ago

Same here; it seems to come from #1716: https://github.com/huggingface/tgi-gaudi/pull/134#issuecomment-2095365083