huggingface / text-generation-inference

Large Language Model Text Generation Inference
http://hf.co/docs/text-generation-inference
Apache License 2.0

Unable to stop TGI after serving models #1842

Status: Closed (by ponshane, 5 months ago)

ponshane commented 6 months ago

System Info

I use the official docker image: ghcr.io/huggingface/text-generation-inference:2.0.1

Reproduction

I used the following command to serve the model. After TGI finishes model sharding/loading and starts serving, I cannot use Ctrl+C to terminate the server.

model=mistralai/Mixtral-8x7B-Instruct-v0.1
volume=/my_path_for_hf_cache
token="myhftokens"

docker run --gpus '"device=4,5"' \
    --shm-size 20g \
    -e HUGGING_FACE_HUB_TOKEN=$token \
    -p 8080:80 \
    -v $volume:/data ghcr.io/huggingface/text-generation-inference:2.0.1 \
    --model-id $model \
    --sharded true \
    --quantize eetq \
    --max-input-length 10240 \
    --max-batch-prefill-tokens 10240 \
    --max-total-tokens 32768 \
    --port 80

Expected behavior

In previous versions 1.3.0 and 1.4.0, I could use Ctrl+C to terminate the server, but this no longer works in 2.0.1. My current workaround is to kill the container with a docker command. I am not sure whether this is a good approach.
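For reference, the workaround described above can be sketched with standard Docker CLI commands. This is only an illustration of how to stop the container from outside when Ctrl+C is not propagated; the container ID placeholder must be filled in from `docker ps`:

```shell
# List running containers started from the TGI image to find its ID
docker ps --filter "ancestor=ghcr.io/huggingface/text-generation-inference:2.0.1"

# Graceful stop: sends SIGTERM, then SIGKILL after the grace period
docker stop <container-id>

# Forceful stop: sends SIGKILL immediately (the "kill the container" workaround)
docker kill <container-id>
```

`docker stop` gives TGI a chance to shut down cleanly before being killed, so it is preferable to `docker kill` when the signal handling inside the container is the only thing that is broken.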

regisss commented 6 months ago

Same here; it seems to come from #1716: https://github.com/huggingface/tgi-gaudi/pull/134#issuecomment-2095365083