huggingface / text-generation-inference

Large Language Model Text Generation Inference
http://hf.co/docs/text-generation-inference
Apache License 2.0

TGI crashes on Multi GPUs #2206

Closed · RohanSohani30 closed this issue 3 months ago

RohanSohani30 commented 3 months ago

I am trying to run TGI on Docker across 8 GPUs (16 GB each) with the following command:

```shell
docker run --gpus all --name tgi --shm-size 1g --cpus="5.0" --rm --runtime=nvidia \
  -e HUGGING_FACE_HUB_TOKEN=*** -p 8060:80 -v '$PATH':/data \
  ghcr.io/huggingface/text-generation-inference \
  --model-id meta-llama/Meta-Llama-3-8B --num-shard 8 \
  --max-input-length 14000 --max-batch-prefill-tokens 14000 --max-total-tokens 16000
```

The server crashes when sharded across all 8 GPUs, but runs fine on a single GPU.
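As a rough sanity check on the settings above, here is a back-of-envelope memory estimate (a sketch only; TGI's actual allocator, CUDA graphs, and activation buffers add overhead on top of this). It uses the public Meta-Llama-3-8B architecture numbers (32 layers, 8 KV heads via grouped-query attention, head dim 128):

```python
# Rough per-GPU memory estimate for meta-llama/Meta-Llama-3-8B
# sharded across 8 GPUs. A sketch only: TGI's real footprint also
# includes activations, CUDA graphs, and allocator overhead.

BYTES_FP16 = 2

# Llama-3-8B architecture (from the public model config)
n_layers = 32
n_kv_heads = 8      # grouped-query attention
head_dim = 128
n_params = 8e9

n_gpus = 8
max_total_tokens = 16_000

# Model weights in fp16, split evenly by tensor parallelism
weights_per_gpu_gb = n_params * BYTES_FP16 / n_gpus / 1e9

# KV cache: 2 (K and V) * layers * kv_heads * head_dim bytes per token,
# also sharded across the GPUs
kv_bytes_per_token = 2 * n_layers * n_kv_heads * head_dim * BYTES_FP16
kv_per_gpu_gb = kv_bytes_per_token * max_total_tokens / n_gpus / 1e9

print(f"weights per GPU: {weights_per_gpu_gb:.1f} GB")   # ~2.0 GB
print(f"KV cache per GPU: {kv_per_gpu_gb:.2f} GB")       # ~0.26 GB
```

On this estimate the model fits comfortably in 16 GB per GPU at these token limits, which suggests the crash is less likely a plain out-of-memory and more likely something in the multi-GPU setup itself; the full logs requested below would confirm that.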

Hugoch commented 3 months ago

@RohanSohani30 Can you submit an issue with a 🐛 Bug Report template so that we get the needed information? Thanks a lot 🤗