I am trying to run TGI (text-generation-inference) in Docker across 8 GPUs with 16 GB of memory each, using the following command:
docker run --gpus all --name tgi --shm-size 1g --cpus="5.0" --rm --runtime=nvidia \
  -e HUGGING_FACE_HUB_TOKEN=*** \
  -p 8060:80 \
  -v '$PATH':/data \
  ghcr.io/huggingface/text-generation-inference \
  --model-id meta-llama/Meta-Llama-3-8B \
  --num-shard 8 \
  --max-input-length 14000 \
  --max-batch-prefill-tokens 14000 \
  --max-total-tokens 16000
The server crashes when sharding across all 8 GPUs, but the same container runs fine with just one GPU.