huggingface / text-generation-inference

Large Language Model Text Generation Inference
http://hf.co/docs/text-generation-inference
Apache License 2.0

Can not load local model by --model-id #245

Closed: paulcx closed this issue 1 year ago

paulcx commented 1 year ago

I tried two ways of creating the container (ghcr.io/huggingface/text-generation-inference:sha-7de8a37), but no luck:

docker run --gpus all --shm-size 1g -p 8080:80 ghcr.io/huggingface/text-generation-inference:latest --model-id ./output/checkpoint-400 --num-shard 2
docker run ghcr.io/huggingface/text-generation-inference  --model-id checkpoint-400 --num-shard 2

It seems to be trying to download the model rather than using the local model.

[screenshot: logs showing TGI attempting to download the model]

Another report of the same issue: #99

ghost commented 1 year ago

You can try mounting a directory from the host which contains the model, and then using --model-id with the complete path within the container. This worked for me:

$ pwd
/mnt/huggingface
$ ls
opt-125m  version.txt
$ ls opt-125m/
config.json  generation_config.json  merges.txt  pytorch_model.bin  special_tokens_map.json  tokenizer_config.json  vocab.json

$ docker run --gpus all --shm-size 1g -p 8080:80 --volume /mnt/huggingface:/data ghcr.io/huggingface/text-generation-inference:latest --model-id /data/opt-125m --num-shard 1
paulcx commented 1 year ago

Thanks for your reply. I tried your approach, but it doesn't work:

$ ls huggingface/checkpoint-400/
config.json  generation_config.json  pytorch_model.bin  special_tokens_map.json  tokenizer.json  tokenizer_config.json

docker run --gpus all --shm-size 1g -p 8080:80 -v huggingface:/data text-generation-inference:latest --model-id /data/checkpoint-400 --num-shard 2

[screenshot: error output]

paulcx commented 1 year ago

> You can try mounting a directory from the host which contains the model, and then using --model-id with the complete path within the container. This worked for me: [...]

After several attempts, I found that the problem lies in the volume mapping in docker run: -v needs an absolute host path, because Docker treats a bare name like huggingface as a named volume rather than a directory.

Now I can confirm that your approach works!

docker run --gpus all --shm-size 1g -p 8080:80 -v /root/huggingface:/data text-generation-inference:latest --model-id /data/checkpoint-400 --num-shard 2
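For anyone hitting the same thing, a minimal sketch of the distinction (the paths here are illustrative):

# A -v argument whose host part has no leading slash is treated by Docker as a
# *named volume*, so the model directory on the host is never mounted:
docker run -v huggingface:/data ...          # named volume "huggingface" (starts empty)

# An absolute host path makes it a bind mount of the actual directory:
docker run -v /root/huggingface:/data ...    # bind mount of /root/huggingface

# $(pwd) can be used to turn a relative path into an absolute one:
docker run -v "$(pwd)/huggingface":/data ...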

However, another error came up:

[screenshot: error output with --num-shard 2]

paulcx commented 1 year ago

It works if I set --num-shard to 1. So the question is: what does the --num-shard param mean here?

OlivierDehaene commented 1 year ago

num-shard controls GPU parallelization via tensor parallelism. This is needed for larger models, as they don't fit on a single device. However, for tensor parallelism to work, the dimensions of the embeddings and other linear layers inside the model must be divisible by the number of shards.
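As a rough illustration (a hypothetical sanity check, assuming the checkpoint sits at /data/checkpoint-400; in practice other dimensions, such as the number of attention heads, must also split evenly), you can read the hidden size from the model's config.json and test divisibility:

# Hypothetical check: does hidden_size divide evenly across the shards?
num_shard=2
hidden_size=$(python3 -c "import json; print(json.load(open('/data/checkpoint-400/config.json'))['hidden_size'])")
if [ $((hidden_size % num_shard)) -eq 0 ]; then
    echo "hidden_size=$hidden_size splits evenly across $num_shard shards"
else
    echo "hidden_size=$hidden_size does NOT split evenly across $num_shard shards"
fi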

gamepad-coder commented 7 months ago

--num-shard and other arguments for TGI ("Text Generation Inference") are documented here:

https://huggingface.co/docs/text-generation-inference/en/basic_tutorials/launcher#numshard