huggingface / text-generation-inference

Large Language Model Text Generation Inference
http://hf.co/docs/text-generation-inference
Apache License 2.0

Deploy error for Llama-3.2-vision-11B: "Sharded is not supported for AutoModel" #2571

Open xuan1905 opened 1 month ago

xuan1905 commented 1 month ago

System Info

Hi team, when deploying the model on AWS with huggingface-pytorch-tgi-inference:2.3.0-tgi2.2.0, I got the above error. Could you tell me when TGI will provide a new image? Is there any way I can work around the issue in the meantime?

Reproduction

Run the image huggingface-pytorch-tgi-inference:2.3.0-tgi2.2.0 on SageMaker with the Llama-3.2-vision-11B model.

Expected behavior

TGI deploys the Llama 3.2 model successfully.

dossjjx commented 1 month ago

Same issue here with the 90B model. Number of shards: 4.

xuan1905 commented 1 month ago

Is there any update?

renambot commented 1 month ago

TGI v2.3.1 now works with Llama 3.2 Vision (mllama models).
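
To make the version gap concrete, here is a minimal sketch that checks whether a container image tag bundles a TGI release at or above v2.3.1. It assumes the `tgiX.Y.Z` part of the tag (for example `2.3.0-tgi2.2.0`) is the bundled TGI version and that the leading number is something else, likely the PyTorch version. The helper names `tgi_version_from_tag` and `supports_mllama` are hypothetical and not part of TGI or any AWS SDK.

```python
import re

# Minimum TGI release reported in this thread to work with Llama 3.2 Vision.
MIN_TGI_VERSION = (2, 3, 1)


def tgi_version_from_tag(image_tag: str) -> tuple[int, ...]:
    """Extract the TGI version from a tag such as '2.3.0-tgi2.2.0'.

    Assumption: the 'tgiX.Y.Z' segment names the bundled TGI release.
    """
    match = re.search(r"tgi(\d+)\.(\d+)\.(\d+)", image_tag)
    if match is None:
        raise ValueError(f"no TGI version found in tag: {image_tag!r}")
    return tuple(int(part) for part in match.groups())


def supports_mllama(image_tag: str) -> bool:
    """Return True if the tag's TGI version is at least MIN_TGI_VERSION."""
    return tgi_version_from_tag(image_tag) >= MIN_TGI_VERSION


# The tag from the original report bundles TGI 2.2.0, which predates 2.3.1.
print(supports_mllama("2.3.0-tgi2.2.0"))  # False
```

Under that assumption, the image in the original report ships TGI 2.2.0, which would explain the error: the fix only arrives with an image whose tag carries `tgi2.3.1` or later.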

xuan1905 commented 1 month ago

Great, thanks. Is it available in the AWS Deep Learning Container images?