Deploy error for Llama-3.2-vision-11B: "Sharded is not supported for AutoModel"

xuan1905 commented 1 month ago

System Info

Hi team, when deploying the model on AWS with huggingface-pytorch-tgi-inference:2.3.0-tgi2.2.0, I got the above error. Could you tell me when TGI will provide a new image? Is there any way I can work around the issue in the meantime?

Information

[X] Docker
[ ] The CLI directly

Tasks

[ ] An officially supported command
[ ] My own modifications

Reproduction

Deploy Llama-3.2-vision-11B with the image huggingface-pytorch-tgi-inference:2.3.0-tgi2.2.0 on SageMaker.

Expected behavior

TGI deploys the Llama 3.2 model successfully.

dossjjx commented 1 month ago

Same issue here with the 90B model. Number of shards: 4.

xuan1905 commented 1 month ago

Is there any update?

renambot commented 1 month ago

TGI v2.3.1 now works with Llama 3.2 Vision (mllama models).
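
In case it helps anyone checking their own container, here is a minimal sketch that reads the TGI version out of an image tag and compares it against 2.3.1. It assumes the tag follows the `<pytorch>-tgi<version>` pattern seen in this issue (`2.3.0-tgi2.2.0`); the function names and the `MIN_TGI_FOR_MLLAMA` constant are made up for illustration and are not part of TGI or the SageMaker SDK.

```python
import re

# Assumption: 2.3.1 is the first TGI release that handles mllama models,
# based only on the comment in this thread.
MIN_TGI_FOR_MLLAMA = (2, 3, 1)


def tgi_version_from_tag(tag: str) -> tuple[int, ...]:
    """Extract the TGI version from a tag such as '2.3.0-tgi2.2.0'.

    Hypothetical helper: it looks for 'tgi' immediately followed by a
    dotted version number and returns that version as a tuple of ints.
    """
    match = re.search(r"tgi(\d+(?:\.\d+)*)", tag)
    if match is None:
        raise ValueError(f"no TGI version found in tag: {tag!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def supports_llama32_vision(tag: str) -> bool:
    """Return True if the TGI version in the tag is at least 2.3.1."""
    return tgi_version_from_tag(tag) >= MIN_TGI_FOR_MLLAMA


if __name__ == "__main__":
    # The image from this issue bundles TGI 2.2.0, which is too old.
    print(supports_llama32_vision("huggingface-pytorch-tgi-inference:2.3.0-tgi2.2.0"))
```

For the image in this issue the check returns False, which matches the error reported above. Any other tag you pass in is only parsed, not verified to exist in the AWS registry.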

xuan1905 commented 1 month ago

Great. Thanks. Is it available in AWS deep learning container images?
