huggingface / text-generation-inference

Large Language Model Text Generation Inference
http://hf.co/docs/text-generation-inference
Apache License 2.0

Add support for Idefics 3 #2503

Open stelterlab opened 1 week ago

stelterlab commented 1 week ago

Model description

Please add support for HuggingFaceM4/Idefics3-8B-Llama3 in TGI:

Idefics3 is an open multimodal model that accepts arbitrary sequences of image and text inputs and produces text outputs. The model can answer questions about images, describe visual content, create stories grounded on multiple images, or simply behave as a pure language model without visual inputs.

Open source status

Provide useful links for the implementation

Well, as of the time of writing this model request, the necessary changes to the transformers library are just waiting for review in this PR:

https://github.com/huggingface/transformers/pull/32473

As the model/finetune and the transformers library are made by the same famous company, I would assume there should be no big problems. ;-)

ErikKaum commented 1 week ago

Hi @stelterlab 👋

We have a PR in the making. No big problems indeed ;) but we are a bit constrained on bandwidth at the moment, so it's not moving as fast as we'd like.

efenocchi commented 1 week ago

Hi @ErikKaum, I noticed a bug. Could you check my last comment in the PR?

ErikKaum commented 1 week ago

Hi @efenocchi I'm unfortunately not super well versed in the transformers library. I'd consider reaching out to the people in the conversation you have in the repo 👍