huggingface / text-generation-inference

Large Language Model Text Generation Inference
http://hf.co/docs/text-generation-inference
Apache License 2.0
9k stars, 1.06k forks

Multi-modal model support #1669

Status: Closed (RonanKMcGovern closed this 6 months ago)

RonanKMcGovern commented 7 months ago

Feature request

Expand support for multi-modal models going forward. Llava 1.6 is one candidate, but waiting for whichever strong model comes out next (IDEFICS 2?) would be fine too.

Motivation

In open source, inference API support for multi-modal models is much weaker than for text-only LLMs. Fine-tuning multi-modal models is hard for open-source developers, and running inference is harder still, even at a small production scale (e.g. Llava 1.6 is supported by SGLang, which works but is more obscure than TGI or vLLM).

Your contribution

If anyone has visibility into upcoming models, it would be great to align with those teams and try to get one or two models supported (FWIW, the original IDEFICS works with TGI AFAIK, but its performance is now outdated).

github-actions[bot] commented 6 months ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.