When querying transformers, an <image> placeholder is used in the prompt and the images are passed alongside it as a separate input argument. This doesn't appear to be the case with TGI, which only takes a prompt string.
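For context, this is roughly what the transformers flow looks like, sketched here with llava-hf/llava-v1.6-mistral-7b-hf (one of the models mentioned below); the exact prompt template varies per model:

# Sketch of the transformers-style flow: the prompt carries a literal
# <image> placeholder and the PIL image is passed to the processor separately.
import requests
from PIL import Image
from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration

model_id = "llava-hf/llava-v1.6-mistral-7b-hf"
processor = LlavaNextProcessor.from_pretrained(model_id)
model = LlavaNextForConditionalGeneration.from_pretrained(model_id)

url = "http://images.cocodataset.org/val2017/000000219578.jpg"
image = Image.open(requests.get(url, stream=True).raw)

prompt = "[INST] <image>\nTell me about this image [/INST]"
inputs = processor(text=prompt, images=image, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=20)
print(processor.decode(output[0], skip_special_tokens=True))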
With TGI, something like this:
curl https://yd64jhjr8ylu54-8080.proxy.runpod.net/generate \
-X POST \
-d '{"inputs": "User: ![](http://images.cocodataset.org/val2017/000000219578.jpg)Tell me about this image<end_of_utterance>\\nAssistant:","parameters":{"max_new_tokens":20}}' \
-H 'Content-Type: application/json'
works, although it fails when trying to do two images (the model ignores the second image):
curl https://yd64jhjr8ylu54-8080.proxy.runpod.net/generate \
-X POST \
-d '{"inputs": "User: ![](http://images.cocodataset.org/val2017/000000219578.jpg)Tell me about this image, and also about this second image: ![](http://images.cocodataset.org/val2017/000000039769.jpg)<end_of_utterance>\\nAssistant:","parameters":{"max_new_tokens":50}}' \
-H 'Content-Type: application/json'
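For reference, here are essentially the same two requests from Python with requests (the RunPod endpoint is the temporary one from the curl commands above, so substitute your own TGI URL):

# Sketch of the same two queries against TGI's /generate route. Prompts and
# endpoint are copied from the curl commands above; the escaped "\n" is
# written here as a real newline.
import requests

TGI_URL = "https://yd64jhjr8ylu54-8080.proxy.runpod.net/generate"

def generate(prompt: str, max_new_tokens: int) -> str:
    resp = requests.post(
        TGI_URL,
        json={"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["generated_text"]

one_image = (
    "User: ![](http://images.cocodataset.org/val2017/000000219578.jpg)"
    "Tell me about this image<end_of_utterance>\nAssistant:"
)
two_images = (
    "User: ![](http://images.cocodataset.org/val2017/000000219578.jpg)"
    "Tell me about this image, and also about this second image: "
    "![](http://images.cocodataset.org/val2017/000000039769.jpg)"
    "<end_of_utterance>\nAssistant:"
)

print(generate(one_image, 20))   # describes the first image
print(generate(two_images, 50))  # the second image gets ignored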
System Info
NA
Information
Tasks
Reproduction
It is unclear how to query TGI for multi-modal models.
The links to LLaVA Next and IDEFICS2 give 404:
https://huggingface.co/docs/text-generation-inference/HuggingFaceM4/idefics-9b-instruct
https://huggingface.co/docs/text-generation-inference/llava-hf/llava-v1.6-mistral-7b-hf
@Narsil @VictorSanh
Expected behavior
TGI should document how to query multi-modal models (ideally something analogous to the <image> placeholder plus separate image inputs that transformers uses), and a prompt containing two image URLs should produce a completion that takes both images into account rather than silently ignoring the second one. The documentation links above should also resolve.
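If it helps, this is the kind of call I would expect to work for the two-image case; it uses huggingface_hub's InferenceClient, which as far as I understand hits the same /generate route as the curl commands above:

# Sketch: the two-image prompt sent through huggingface_hub's InferenceClient.
# The expectation is a completion that describes both images, not just the first.
from huggingface_hub import InferenceClient

client = InferenceClient("https://yd64jhjr8ylu54-8080.proxy.runpod.net")

prompt = (
    "User: ![](http://images.cocodataset.org/val2017/000000219578.jpg)"
    "Tell me about this image, and also about this second image: "
    "![](http://images.cocodataset.org/val2017/000000039769.jpg)"
    "<end_of_utterance>\nAssistant:"
)

print(client.text_generation(prompt, max_new_tokens=50))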