huggingface / text-generation-inference

Large Language Model Text Generation Inference
http://hf.co/docs/text-generation-inference
Apache License 2.0

404 for Multi-modal docs #1853

Closed RonanKMcGovern closed 3 months ago

RonanKMcGovern commented 4 months ago

System Info

NA

Information

Tasks

Reproduction

It is unclear how to query TGI for multi-modal models.

The links for LLaVA Next and IDEFICS2 return a 404:

https://huggingface.co/docs/text-generation-inference/HuggingFaceM4/idefics-9b-instruct

https://huggingface.co/docs/text-generation-inference/llava-hf/llava-v1.6-mistral-7b-hf

@Narsil @VictorSanh

Expected behavior

When querying transformers, an <image> placeholder is used in the prompt and the images are passed as a separate input argument. This doesn't appear to be the case with TGI, which expects only a single prompt input.

Something like this:

curl https://yd64jhjr8ylu54-8080.proxy.runpod.net/generate \
    -X POST \
    -d '{"inputs": "User: ![](http://images.cocodataset.org/val2017/000000219578.jpg)Tell me about this image<end_of_utterance>\\nAssistant:","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json'

works, although it fails with two images (the model ignores the second one):

curl https://yd64jhjr8ylu54-8080.proxy.runpod.net/generate \
    -X POST \
    -d '{"inputs": "User: ![](http://images.cocodataset.org/val2017/000000219578.jpg)Tell me about this image, and also about this second image: ![](http://images.cocodataset.org/val2017/000000039769.jpg)<end_of_utterance>\\nAssistant:","parameters":{"max_new_tokens":50}}' \
    -H 'Content-Type: application/json'
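For reference, the payload the curl commands above send can be sketched in Python. This is an assumption-laden sketch, not from the TGI docs: it just builds the same prompt string that embeds images via markdown `![](url)` syntax, and the helper `build_inputs` is a hypothetical name, not a TGI API.

```python
import json

def build_inputs(image_urls, question):
    """Build a TGI-style prompt embedding each image as markdown (sketch)."""
    images = "".join(f"![]({url})" for url in image_urls)
    return f"User: {images}{question}<end_of_utterance>\nAssistant:"

# Same single-image request body as the first curl example above.
payload = json.dumps({
    "inputs": build_inputs(
        ["http://images.cocodataset.org/val2017/000000219578.jpg"],
        "Tell me about this image",
    ),
    "parameters": {"max_new_tokens": 20},
})
# POST `payload` to http://<tgi-host>:8080/generate with
# Content-Type: application/json (e.g. via requests.post).
```

Passing two URLs to `build_inputs` reproduces the second (failing) request.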
github-actions[bot] commented 3 months ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.