huggingface / chat-ui

Open source codebase powering the HuggingChat app
https://huggingface.co/chat
Apache License 2.0

IDEFICS works with image URLs but not "upload image" #715

Open metemadi opened 7 months ago

metemadi commented 7 months ago

Hi there,

I am hosting IDEFICS locally on TGI and querying it through Chat-UI. When I give it an image URL, it handles the image correctly, as seen here:

image

However, when I use the "upload image" feature or drag and drop, the model appears to ignore the image:

image

Here is my model config (using the latest main branch of chat-ui 0.7.0):

{
  "name": "HuggingFaceM4/idefics-9b-instruct",
  "endpoints":[{"type":"tgi","url":"http://INTERNAL URL:/generate_stream"}],
  "multimodal" : true,
  "description": "IDEFICS is the new multimodal model by Hugging Face.",
  "preprompt": "",
  "chatPromptTemplate" : "{{#each messages}}{{#ifUser}}User: {{content}}{{/ifUser}}<end_of_utterance>\nAssistant: {{#ifAssistant}}{{content}}\n{{/ifAssistant}}{{/each}}",
  "parameters": {
    "temperature": 0.1,
    "top_p": 0.95,
    "repetition_penalty": 1.2,
    "top_k": 12,
    "truncate": 1000,
    "max_new_tokens": 1024,
    "stop": ["<end_of_utterance>", "User:", "\nUser:"]
  }
}

And here are my TGI command args from docker compose (using the 1.3.4 container for TGI):

--sharded true
--num-shard 2
--dtype float16
--model-id HuggingFaceM4/idefics-9b-instruct
--max-total-tokens 2048
--max-input-length 1000
--max-batch-prefill-tokens 2048
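For anyone reproducing this setup, here is a minimal sketch of a sanity check that the chat-ui token limits fit within the TGI limits. The numbers are copied from the config and args above. The helper name `check_token_budget` is hypothetical, and the three rules reflect my understanding of how these limits relate, not an official validator.

```python
# Values copied from the chat-ui config and TGI args quoted in this issue.
chat_ui_parameters = {"truncate": 1000, "max_new_tokens": 1024}
tgi_args = {
    "max_total_tokens": 2048,
    "max_input_length": 1000,
    "max_batch_prefill_tokens": 2048,
}


def check_token_budget(params, tgi):
    """Return a list of human-readable problems; an empty list means the limits line up."""
    problems = []
    # The prompt chat-ui sends (after truncation) must fit TGI's input limit.
    if params["truncate"] > tgi["max_input_length"]:
        problems.append("truncate exceeds --max-input-length")
    # Prompt tokens plus generated tokens must fit the total budget.
    if params["truncate"] + params["max_new_tokens"] > tgi["max_total_tokens"]:
        problems.append("truncate + max_new_tokens exceeds --max-total-tokens")
    # A single prompt must fit inside one prefill batch.
    if tgi["max_input_length"] > tgi["max_batch_prefill_tokens"]:
        problems.append("--max-input-length exceeds --max-batch-prefill-tokens")
    return problems


problems = check_token_budget(chat_ui_parameters, tgi_args)
print(problems or "limits are consistent")
```

With these values the limits are consistent, so the whole prompt, including anything chat-ui embeds for the image, has at most 1000 tokens to work with.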

Here is the image URL for reference.

A gigantic thank you for putting together such wonderful tools!! And thank you in advance for your help.

nsarrazin commented 7 months ago

Thank you for the report! Will try to take a look later.

metemadi commented 7 months ago

Thank you! I found the issue - the fix is here for a similar IDEFICS (larger one) model: https://huggingface.co/HuggingFaceM4/idefics-80b-instruct/discussions/10/files

I think the base64-encoded image is maxing out the sequence length. When you add the regex to truncate the image portion, it seems to work perfectly! I wonder if this should be done in the URL case as well (in the URL case, it appears the URL is kept in the model's context, which I am guessing is not the intent)?
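To illustrate the idea, here is a rough sketch of how an inline base64 image inflates the prompt and how a regex can collapse it to a short placeholder. Everything here is an assumption made for illustration: the markdown-style `![](data:...)` embedding, the pattern itself, and the `<image>` placeholder are not taken from the linked discussion.

```python
import base64
import re

# Hypothetical pattern: a markdown image whose target is a base64 data URI.
DATA_URI_IMAGE = re.compile(
    r"!\[[^\]]*\]\(data:image/[a-zA-Z0-9.+-]+;base64,[A-Za-z0-9+/=]+\)"
)


def make_prompt_with_image(image_bytes: bytes, question: str) -> str:
    """Build a single-turn prompt with the image embedded as a base64 data URI."""
    payload = base64.b64encode(image_bytes).decode("ascii")
    return (
        f"User: ![](data:image/png;base64,{payload}){question}"
        "<end_of_utterance>\nAssistant: "
    )


def collapse_images(prompt: str, placeholder: str = "<image>") -> str:
    """Replace every embedded base64 image with a short placeholder string."""
    return DATA_URI_IMAGE.sub(placeholder, prompt)


# About 100 KB of stand-in bytes; a real upload would be actual image data.
fake_image = bytes(range(256)) * 400
prompt = make_prompt_with_image(fake_image, "What is in this picture?")
collapsed = collapse_images(prompt)

print(f"prompt length with base64 payload: {len(prompt)} characters")
print(f"prompt length after collapsing:    {len(collapsed)} characters")
```

Even a modest image produces well over a hundred thousand characters of base64, which is far more than a 1000-token input limit can hold if it is counted as ordinary text.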

I'm not sure where best to apply this fix: in a particular model version? In TGI? For now I am doing it in a really hacky way: I run my Docker containers, stop them, go into the TGI container's storage mount, modify the tokenizer JSON file as per the above, and then run them again. Messy, but it works!
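For reference, a hedged sketch of what scripting that manual edit might look like, instead of editing the file by hand. It assumes the change amounts to adding a `Replace` normalizer with a regex to the tokenizer JSON. The pattern, the `<image>` replacement, and the helper name `add_replace_normalizer` are hypothetical, and the actual change in the linked discussion may differ.

```python
import copy
import json


def add_replace_normalizer(tokenizer_config: dict, pattern: str, content: str) -> dict:
    """Return a copy of a tokenizer JSON dict with a regex Replace normalizer added first."""
    patched = copy.deepcopy(tokenizer_config)
    replace_step = {
        "type": "Replace",
        "pattern": {"Regex": pattern},
        "content": content,
    }
    existing = patched.get("normalizer")
    if existing is None:
        # No normalizer configured: the Replace step becomes the normalizer.
        patched["normalizer"] = replace_step
    elif existing.get("type") == "Sequence":
        # Already a sequence: run the Replace step before the existing ones.
        existing["normalizers"].insert(0, replace_step)
    else:
        # A single normalizer: wrap both in a Sequence, Replace step first.
        patched["normalizer"] = {
            "type": "Sequence",
            "normalizers": [replace_step, existing],
        }
    return patched


# Hypothetical pattern and replacement, for illustration only.
IMAGE_PATTERN = r"!\[[^\]]*\]\(data:image/[^)]+\)"
example_config = {"version": "1.0", "normalizer": None}
patched_config = add_replace_normalizer(example_config, IMAGE_PATTERN, "<image>")
print(json.dumps(patched_config["normalizer"], indent=2))
```

To apply it to a real file, you would load the tokenizer JSON with `json.load`, pass the dict through the helper, and write it back with `json.dump` while the containers are stopped. Keeping a backup of the original file seems wise, since a malformed tokenizer file will likely stop TGI from starting.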

Thank you again for such an amazing set of products!

jalalirs commented 2 weeks ago

I am experiencing the same error. However, when I apply the fix you described above, I get a different error from TGI: a shape mismatch like the one reported in this thread:

https://github.com/huggingface/transformers/issues/31380

I am using the latest TGI with HuggingFaceM4/idefics2-8b.

Any clue?

metemadi commented 2 weeks ago

Hi! Sorry, I am really not sure; I'm definitely not an expert on this stuff. One thing: I can't remember whether I turned "normalization" on or just did the regex thing, so maybe that's it? (It was a while ago.) Separately, though, the shape mismatch seems odd to me and perhaps unrelated. Sorry I can't be of more help, but perhaps others will reply!