huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

[Idefics3] processing_idefics3 - IndexError: list index out of range for multiple image input #34727

Open Glider95 opened 2 days ago

Glider95 commented 2 days ago

System Info

Who can help?

@amyeroberts, @quvb

Information

Tasks

Reproduction

Code to reproduce:

from PIL import Image
img1=Image.open('Image1.JPG')
img2=Image.open('Image2.JPG')

prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[img1,img2], return_tensors="pt")
inputs = {k: v.to(DEVICE) for k, v in inputs.items()}

# Generate
generated_ids = model.generate(**inputs, max_new_tokens=512)
generated_texts = processor.batch_decode(generated_ids, skip_special_tokens=True)

print(generated_texts)


IndexError                                Traceback (most recent call last)
Cell In[4], line 6
      3 img2=Image.open('Image2.JPG')
      5 prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
----> 6 inputs = processor(text=[prompt,prompt], images=[img1,img2], return_tensors="pt")
      7 inputs = {k: v.to(DEVICE) for k, v in inputs.items()}
      9 # Generate

File ~/envs/default/lib/python3.10/site-packages/transformers/models/idefics3/processing_idefics3.py:302, in Idefics3Processor.__call__(self, images, text, audio, videos, image_seq_len, **kwargs)
    300 sample = split_sample[0]
    301 for i, image_prompt_string in enumerate(image_prompt_strings):
--> 302     sample += image_prompt_string + split_sample[i + 1]
    303 prompt_strings.append(sample)
    305 text_inputs = self.tokenizer(text=prompt_strings, **output_kwargs["text_kwargs"])

IndexError: list index out of range
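
The failing loop in the traceback can be imitated with plain strings. This is only a sketch, not the library code: it assumes `split_sample` is the prompt split on an `<image>` placeholder, and `expand_prompt` is a hypothetical stand-in for the loop at lines 300-303. When there are more image prompt strings than placeholders in the text, `split_sample[i + 1]` runs past the end of the list.

```python
# Simplified stand-in for the loop shown in the traceback (not the library code).
# Assumption: split_sample is the prompt text split on the "<image>" placeholder.
IMAGE_TOKEN = "<image>"


def expand_prompt(sample, image_prompt_strings):
    """Hypothetical helper: interleave image prompt strings with the text parts."""
    split_sample = sample.split(IMAGE_TOKEN)
    result = split_sample[0]
    for i, image_prompt_string in enumerate(image_prompt_strings):
        # Only safe when len(split_sample) == len(image_prompt_strings) + 1.
        result += image_prompt_string + split_sample[i + 1]
    return result


# One placeholder, one image prompt string: works.
ok = expand_prompt("User:<image>Describe.", ["[IMG-1]"])

# One placeholder, two image prompt strings: same failure as in the traceback.
try:
    expand_prompt("User:<image>Describe.", ["[IMG-1]", "[IMG-2]"])
    mismatch_error = None
except IndexError as exc:
    mismatch_error = str(exc)

print(ok)
print(mismatch_error)
```

So the error points to a mismatch between the number of image placeholders in a prompt and the number of images the processor assigned to that prompt.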

Expected behavior

I would expect the model to accept two images as input and generate text using both images as context.

zucchini-nlp commented 1 day ago

Hey @Glider95!

It looks like you are passing the images as a flat (non-nested) list, which is expected to raise an error for Idefics3 in your installed version. That behavior was changed recently in https://github.com/huggingface/transformers/pull/34222, so you can either install transformers from source (!pip install --upgrade git+https://github.com/huggingface/transformers.git) or pass a nested list of images, where each inner list holds the images for one prompt in the batch (processor(images=[[img], [img]], text=[prompt1, prompt2])).
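
For the nested-list option, here is a minimal sketch of how a flat image list could be grouped per prompt. `nest_images_per_prompt` is a hypothetical helper, not part of transformers, and it assumes each image is marked in the prompt by a literal `<image>` token. Strings stand in for PIL images so it runs without a model.

```python
# Hypothetical helper (not part of transformers): group a flat image list into
# one sub-list per prompt by counting image placeholders in each prompt.
# Assumption: each image is marked by the literal "<image>" token in the prompt.
def nest_images_per_prompt(prompts, images, image_token="<image>"):
    nested, cursor = [], 0
    for prompt in prompts:
        count = prompt.count(image_token)
        nested.append(list(images[cursor:cursor + count]))
        cursor += count
    if cursor != len(images):
        raise ValueError(
            f"prompts reference {cursor} image(s) but {len(images)} were given"
        )
    return nested


# Strings stand in for PIL images here.
prompts = ["User:<image>What is this?", "User:<image>And this?"]
nested = nest_images_per_prompt(prompts, ["img1", "img2"])
print(nested)  # [['img1'], ['img2']]

# With a real processor the call would then look like:
# inputs = processor(text=prompts, images=nested, return_tensors="pt")
```

If both images belong to a single prompt, that prompt needs two image placeholders and the result is one inner list holding both images.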