huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

model.generate() errors in Idefics2 Fine-tuning notebook #31373

Closed tctrautman closed 2 months ago

tctrautman commented 2 months ago

System Info

The environment in the Idefics2 - Fine-tuning Colab notebook

Who can help?

@amyeroberts @zucchini-nlp


Reproduction

  1. Open the Idefics2 - Fine-tuning tutorial notebook.
  2. Run all cells up to and including the cell that loads the datasets.
  3. Scroll down to the evaluation cell (right before the Levenshtein import) and run it.
  4. Observe the following error:
/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py:1790: UserWarning: You are calling .generate() with the `input_ids` being on a device type different than your model's device. `input_ids` is on cpu, whereas the model is on cuda. You may experience unexpected behaviors or slower generation. Please make sure that you have put `input_ids` to the correct device by calling for example input_ids = input_ids.to('cuda') before running `.generate()`.
  warnings.warn(
The `seen_tokens` attribute is deprecated and will be removed in v4.41. Use the `cache_position` model input instead.
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
[<ipython-input-9-787936dbb363>](https://localhost:8080/#) in <cell line: 18>()
     16 text = processor.apply_chat_template(messages, add_generation_prompt=True)
     17 inputs = processor(text=[text.strip()], images=[image], return_tensors="pt", padding=True)
---> 18 generated_ids = model.generate(**inputs, max_new_tokens=64)
     19 generated_texts = processor.batch_decode(generated_ids[:, inputs["input_ids"].size(1):], skip_special_tokens=True)
     20 print(generated_texts)

11 frames
[/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py](https://localhost:8080/#) in decorate_context(*args, **kwargs)
    113     def decorate_context(*args, **kwargs):
    114         with ctx_factory():
--> 115             return func(*args, **kwargs)
    116 
    117     return decorate_context

[/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py](https://localhost:8080/#) in generate(self, inputs, generation_config, logits_processor, stopping_criteria, prefix_allowed_tokens_fn, synced_gpus, assistant_model, streamer, negative_prompt_ids, negative_prompt_attention_mask, **kwargs)
   1894 
   1895             # 13. run sample (it degenerates to greedy search when `generation_config.do_sample=False`)
-> 1896             result = self._sample(
   1897                 input_ids,
   1898                 logits_processor=prepared_logits_processor,

[/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py](https://localhost:8080/#) in _sample(self, input_ids, logits_processor, stopping_criteria, generation_config, synced_gpus, streamer, logits_warper, **model_kwargs)
   2631 
   2632             # forward pass to get next token
-> 2633             outputs = self(
   2634                 **model_inputs,
   2635                 return_dict=True,

[/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py](https://localhost:8080/#) in _wrapped_call_impl(self, *args, **kwargs)
   1530             return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1531         else:
-> 1532             return self._call_impl(*args, **kwargs)
   1533 
   1534     def _call_impl(self, *args, **kwargs):

[/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py](https://localhost:8080/#) in _call_impl(self, *args, **kwargs)
   1539                 or _global_backward_pre_hooks or _global_backward_hooks
   1540                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1541             return forward_call(*args, **kwargs)
   1542 
   1543         try:

[/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py](https://localhost:8080/#) in new_forward(module, *args, **kwargs)
    164                 output = module._old_forward(*args, **kwargs)
    165         else:
--> 166             output = module._old_forward(*args, **kwargs)
    167         return module._hf_hook.post_forward(module, output)
    168 

[/usr/local/lib/python3.10/dist-packages/transformers/models/idefics2/modeling_idefics2.py](https://localhost:8080/#) in forward(self, input_ids, attention_mask, position_ids, past_key_values, inputs_embeds, pixel_values, pixel_attention_mask, image_hidden_states, labels, use_cache, output_attentions, output_hidden_states, return_dict)
   1827 
   1828         # decoder outputs consists of (dec_features, layer_state, dec_hidden, dec_attn)
-> 1829         outputs = self.model(
   1830             input_ids=input_ids,
   1831             attention_mask=attention_mask,

[/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py](https://localhost:8080/#) in _wrapped_call_impl(self, *args, **kwargs)
   1530             return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1531         else:
-> 1532             return self._call_impl(*args, **kwargs)
   1533 
   1534     def _call_impl(self, *args, **kwargs):

[/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py](https://localhost:8080/#) in _call_impl(self, *args, **kwargs)
   1539                 or _global_backward_pre_hooks or _global_backward_hooks
   1540                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1541             return forward_call(*args, **kwargs)
   1542 
   1543         try:

[/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py](https://localhost:8080/#) in new_forward(module, *args, **kwargs)
    164                 output = module._old_forward(*args, **kwargs)
    165         else:
--> 166             output = module._old_forward(*args, **kwargs)
    167         return module._hf_hook.post_forward(module, output)
    168 

[/usr/local/lib/python3.10/dist-packages/transformers/models/idefics2/modeling_idefics2.py](https://localhost:8080/#) in forward(self, input_ids, attention_mask, position_ids, past_key_values, inputs_embeds, pixel_values, pixel_attention_mask, image_hidden_states, use_cache, output_attentions, output_hidden_states, return_dict)
   1654             # When we generate, we don't want to replace the potential image_token_id that we generated by images
   1655             # that simply don't exist
-> 1656             inputs_embeds = self.inputs_merger(
   1657                 input_ids=input_ids,
   1658                 inputs_embeds=inputs_embeds,

[/usr/local/lib/python3.10/dist-packages/transformers/models/idefics2/modeling_idefics2.py](https://localhost:8080/#) in inputs_merger(self, input_ids, inputs_embeds, image_hidden_states)
   1540         new_inputs_embeds = inputs_embeds.clone()
   1541         reshaped_image_hidden_states = image_hidden_states.view(-1, vision_hidden_size)
-> 1542         new_inputs_embeds[special_image_token_mask] = reshaped_image_hidden_states
   1543         return new_inputs_embeds
   1544 

RuntimeError: shape mismatch: value tensor of shape [64, 4096] cannot be broadcast to indexing result of shape [0, 4096]

Expected behavior

model.generate(...) should run without throwing RuntimeError: shape mismatch: value tensor of shape [64, 4096] cannot be broadcast to indexing result of shape [0, 4096]
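Note: the device UserWarning at the top of the trace is a separate issue from the RuntimeError. A minimal sketch of addressing it, reusing the variable names from the notebook cell in the traceback (this silences the warning, but the shape-mismatch error remains):

    # Move every processor output (input_ids, attention_mask, pixel_values, ...)
    # onto the model's device before generating, as the UserWarning suggests.
    inputs = {k: v.to(model.device) for k, v in inputs.items()}
    generated_ids = model.generate(**inputs, max_new_tokens=64)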

zucchini-nlp commented 2 months ago

@tctrautman Right, there was a bug in Idefics2, and I opened #31377 to fix it.

Once it is merged, please update transformers with: !pip install --upgrade git+https://github.com/huggingface/transformers.git

EricLBuehler commented 2 months ago

I just ran git clone and installed with pip install -e .; I can still reproduce the error. Is there a solution?

zucchini-nlp commented 2 months ago

@EricLBuehler the PR is not merged yet; the error will be fixed as soon as it's merged.

A workaround is to pass use_cache=False to generate() (sketched below), but that will make generation much slower, so I'd recommend waiting for the PR to be merged :)
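For example, reusing the inputs dict from the notebook cell in the traceback above:

    # Temporary workaround until the fix PR is merged: disable the KV cache so
    # each step recomputes attention from scratch. Correct output, much slower.
    generated_ids = model.generate(**inputs, max_new_tokens=64, use_cache=False)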

EricLBuehler commented 2 months ago

Ah, ok. I just opened #31380, so I'll close it when #31377 is merged.

tctrautman commented 2 months ago

Thank you, @zucchini-nlp!