Closed: kishan-character closed this issue 1 month ago
@kishan-character We have recently merged a bunch of changes to transformers that fix the fine-tuning. The updated notebook is here: https://colab.research.google.com/drive/1x_OEphRK0H97DqqxEyiMewqsTiLD_Xmi?usp=sharing. I will merge the changes into the blog soon as well.
Thanks @merveenoyan. Data and model loading now pass, but I am facing a device/dtype placement issue when running `trainer.train()` in this notebook:
```
RuntimeError                              Traceback (most recent call last)
<ipython-input-86-3435b262f1ae> in <cell line: 1>()
----> 1 trainer.train()

39 frames
/usr/local/lib/python3.10/dist-packages/transformers/models/paligemma/modeling_paligemma.py in _merge_input_ids_with_image_features(self, image_features, inputs_embeds, input_ids, attention_mask, labels, token_type_ids, cache_position)
    308 final_embedding = torch.where(pad_mask_expanded, torch.zeros_like(final_embedding), final_embedding)
    309 # insert image embeddings - the image mask is always less or equal to the sentence in length
--> 310 final_embedding = final_embedding.masked_scatter(
    311     image_mask.unsqueeze(-1).expand_as(final_embedding), scaled_image_features
    312 )

RuntimeError: masked_scatter_: expected self and source to have same dtypes but got BFloat16 and Float
```
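The error above can be reproduced in isolation. This is a minimal sketch (not the model's actual tensors): `masked_scatter_` requires the destination and source tensors to share a dtype, so a `bfloat16` embedding buffer combined with `float32` image features fails exactly as in the traceback. The cast shown at the end is one way such a mismatch can be resolved; the linked PR may fix it differently.

```python
import torch

target = torch.zeros(4, dtype=torch.bfloat16)  # stands in for final_embedding
mask = torch.tensor([True, False, True, False])  # stands in for image_mask
source = torch.ones(4)  # float32 by default: mismatched with the bfloat16 target

# Reproduces the error from the traceback: self and source must share a dtype.
try:
    target.masked_scatter_(mask, source)
    raised = False
except RuntimeError:
    raised = True

# One possible fix: cast the source to the target's dtype before scattering.
target.masked_scatter_(mask, source.to(target.dtype))
```

After the cast, the scatter fills the masked positions from `source` in order.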
Changing `bf16=True` to `bf16=False` has no impact on the error. The `collate_fn` definition looks correct to me.
I realized there is a pull request on the transformers source that addresses this: https://github.com/huggingface/transformers/pull/31008. The notebook works fine when I use `pip install -q -U git+https://github.com/huggingface/transformers.git@paligemma_fix_bf16_multigpu`.
Thanks, Maulik
@maulikmadhavi You are right, the PR you mention fixes this issue. It has just been merged, so I'm closing this now. Thank you!
There is an issue with `HuggingFaceM4/VQAv2`; using `lmms-lab/VQAv2` helps fix it. I don't think the dataset was set up correctly:

`ds = load_dataset('lmms-lab/VQAv2', split="validation", token=huggingface_token)`

Also, `processor = PaliGemmaProcessor(model_id)` does not work; it expects `image_processor` and `tokenizer` as arguments. Maybe something like this (I'm not sure what `image_seq_length` should be set to): https://github.com/huggingface/blog/blob/main/paligemma.md