huggingface / blog

Public repo for HF blog posts
https://hf.co/blog

PaliGemma Finetuning Instructions don't work #2090

Closed kishan-character closed 1 month ago

kishan-character commented 1 month ago
  1. There are issues when trying to load the dataset HuggingFaceM4/VQAv2; I don't think that dataset was set up correctly. Switching to lmms-lab/VQAv2 fixes the problem:
     ds = load_dataset('lmms-lab/VQAv2', split="validation", token=huggingface_token)
  2. processor = PaliGemmaProcessor(model_id) does not work; the constructor expects image_processor and tokenizer as arguments.

Maybe something like (I'm not sure what image_seq_length should be set to):

from transformers import PaliGemmaProcessor, SiglipImageProcessor, GemmaTokenizer

processor = PaliGemmaProcessor(
    image_processor=SiglipImageProcessor(image_seq_length=224),
    tokenizer=GemmaTokenizer.from_pretrained(model_id)
)
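
For reference, the usual way to get a working processor is from_pretrained, which wires up the Siglip image processor and Gemma tokenizer (including image_seq_length) from the checkpoint config. A minimal sketch, assuming model_id points at a PaliGemma checkpoint such as google/paligemma-3b-pt-224:

from transformers import PaliGemmaProcessor

model_id = "google/paligemma-3b-pt-224"  # assumed checkpoint id, used here only for illustration
# builds the image processor and tokenizer from the repo's preprocessor/tokenizer configs
processor = PaliGemmaProcessor.from_pretrained(model_id)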

https://github.com/huggingface/blog/blob/main/paligemma.md

merveenoyan commented 1 month ago

@kishan-character We have recently merged a bunch of changes to transformers that fix fine-tuning. The updated notebook is here: https://colab.research.google.com/drive/1x_OEphRK0H97DqqxEyiMewqsTiLD_Xmi?usp=sharing. I will merge the changes into the blog soon as well.

maulikmadhavi commented 1 month ago

Thanks @merveenoyan. Data and model loading now pass.

I am facing a dtype issue while running trainer.train() in this notebook:

RuntimeError                              Traceback (most recent call last)

<ipython-input-86-3435b262f1ae> in <cell line: 1>()
----> 1 trainer.train()

39 frames

/usr/local/lib/python3.10/dist-packages/transformers/models/paligemma/modeling_paligemma.py in _merge_input_ids_with_image_features(self, image_features, inputs_embeds, input_ids, attention_mask, labels, token_type_ids, cache_position)
    308         final_embedding = torch.where(pad_mask_expanded, torch.zeros_like(final_embedding), final_embedding)
    309         # insert image embeddings - the image mask is always less or equal to the sentence in length
--> 310         final_embedding = final_embedding.masked_scatter(
    311             image_mask.unsqueeze(-1).expand_as(final_embedding), scaled_image_features
    312         )

RuntimeError: masked_scatter_: expected self and source to have same dtypes but got BFloat16 and Float
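
Stripped of the model code, the failing op is just masked_scatter_ being fed mismatched dtypes. A minimal stand-alone repro (purely illustrative, not taken from the notebook):

import torch

emb = torch.zeros(4, dtype=torch.bfloat16)       # stands in for final_embedding (bf16)
mask = torch.tensor([True, False, True, False])   # stands in for the expanded image mask
feats = torch.ones(2, dtype=torch.float32)        # stands in for scaled_image_features (fp32)
emb.masked_scatter_(mask, feats)                  # raises: expected self and source to have same dtypes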

Changing bf16=True to bf16=False has no impact on the error. The collate_fn definition seems correct to me.
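
For context, the collate_fn in the notebook is roughly the following (paraphrased sketch; processor, device, and the exact VQAv2 field names are assumed from earlier cells and may differ):

import torch

def collate_fn(examples):
    # "answer " is the VQA task prefix PaliGemma expects
    texts = ["answer " + example["question"] for example in examples]
    labels = [example["multiple_choice_answer"] for example in examples]
    images = [example["image"].convert("RGB") for example in examples]
    tokens = processor(text=texts, images=images, suffix=labels,
                       return_tensors="pt", padding="longest")
    # the whole batch is cast to bfloat16 before being moved to the device
    return tokens.to(torch.bfloat16).to(device)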

I realized that there is a pull request against the transformers source that addresses this: https://github.com/huggingface/transformers/pull/31008. The notebook works fine when I install from that branch with pip install -q -U git+https://github.com/huggingface/transformers.git@paligemma_fix_bf16_multigpu.

Thanks, Maulik

pcuenca commented 1 month ago

@maulikmadhavi You are right, the PR you mention fixes this issue. It has just been merged, closing this now. Thank you!