Vision (Auto)Processor multiple images finetuning example.

huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

https://huggingface.co/transformers

Apache License 2.0


Issue #34489

Open lovodkin93 opened 2 weeks ago

lovodkin93 commented 2 weeks ago

Feature request

Could you add an example showing how to fine-tune PaliGemma on multi-image inputs? Something similar to multi-image-inference, which shows how to run multi-image inference with PaliGemma.

Motivation

Enabling fine-tuning of PaliGemma on multi-image inputs.

Your contribution

Rocketknight1 commented 2 weeks ago

cc @qubvel

qubvel commented 2 weeks ago

Hi @lovodkin93, you can probably modify this example for multi-image fine-tuning. cc @merveenoyan

merveenoyan commented 1 week ago

@qubvel if you're not working on it I can take a stab

qubvel commented 1 week ago

@merveenoyan Sure, go ahead! I'm not currently working on it, so feel free to take a stab.
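
Until an official example lands, here is a minimal sketch of how the batching step of a single-image fine-tuning script might be adapted to several images per sample. It is not taken from the linked example. The dataset keys (`images`, `question`, `answer`) and the helper names are hypothetical, and it assumes the processor takes one `<image>` placeholder per image, a nested list of images (one inner list per sample), and a `suffix` argument for the target text; please verify those against the processor in your installed version.

```python
from typing import Any, Callable, Dict, List

# Assumption: the processor expects one placeholder token per image in the prompt.
IMAGE_TOKEN = "<image>"


def build_multi_image_batch(
    examples: List[Dict[str, Any]], image_token: str = IMAGE_TOKEN
) -> Dict[str, list]:
    """Group images per sample and build prompts with matching placeholders.

    Each example is a dict with hypothetical keys:
      "images":   list of images for this sample (e.g. PIL images)
      "question": prompt text
      "answer":   target text used as the training label
    """
    texts, images, suffixes = [], [], []
    for ex in examples:
        if not ex["images"]:
            raise ValueError("each example needs at least one image")
        # One placeholder per image so text and image counts line up.
        texts.append(image_token * len(ex["images"]) + ex["question"])
        images.append(list(ex["images"]))
        suffixes.append(ex["answer"])
    return {"text": texts, "images": images, "suffix": suffixes}


def make_collate_fn(processor: Callable[..., Any]) -> Callable[[List[Dict[str, Any]]], Any]:
    """Wrap a processor-like callable into a collate function for a data loader."""

    def collate_fn(examples: List[Dict[str, Any]]) -> Any:
        batch = build_multi_image_batch(examples)
        # Assumption: nested image lists and `suffix` are accepted by the processor.
        return processor(
            text=batch["text"],
            images=batch["images"],
            suffix=batch["suffix"],
            return_tensors="pt",
            padding="longest",
        )

    return collate_fn
```

The prompt building is kept separate from the processor call so it can be checked on plain Python objects before any model or image data is loaded; `make_collate_fn(processor)` would then be passed as the collate function of the data loader or trainer.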