Does it support fine-tuning the PaliGemma model for object detection and segmentation?

hiyouga / LLaMA-Factory

Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)

https://arxiv.org/abs/2403.13372

Apache License 2.0

35.16k stars 4.35k forks source link

Does it support fine-tuning the PaliGemma model for object detection and segmentation? #6147

Closed WangRongsheng closed 3 days ago

WangRongsheng commented 3 days ago

In the paper about PaliGemma, it is indicated that it supports tasks such as Image Captioning, Visual Question Answering, Detection, and Referring Expression Segmentation.

Can Llama-Factory support fine-tuning for these tasks?

hiyouga commented 3 days ago

You can refer to this paper https://arxiv.org/pdf/2306.15195 to construct your data in llama-factory's format.