Closed WangRongsheng closed 3 days ago
In the paper about PaliGemma, it is indicated that it supports tasks such as Image Captioning, Visual Question Answering, Detection, and Referring Expression Segmentation.
Can Llama-Factory support fine-tuning for these tasks?
You can refer to this paper https://arxiv.org/pdf/2306.15195 to construct your data in llama-factory's format.
In the paper about PaliGemma, it is indicated that it supports tasks such as Image Captioning, Visual Question Answering, Detection, and Referring Expression Segmentation.
Can Llama-Factory support fine-tuning for these tasks?