Open DarioPTWR opened 1 week ago
We have an example script to train a VLM with DPO here. Have you tried running it with MiniCPM-V? At present, we don't claim that it works with every VLM, since VLMs are less standardized than LLMs. But it's definitely worth giving this one a try.
Alright cool! Will try it out and provide an update, thanks for your response!
Feature request
Hi! I’d like to request support for reinforcement learning with DPO for the MiniCPM-V model. I'm not sure whether the current state of this repository allows this vision model to be retrained as well. Could I get some advice or insight on that? Would the current approach for applying DPO to VLMs work for the majority of VLMs on HuggingFace?
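For context on the last question, here is a minimal sketch of the standard DPO objective for a single preference pair. It is not this repository's implementation, and the function name `dpo_loss`, its parameter names, and the default `beta=0.1` are assumptions made for illustration. The point it shows is that the loss only needs the summed log-probabilities of the chosen and rejected responses under the policy and the reference model, so the objective itself does not depend on the architecture. Whether a particular VLM works depends on how its image inputs are processed to produce those log-probabilities.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair (hypothetical helper).

    Each argument is the summed log-probability of a full response,
    conditioned on the prompt (and, for a VLM, on the image).
    """
    # How much more the policy favors each response than the reference does.
    chosen_gain = policy_chosen_logp - ref_chosen_logp
    rejected_gain = policy_rejected_logp - ref_rejected_logp
    margin = chosen_gain - rejected_gain

    # -log(sigmoid(beta * margin)), written as a numerically stable softplus.
    x = -beta * margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


# Policy identical to the reference: the margin is zero.
print(dpo_loss(-5.0, -7.0, -5.0, -7.0))

# Policy favors the chosen response more than the reference does: lower loss.
print(dpo_loss(-4.0, -8.0, -5.0, -7.0))
```

When the policy equals the reference, the margin is zero and the loss is log 2. The loss falls as the policy shifts probability toward the chosen response relative to the reference.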
Motivation
None
Your contribution
None