The major difference is that DPO fine-tunes on paired positive and negative data.
By paired, I mean a positive and a negative response to the same question. EFUF does not need paired responses; it only needs a positive or a negative response to any question. This makes data collection easier and more efficient.
Besides, EFUF takes a fine-grained approach, which contributes to better performance.
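To make the paired vs. unpaired distinction concrete, here is a minimal sketch of the two data shapes. The class and function names (`PairedExample`, `UnpairedExample`, `unpaired_from_pairs`, `pairs_from_unpaired`) are hypothetical and only for illustration; they are not taken from the EFUF or DPO code.

```python
from dataclasses import dataclass


@dataclass
class PairedExample:
    # DPO-style record: one question with both a positive and a negative response.
    prompt: str
    chosen: str
    rejected: str


@dataclass
class UnpairedExample:
    # Unpaired record: a single response to any question, labeled positive or negative.
    prompt: str
    response: str
    is_positive: bool


def unpaired_from_pairs(pairs):
    """Every paired record can be split into two unpaired records."""
    out = []
    for p in pairs:
        out.append(UnpairedExample(p.prompt, p.chosen, True))
        out.append(UnpairedExample(p.prompt, p.rejected, False))
    return out


def pairs_from_unpaired(examples):
    """Pairs exist only where the same question has both a positive and a negative response."""
    by_prompt = {}
    for e in examples:
        slot = by_prompt.setdefault(e.prompt, {True: [], False: []})
        slot[e.is_positive].append(e.response)
    return [
        PairedExample(prompt, pos, neg)
        for prompt, slot in by_prompt.items()
        for pos in slot[True]
        for neg in slot[False]
    ]
```

Splitting pairs into unpaired records always works, but going the other way can yield nothing when the positive and negative responses come from different questions, which is why unpaired data is easier to collect.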
Thank you, I got it. Have you tried EFUF on text-only LLMs, beyond MLLMs? It would be interesting if EFUF also offered benefits over DPO for LLMs.
Not yet. Using CLIP to detect multimodal hallucinations is natural, but it is quite difficult to find a reliable external signal for detecting hallucinations in LLMs, since those depend more on knowledge. As you know, things get really complicated once knowledge is involved.
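As a rough illustration of using an external signal this way, the sketch below labels response spans from precomputed image-text similarity scores. It assumes the scores come from a CLIP-like model; the function name `label_spans` and the threshold values are hypothetical and are not the settings used in EFUF.

```python
def label_spans(scored_spans, low=0.2, high=0.3):
    """Label spans by similarity score; spans between the thresholds are left out.

    scored_spans: iterable of (span_text, similarity_score) tuples.
    low, high: hypothetical thresholds chosen only for this example.
    """
    labels = {}
    for span, score in scored_spans:
        if score <= low:
            labels[span] = "negative"  # likely not grounded in the image
        elif score >= high:
            labels[span] = "positive"  # likely grounded in the image
        # Scores strictly between low and high are treated as uncertain and skipped.
    return labels


# Made-up scores for three spans taken from one response.
example = [("a dog", 0.35), ("a frisbee", 0.10), ("a park", 0.25)]
print(label_spans(example))
```

For a text-only LLM there is no image to score against, so an equivalent scorer would need a knowledge source, which is the hard part.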
Thank you for your response.
Both EFUF and DPO use positive and negative data for learning. What are the advantages of EFUF over DPO? Thanks.