Does DPO/RLHF tuning support internVL2 video models?

modelscope / ms-swift

Use PEFT or Full-parameter to finetune 400+ LLMs or 100+ MLLMs. (LLM: Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, Gemma2, ...; MLLM: Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL, Phi3.5-Vision, ...)

https://swift.readthedocs.io/zh-cn/latest/Instruction/index.html

Apache License 2.0

4.27k stars, 377 forks

Does DPO/RLHF tuning support internVL2 video models? #2015

Closed. BillChan226 closed this issue 2 months ago

BillChan226 commented 2 months ago

Hi, thanks for the great work! I'm wondering whether DPO is supported for tuning InternVL2 with video input. Thanks!

Jintao-Huang commented 2 months ago

https://github.com/modelscope/ms-swift/pull/1975

Jintao-Huang commented 2 months ago

Already supported

BillChan226 commented 2 months ago

However, it seems that _patch_internvl_forward(forward_func) in swift/swift/llm/utils/model.py receives pixel_values = None even when I provide a videos key in each entry of the DPO dataset. I think this means the video input is not being processed?
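One way to narrow this down is to check both ends of the pipeline: whether every dataset entry really carries a non-empty videos key, and whether pixel_values is still None when the forward function is called. The snippet below is a minimal debugging sketch, not ms-swift code. The helper names entries_missing_videos and log_pixel_values are hypothetical, and it assumes pixel_values is passed as a keyword argument and that dataset entries are plain dicts.

```python
import functools


def entries_missing_videos(entries):
    """Return the indices of entries with no usable `videos` value.

    Hypothetical helper: assumes each entry is a dict and that video
    paths live under the `videos` key mentioned above.
    """
    return [i for i, entry in enumerate(entries) if not entry.get("videos")]


def log_pixel_values(forward_func, log=print):
    """Wrap a forward function and report whether visual input arrived.

    Hypothetical helper: only inspects keyword arguments, so it assumes
    `pixel_values` is passed by keyword rather than positionally.
    """

    @functools.wraps(forward_func)
    def wrapper(*args, **kwargs):
        pixel_values = kwargs.get("pixel_values")
        if pixel_values is None:
            log("pixel_values is None: no visual input reached forward()")
        else:
            # Tensors expose .shape; fall back to an empty tuple otherwise.
            shape = tuple(getattr(pixel_values, "shape", ()))
            log(f"pixel_values shape: {shape}")
        return forward_func(*args, **kwargs)

    return wrapper
```

If entries_missing_videos returns an empty list but the wrapped forward still logs None, the videos are probably being dropped somewhere between dataset loading and the model call, rather than being absent from the data.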