Asunatan opened 6 days ago
I ran into the same problem you mention, so I directly edited the code:

```python
cur_text.append({
    "role": "assistant",
    # "content": [
    #     {"type": "text", "text": text},
    # ],
    "content": text
})
```
Yes, I have adopted the same strategy as you, but is it correct to do so?
I think it is fine to edit it like this. I trained a model with this repository (after editing these few lines) and ran inference with the official Qwen2-VL code (using apply_chat_template), and I got correct results.
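For reference, my inference side just follows the standard Qwen2-VL usage; a rough sketch below (the checkpoint path, image path, and prompt are placeholders, and `process_vision_info` comes from the separate `qwen-vl-utils` package):

```python
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

# Placeholder path to the checkpoint fine-tuned with this repository.
ckpt = "path/to/your-finetuned-qwen2-vl"

model = Qwen2VLForConditionalGeneration.from_pretrained(
    ckpt, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(ckpt)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "file:///path/to/test.jpg"},
            {"type": "text", "text": "Describe this image."},
        ],
    }
]

# Serialize the conversation with the official chat template.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs,
    padding=True, return_tensors="pt",
).to(model.device)

generated_ids = model.generate(**inputs, max_new_tokens=128)
# Strip the prompt tokens before decoding the generated answer.
trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, generated_ids)]
print(processor.batch_decode(trimmed, skip_special_tokens=True)[0])
```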
Thank you, this is very helpful to me.
@Asunatan @Elenore1997 Yes I think the proposed fix looks good. Sorry for not being able to respond earlier; I was moving the past few days. Tagging @linyueqian to be aware of this.
Yes, as mentioned in #56, we should directly use the text as the content value. Just updated in the latest commit.
Hello, I am a beginner in the field of VLMs and have a question about the training template. In the Qwen2VLDataCollator you provided, I noticed there are some additional fields. This differs from directly applying

```python
apply_chat_template_text = self.processor.apply_chat_template(
    cur_text, tokenize=False, add_generation_prompt=True,
)
```

which seems to produce a somewhat different result. Could this lead to discrepancies during prediction? The gpt_response obtained from apply_chat_template appears to lack fields such as `[{"type": "text", "text": ...}]`. I am curious whether the differences between these two could lead to training biases.
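To check this myself, I think one can serialize the same conversation both ways with the official processor and compare the strings; a minimal sketch (the model id, question, and answer below are only placeholders, not from the repository):

```python
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")

question = "What is in the image?"
answer = "The image shows a cat sitting on a sofa."

# Assistant content as a plain string (the fix adopted in this thread).
msgs_plain = [
    {"role": "user", "content": [{"type": "text", "text": question}]},
    {"role": "assistant", "content": answer},
]

# Assistant content as a list of typed dicts (the original commented-out form).
msgs_typed = [
    {"role": "user", "content": [{"type": "text", "text": question}]},
    {"role": "assistant", "content": [{"type": "text", "text": answer}]},
]

text_plain = processor.apply_chat_template(msgs_plain, tokenize=False)
text_typed = processor.apply_chat_template(msgs_typed, tokenize=False)

print(text_plain)
print(text_typed)
print("identical:", text_plain == text_typed)
```

If the two serializations come out identical, using the plain string as the content value should not introduce any train/inference mismatch; if they differ, the printout shows exactly where.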