Asunatan opened 6 days ago
I ran into the same problem you mention, so I directly edited the code:

```python
cur_text.append({
    "role": "assistant",
    # "content": [
    #     {"type": "text", "text": text},
    # ],
    "content": text
})
```
Yes, I have adopted the same strategy as you, but is it correct to do so?
I think it is fine to edit it like this. I trained a model with this repository (after editing these few lines) and ran inference with the official Qwen2-VL code (using apply_chat_template), and I got correct results.
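For reference, my inference side just follows the standard Qwen2-VL usage; a rough sketch below (the checkpoint path, image path, and prompt are placeholders, and `process_vision_info` comes from the separate `qwen-vl-utils` package):

```python
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

# Placeholder path to the checkpoint fine-tuned with this repository.
ckpt = "path/to/your-finetuned-qwen2-vl"

model = Qwen2VLForConditionalGeneration.from_pretrained(
    ckpt, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(ckpt)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "file:///path/to/test.jpg"},
            {"type": "text", "text": "Describe this image."},
        ],
    }
]

# Serialize the conversation with the official chat template.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs,
    padding=True, return_tensors="pt",
).to(model.device)

generated_ids = model.generate(**inputs, max_new_tokens=128)
# Strip the prompt tokens before decoding the generated answer.
trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, generated_ids)]
print(processor.batch_decode(trimmed, skip_special_tokens=True)[0])
```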
Thank you, this is very helpful to me.
@Asunatan @Elenore1997 Yes I think the proposed fix looks good. Sorry for not being able to respond earlier; I was moving the past few days. Tagging @linyueqian to be aware of this.
Yes, as mentioned in #56, we should directly use the text as the content value. Just updated in the latest commit.
Hello, I am a beginner in the field of VLMs and have a question about the training template. In the Qwen2VLDataCollator you provided, I noticed there are some additional fields. This differs from directly applying

```python
apply_chat_template_text = self.processor.apply_chat_template(
    cur_text, tokenize=False, add_generation_prompt=True,
)
```

which seems to produce a somewhat different result. Could this lead to discrepancies during prediction? The gpt_response obtained from apply_chat_template appears to lack fields such as `[{"type": "text", "text": ...}]`. I am curious whether the differences between these two could lead to training biases.
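To check this myself, I think one can serialize the same conversation both ways with the official processor and compare the strings; a minimal sketch (the model id, question, and answer below are only placeholders, not from the repository):

```python
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")

question = "What is in the image?"
answer = "The image shows a cat sitting on a sofa."

# Assistant content as a plain string (the fix adopted in this thread).
msgs_plain = [
    {"role": "user", "content": [{"type": "text", "text": question}]},
    {"role": "assistant", "content": answer},
]

# Assistant content as a list of typed dicts (the original commented-out form).
msgs_typed = [
    {"role": "user", "content": [{"type": "text", "text": question}]},
    {"role": "assistant", "content": [{"type": "text", "text": answer}]},
]

text_plain = processor.apply_chat_template(msgs_plain, tokenize=False)
text_typed = processor.apply_chat_template(msgs_typed, tokenize=False)

print(text_plain)
print(text_typed)
print("identical:", text_plain == text_typed)
```

If the two serializations come out identical, using the plain string as the content value should not introduce any train/inference mismatch; if they differ, the printout shows exactly where.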