WeixuanXiong opened this issue 1 week ago
It should work now. The trick is to use the pre-query template in the tokenizer config.
For the Qwen 2.5 family, it should be:
<|im_start|>system\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\n<|im_start|>user\n
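Here is a minimal sketch of how this can be used with vLLM (the model name, sampling parameters, and stop token below are illustrative, not the exact settings from the Magpie scripts):

```python
# Magpie-style instruction generation: feed only the pre-query template and
# let the model complete the "user" turn, which becomes a synthetic instruction.
from vllm import LLM, SamplingParams

PRE_QUERY_TEMPLATE = (
    "<|im_start|>system\n"
    "You are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\n"
)

llm = LLM(model="Qwen/Qwen2.5-3B-Instruct")  # 3B suggested above; model choice is up to you
params = SamplingParams(
    temperature=1.0,
    top_p=1.0,
    max_tokens=512,
    stop=["<|im_end|>"],  # stop once the model finishes the user turn
)

# Generate a few candidate instructions from the bare template.
outputs = llm.generate([PRE_QUERY_TEMPLATE] * 4, params)
for out in outputs:
    print(out.outputs[0].text.strip())
```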
By the way, I found that the 7B model doesn't work so well, but 3B works great. You can use 3B for now.
Thanks!
I found that training a chat model, for instance qwen2-instruct, harms the model's capabilities such as instruction following. So does Magpie data only work well when fine-tuning base models?
If you continue aligning a chat model, then you should be careful about distribution shift, which might harm model performance. But ideally, if you use Qwen2.5-7B-Instruct's responses to fine-tune Qwen2.5-7B-Instruct, it should be fine...
I've tried using your Magpie_Qwen2_Pro_200K_Chinese training dataset, which I believe was generated from qwen2 72b chat? I think the alignment data used for the 72B and 7B models is the same or largely overlapping, so the distribution gap between data generated by the 7B model and data generated by the 72B model may not be that large? If I'm wrong, please let me know.
Thanks~
When generating data with the Qwen2.5 7B model, I found that most of the generated instructions are fragmentary text snippets with no clear beginning or end. Do other models have this problem as well? If so, how can it be fixed?
Thanks!