Question about the data formats of chatglm2 and chatglm

liucongg / ChatGLM-Finetuning

Fine-tuning of the ChatGLM-6B, ChatGLM2-6B, and ChatGLM3-6B models for specific downstream tasks, covering Freeze, Lora, P-tuning, full-parameter fine-tuning, and more


Question about the data formats of chatglm2 and chatglm #104

Open Kayce001 opened 1 year ago

Kayce001 commented 1 year ago

input_ids = [tokenizer.get_command("[gMASK]"), tokenizer.get_command("sop")] + tokenizer.convert_tokens_to_ids(tokens) What does this line mean? Why is it so different from the chatglm version, and why can the input be written in this format now?

zengzhongjie commented 11 months ago

I have the same question. With this format, the results in our trials were very poor.

liucongg commented 10 months ago

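Because chatglm2 and chatglm used different data formats in their official training. PS: the model architectures of the two are also very different. One is a prefix-LM and the other is a causal-LM.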
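
To make the difference concrete, here is a minimal sketch of the two token layouts and the two attention patterns. It does not load the real tokenizers: the token strings are placeholders, the helper names are hypothetical, and the ChatGLM-6B layout (prompt first, then `[gMASK]` and `<sop>`) is an assumption about how the first-generation format is usually preprocessed. Only the chatglm2 ordering, with `[gMASK]` and `sop` at the front, comes from the line quoted in the question.

```python
# Hypothetical helpers for illustration only; they are not part of the repo.

def glm1_layout(prompt_tokens, answer_tokens):
    # Assumed ChatGLM-6B style: prompt, then [gMASK] <sop>, then the answer.
    return prompt_tokens + ["[gMASK]", "<sop>"] + answer_tokens + ["<eop>"]

def glm2_layout(prompt_tokens, answer_tokens):
    # ChatGLM2 style as in the quoted line: [gMASK] sop come first.
    return ["[gMASK]", "sop"] + prompt_tokens + answer_tokens + ["</s>"]

def prefix_lm_mask(seq_len, prefix_len):
    # mask[i][j] == 1 means position i may attend to position j.
    # Every position sees the whole prefix; the rest is left-to-right.
    return [[1 if (j < prefix_len or j <= i) else 0 for j in range(seq_len)]
            for i in range(seq_len)]

def causal_mask(seq_len):
    # Plain left-to-right attention: position i sees positions 0..i only.
    return [[1 if j <= i else 0 for j in range(seq_len)]
            for i in range(seq_len)]

if __name__ == "__main__":
    print(glm1_layout(["Hi"], ["Hello"]))
    print(glm2_layout(["Hi"], ["Hello"]))
    print(prefix_lm_mask(3, 2))
    print(causal_mask(3))
```

Under these assumptions, a prefix-LM needs a marker inside the sequence to say where the bidirectional prefix ends, while a causal-LM has no such boundary, so the special tokens can simply sit at the start. Mixing the two layouts would give a model inputs unlike those it was trained on.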