Closed: hongyix closed this issue 1 year ago.
I don't quite understand why you ran it twice. For evaluation/prediction, just run python training_chatglm_adgen_demo.py --do_predict.
The two evaluations above are: the first predicts right after training finishes, the second reloads the model and then predicts. The problem is that the results of the two evaluations differ a lot. What could be causing this?
python training_chatglm_adgen_demo.py --do_predict
Give that a try. Also attaching some parameter-tuning suggestions for generation-model prediction:
try adjusting top_p, num_beams, repetition_penalty, temperature, and set do_sample=True;
if the generated text repeats itself, raise repetition_penalty;
for an uncomplicated task like CSC with few training samples, lower temperature.
These are empirical values; the exact settings depend on the task and are not fixed (a sketch of passing them to generate follows the parameter list below).
top_p=0.9,
temperature=1.0,
do_sample=True,
no_repeat_ngram_size=6,
repetition_penalty=1.8,
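A minimal sketch of how these values could be passed to a Hugging Face generate() call, assuming the ChatGLM-6B checkpoint; the model name, prompt, and max_new_tokens are illustrative assumptions, not taken from the demo script:

from transformers import AutoModel, AutoTokenizer

model_name = "THUDM/chatglm-6b"  # assumed base checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(model_name, trust_remote_code=True).half().cuda()

inputs = tokenizer("请给我的美妆店起一个名字。", return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=128,         # illustrative length limit
    do_sample=True,             # sample instead of greedy decoding
    top_p=0.9,                  # nucleus-sampling threshold
    temperature=1.0,            # keep the output distribution from collapsing
    no_repeat_ngram_size=6,     # forbid any 6-gram from repeating
    repetition_penalty=1.8,     # penalize tokens that already appeared
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))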
- Judging by this case, it looks like LoRA was not loaded successfully on the second run; for the second run, try python training_chatglm_adgen_demo.py --do_predict;
- Update to the latest code and check again;
- Use the sample data to test whether the LoRA loading logic works correctly (a rough sketch of such a check follows below);
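One way to sanity-check the LoRA loading logic is to compare the base model's answer with the adapter-wrapped model's answer on the same prompt. This is only a sketch under assumptions: the base checkpoint and adapter directory names are placeholders, not paths from this repo.

import torch
from transformers import AutoModel, AutoTokenizer
from peft import PeftModel

base_name = "THUDM/chatglm-6b"      # assumed base checkpoint
adapter_dir = "outputs/adgen_lora"  # hypothetical directory holding adapter_config.json + adapter weights

tokenizer = AutoTokenizer.from_pretrained(base_name, trust_remote_code=True)
base = AutoModel.from_pretrained(base_name, trust_remote_code=True).half().cuda()

prompt = "请给我的美妆店起一个名字。"
inputs = tokenizer(prompt, return_tensors="pt").to(base.device)

with torch.no_grad():
    plain = tokenizer.decode(base.generate(**inputs, max_new_tokens=64)[0], skip_special_tokens=True)

# Wrap the same base model with the trained LoRA adapter and generate again.
lora_model = PeftModel.from_pretrained(base, adapter_dir)
with torch.no_grad():
    tuned = tokenizer.decode(lora_model.generate(**inputs, max_new_tokens=64)[0], skip_special_tokens=True)

# If the adapter actually loaded, the two answers should normally differ.
print("without LoRA:", plain)
print("with LoRA:   ", tuned)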
My understanding is that it is normal for the model to fail to reproduce the format of the 100 training samples: the LoRA is underfitting, and the claim in the link above that 100 samples are enough to train is probably based on P-Tuning. The question now is why predicting right after training gives that kind of result.
Attaching the generation-model tuning suggestions again (adjust top_p, num_beams, repetition_penalty, temperature, do_sample=True; raise repetition_penalty if the output repeats; lower temperature for an uncomplicated, small-data task like CSC; these are empirical, task-dependent values, not fixed), this time with a note on each parameter:
top_p=0.9, # moderately relax the nucleus-sampling threshold to enlarge the candidate-token pool and increase generation diversity.
temperature=1.0, # the previous very low temperature sharpens the output distribution so much that generation degenerates into near-greedy decoding.
do_sample=True, # do_sample defaults to False; setting it to True switches generation to a sampling-based decoding strategy.
no_repeat_ngram_size=6, # set the probability of any already-seen 6-gram to 0 so that no 6-gram appears twice; an empirical first guess.
repetition_penalty=1.8, # down-weight tokens that have already appeared so they are less likely to recur; an empirical first guess.
I changed do_sample to False, and the two predictions still differ.
线上店铺名训练集.txt — this is the training set I used; you can give it a try.
If you want them to be identical, you can set inference_mode to false in adapter_config.json.
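For reference, inference_mode is a field PEFT writes into the adapter's adapter_config.json; a small sketch of flipping it with the standard json module (the adapter directory name is a placeholder):

import json
from pathlib import Path

adapter_dir = Path("outputs/adgen_lora")        # hypothetical adapter output directory
cfg_path = adapter_dir / "adapter_config.json"

cfg = json.loads(cfg_path.read_text(encoding="utf-8"))
cfg["inference_mode"] = False                   # the change suggested above
cfg_path.write_text(json.dumps(cfg, ensure_ascii=False, indent=2), encoding="utf-8")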
I tried it again: with inference_mode=false, the LoRA ends up not being applied at all. You still have to change the generation parameters instead: do_sample=False, temperature=0.01, top_p=0.01, num_beams=1, batch_size=1. Reference: https://github.com/THUDM/ChatGLM-6B/issues/841
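As a rough illustration, those deterministic settings could be collected in a transformers GenerationConfig (max_new_tokens is an assumption; batch_size belongs to the dataloader rather than the generation config):

from transformers import GenerationConfig

# Deterministic decoding settings from the referenced issue; temperature and top_p
# only matter when sampling, but are kept here to mirror the suggestion above.
gen_config = GenerationConfig(
    do_sample=False,
    num_beams=1,
    temperature=0.01,
    top_p=0.01,
    max_new_tokens=128,
)
# Usage would be something like: model.generate(**inputs, generation_config=gen_config)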
do_sample=False, temperature=0.01, top_p=0.01, num_beams=1, batch_size=1, epoch=10. I reran with these parameters; the results are as follows:
The first answer is from predicting right after training; the second is the answer after reloading the model.
The answers from running python training_chatglm_adgen_demo.py --do_predict directly are as follows:
The parameters look fine, and the latter two answers are now exactly the same. The question is: where did the first answer come from?
From the screenshots, I think the first prediction is the correct one and the second prediction loads LoRA incorrectly. Since I don't know what code you are testing with, I wrote a unit test https://github.com/shibing624/textgen/commit/13f41e1646a39808b4884c8b243c8e70c40afc2f#diff-c16315637707e318e1a2a59bab06f4de05f6ab1a6edc9afb460f2bcfbe23e83d and it loads the LoRA model correctly.
Prediction results. Predicting immediately after training:
Loading the model a second time and predicting:
The training data is the 100 samples you provided, with the format normalized and an empty input field added. To reproduce:
cd textgen/tests; python test_chatglm_training.py
One more note: because the data set is so small, "num_train_epochs": 20 is needed for the model to stably learn the store-naming semantics; only then do the prediction results meet expectations every time.
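If the demo takes its settings as a model-args dict, as the quoted key suggests, the change is presumably just a larger epoch count; everything except num_train_epochs below is a hypothetical placeholder:

# Hypothetical sketch of the training-args dict; only "num_train_epochs": 20 comes from this thread.
model_args = {
    "num_train_epochs": 20,              # more passes so the 100-sample set is learned stably
    "per_device_train_batch_size": 1,    # illustrative value
    "output_dir": "outputs/adgen_lora",  # illustrative path
}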
Thanks! I tried it: with 20 epochs, the two outputs are identical. Also, if both the peft name and output_dir are configured, the model gets loaded twice and the LoRA seems to stop taking effect.
The double loading has been fixed.
According to https://github.com/THUDM/ChatGLM-6B/tree/main/improve, 100 samples are enough to train the model, so the experiment builds on that. A training sample looks like this:
{"content": "请给我的美妆店起一个名字。", "summary": "店铺名:柏雅诗BAYEAS。 说明:balmy(温和的)+yearn(思念)+sanguine(快乐的);雅,优雅。 请问您对于名称还有什么特殊要求吗?例如,是否希望名称具有特定的含义或特点?"}
The whole training set contains 100 samples.
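Since normalizing the format and adding an empty input field was mentioned above, here is a hedged sketch of what that conversion might look like; the target field names instruction/input/output and the output filename are assumptions about the expected schema, not taken from the repo:

import json

# Convert {"content", "summary"} records into instruction-tuning style JSONL
# with an explicit empty "input" field, one JSON object per line.
with open("线上店铺名训练集.txt", encoding="utf-8") as fin, \
        open("train.json", "w", encoding="utf-8") as fout:
    for line in fin:
        line = line.strip()
        if not line:
            continue
        rec = json.loads(line)
        sample = {
            "instruction": rec["content"],
            "input": "",               # the empty input field added during normalization
            "output": rec["summary"],
        }
        fout.write(json.dumps(sample, ensure_ascii=False) + "\n")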
I made a slight modification at the end of training_chatglm_adgen_demo.py, as follows. The output is as follows:
The first prediction clearly picked up some characteristics of the training samples; why is none of that visible after the model is reloaded?