shibing624 / textgen

TextGen: Implementation of text generation models, including LLaMA, BLOOM, GPT2, BART, T5, SongNet and so on. It implements training and prediction for models including LLaMA, ChatGLM, BLOOM, GPT2, Seq2Seq, BART, T5 and UDA, and works out of the box.
Apache License 2.0
924 stars 107 forks

ChatGLM: the result of predict right after LoRA training differs greatly from the output after reloading the model and the LoRA #27

Closed hongyix closed 1 year ago

hongyix commented 1 year ago

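Based on the claim that 100 samples are enough to train the model, I ran an experiment.

A training sample looks like this:
{"content": "请给我的美妆店起一个名字。", "summary": "店铺名:柏雅诗BAYEAS。 说明:balmy(温和的)+yearn(思念)+sanguine(快乐的);雅,优雅。 请问您对于名称还有什么特殊要求吗?例如,是否希望名称具有特定的含义或特点?"}
(In English: content: "Please come up with a name for my beauty shop." summary: "Shop name: 柏雅诗 BAYEAS. Explanation: balmy (gentle) + yearn (longing) + sanguine (cheerful); 雅 means elegant. Do you have any other special requirements for the name? For example, should the name carry a particular meaning or characteristic?")

The whole training set has 100 samples. I then made a slight change, as follows: image
The output is as follows: image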
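
A minimal sketch of reading samples in this shape, assuming the training file holds one JSON object per line with the `content` and `summary` fields shown in the sample. `parse_samples` is a hypothetical helper written for illustration; it is not part of textgen.

```python
import json

REQUIRED_FIELDS = ("content", "summary")  # field names taken from the sample in this issue


def parse_samples(lines):
    """Parse JSON Lines text into (content, summary) pairs, skipping blank lines."""
    pairs = []
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        missing = [name for name in REQUIRED_FIELDS if name not in record]
        if missing:
            raise ValueError(f"line {line_no}: missing fields {missing}")
        pairs.append((record["content"], record["summary"]))
    return pairs


# The sample record from the issue, written as one JSON line.
sample_line = (
    '{"content": "请给我的美妆店起一个名字。", '
    '"summary": "店铺名:柏雅诗BAYEAS。 说明:balmy(温和的)+yearn(思念)+sanguine(快乐的);'
    '雅,优雅。 请问您对于名称还有什么特殊要求吗?例如,是否希望名称具有特定的含义或特点?"}'
)

pairs = parse_samples([sample_line, ""])
print(len(pairs), pairs[0][0])
```

A check like this catches malformed lines before training starts, which matters more than usual when the whole set is only 100 samples.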


shibing624 commented 1 year ago

I don't understand why you wrote it twice. For evaluation and prediction, python --do_predict is enough.

hongyix commented 1 year ago

(quoting shibing624) I don't understand why you wrote it twice. For evaluation and prediction, python --do_predict is enough.


shibing624 commented 1 year ago
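  1. Judging from the case, it looks like the LoRA was not loaded successfully the second time. For the second run, try python --do_predict.
  2. Update the code and check again.
  3. Use the example data to test whether the LoRA loading logic works.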
shibing624 commented 1 year ago

Also, some suggestions for tuning generation parameters at prediction time:

Try adjusting top_p, num_beams, repetition_penalty, temperature, and do_sample=True;

If the generated text contains repetitions, raise repetition_penalty;

For a task like csc, which is not complex and has few training samples, lower temperature. These are empirical values; the right settings depend on the task and are not fixed.

top_p=0.9, # Moderately raise the probability threshold of nucleus sampling to enlarge the pool of candidate tokens and increase generation diversity.

temperature=1.0, # The previous low temperature could severely polarize the probability distribution over generated words, which degenerates the generation strategy into greedy decoding.

do_sample=True, # do_sample is False by default. Setting it to True switches generation to a beam-search multinomial sampling decoding strategy.

no_repeat_ngram_size=6, # Set the probability of the next repeating n-gram to 0 so that no n-gram appears twice. This setting is an empirical preliminary exploration.

repetition_penalty=1.8, # For words that have already appeared, repetition_penalty reduces the probability that they appear again later in prediction. This setting is an empirical preliminary exploration.
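
To make the notes on temperature and nucleus sampling concrete, here is a toy sketch in plain Python. It is not the transformers implementation: the logits are made up, and `softmax_with_temperature` and `top_p_candidates` are hypothetical helpers that only show how a very low temperature pushes nearly all probability onto one token and how the top_p threshold sets the size of the candidate pool.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing them by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_candidates(probs, top_p):
    """Return indices of the smallest token set whose cumulative probability reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return kept


logits = [2.0, 1.0, 0.5, 0.0]  # made-up scores for a 4-token vocabulary

for temperature in (1.0, 0.01):
    probs = softmax_with_temperature(logits, temperature)
    candidates = top_p_candidates(probs, 0.9)
    print(temperature, [round(p, 3) for p in probs], candidates)
```

At temperature 1.0 several tokens stay in the candidate pool; at 0.01 only the top token survives, so sampling behaves like greedy decoding.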

hongyix commented 1 year ago
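  1. Running python --do_predict directly gives the same result as the second run: the model has not learned the format of the training samples.
  2. I am using the latest code.
  3. I have already tried another LoRA (more than 10,000 training samples). The loading logic works, and the model answers in the fine-tuned format.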


hongyix commented 1 year ago

(quoting shibing624) Also, some suggestions for tuning generation parameters at prediction time:

Try adjusting top_p, num_beams, repetition_penalty, temperature, and do_sample=True;


For a task like csc, which is not complex and has few training samples, lower temperature. These are empirical values; the right settings depend on the task and are not fixed.

top_p=0.9, # Moderately raise the probability threshold of nucleus sampling to enlarge the pool of candidate tokens and increase generation diversity.

temperature=1.0, # The previous low temperature could severely polarize the probability distribution over generated words, which degenerates the generation strategy into greedy decoding.

do_sample=True, # do_sample is False by default. Setting it to True switches generation to a beam-search multinomial sampling decoding strategy.

no_repeat_ngram_size=6, # Set the probability of the next repeating n-gram to 0 so that no n-gram appears twice. This setting is an empirical preliminary exploration.

repetition_penalty=1.8, # For words that have already appeared, repetition_penalty reduces the probability that they appear again later in prediction. This setting is an empirical preliminary exploration.
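
The two repetition controls quoted above can be sketched on plain Python lists. This is an illustration under assumptions, not library code: `apply_repetition_penalty` follows the common rule of dividing positive scores and multiplying negative ones by the penalty, `banned_next_tokens` finds tokens that would complete an n-gram already present, and the token ids and scores are made up.

```python
def apply_repetition_penalty(logits, generated_ids, penalty):
    """Lower the score of every token id that already appears in generated_ids."""
    adjusted = list(logits)
    for token_id in set(generated_ids):
        score = adjusted[token_id]
        # Dividing a positive score, or multiplying a negative one, makes the token less likely.
        adjusted[token_id] = score / penalty if score > 0 else score * penalty
    return adjusted


def banned_next_tokens(generated_ids, ngram_size):
    """Return token ids that would repeat an n-gram already present in generated_ids."""
    if ngram_size < 1 or len(generated_ids) < ngram_size:
        return set()
    # The last (ngram_size - 1) tokens form the prefix of the n-gram being completed.
    prefix = tuple(generated_ids[len(generated_ids) - (ngram_size - 1):])
    banned = set()
    for start in range(len(generated_ids) - ngram_size + 1):
        ngram = tuple(generated_ids[start:start + ngram_size])
        if ngram[:-1] == prefix:
            banned.add(ngram[-1])
    return banned


print(apply_repetition_penalty([2.0, -1.0, 0.5], [0, 1], 1.8))
print(banned_next_tokens([1, 2, 3, 1, 2], 3))
```

In the second call the sequence ends in (1, 2), and (1, 2, 3) has already occurred, so token 3 is the one that would be blocked.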


shibing624 commented 1 year ago
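  1. Repeated names are, in my experience, a sign of underfitting. To fix this, set num_epoch to 10 or more.
  2. My understanding is that P-Tuning is no different from LoRA, prompt-tuning and other delta-tuning methods, and that 100 samples are enough there. The reason is not that P-Tuning v2 is special as a method. It is that the e-commerce advertising fine-tuning data follows the same distribution as ChatGLM's pre-training set, or, to put it bluntly, ChatGLM's training already used the ADGEN fine-tuning data.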
hongyix commented 1 year ago

线上店铺名训练集.txt (online shop name training set): this is the training set I used. You are welcome to try it.

shibing624 commented 1 year ago

If you want the two results to match, you can change inference_mode to false in adapter_config.json.

shibing624 commented 1 year ago

I tried again: inference_mode=false causes the LoRA not to be applied. You still have to change the parameters instead: do_sample=False, temperature=0.01, top_p=0.01, num_beams=1, batch_size=1. Reference:
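
As far as I understand the transformers generate API, do_sample=False with num_beams=1 amounts to greedy decoding, which picks the most probable token at every step, so repeated runs on the same weights give the same text. The toy sketch below contrasts that with sampling; the distribution is made up and `pick_greedy` and `pick_sampled` are hypothetical helpers, not library functions.

```python
import random


def pick_greedy(probs):
    """Greedy decoding step: always return the index of the most probable token."""
    return max(range(len(probs)), key=lambda i: probs[i])


def pick_sampled(probs, rng):
    """Sampling step: draw an index at random in proportion to its probability."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


probs = [0.5, 0.3, 0.2]  # made-up next-token distribution

# Greedy gives one answer no matter how often it runs.
greedy_runs = {pick_greedy(probs) for _ in range(100)}

# Sampling with different random states can give different answers.
sampled_runs = {pick_sampled(probs, random.Random(seed)) for seed in range(100)}

print(sorted(greedy_runs), sorted(sampled_runs))
```

This is why a comparison between "predict right after training" and "predict after reloading" is only meaningful once sampling is switched off.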

hongyix commented 1 year ago

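do_sample=False, temperature=0.01, top_p=0.01, num_beams=1, batch_size=1, epoch = 10. I re-ran with these parameters, and the results are as follows: image

The first answer is from running predict directly after training; the second is the answer after reloading.

The answer from running python --do_predict directly is as follows: image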


shibing624 commented 1 year ago

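From the screenshots, I think the first prediction is correct and the second one, made after loading the LoRA, is wrong. Since I don't know what code you tested with, I wrote a unit test on my side, and it loads the LoRA model correctly.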

Prediction results: prediction immediately after training: Xnip2023-05-11_15-42-38




  1. Update the code
  2. cd textgen/tests; python
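
A check of this kind can be sketched without any model at all. The helper below is hypothetical and is not the unit test mentioned in this comment: it takes two prediction callables, one standing for the model right after training and one for the reloaded model, and lists every prompt on which they disagree. It assumes deterministic decoding, otherwise differences are expected.

```python
def find_mismatches(prompts, predict_after_train, predict_after_reload):
    """Return (prompt, first_output, second_output) for every prompt whose outputs differ."""
    mismatches = []
    for prompt in prompts:
        first = predict_after_train(prompt)
        second = predict_after_reload(prompt)
        if first != second:
            mismatches.append((prompt, first, second))
    return mismatches


# Stand-in predictors for illustration only; real ones would call the model.
def trained(prompt):
    return "name for: " + prompt


def reloaded_without_adapter(prompt):
    return "generic reply"


print(find_mismatches(["beauty shop"], trained, trained))
print(find_mismatches(["beauty shop"], trained, reloaded_without_adapter))
```

An empty list means the reloaded model reproduces the post-training outputs on the tested prompts.
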
shibing624 commented 1 year ago

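One more note: because there are so few samples, "num_train_epochs": 20 is what lets the model reliably learn the semantics of naming a shop, so that the predictions meet expectations every time.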
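
A rough way to see why the epoch count matters with only 100 samples is to count optimizer updates. The sketch below is plain arithmetic; the batch size of 4 is an assumption chosen for illustration (the thread does not state the training batch size), and `optimizer_steps` is a hypothetical helper.

```python
import math


def optimizer_steps(num_samples, batch_size, num_epochs):
    """Count optimizer updates for a run, assuming the last partial batch is kept."""
    batches_per_epoch = math.ceil(num_samples / batch_size)
    return batches_per_epoch * num_epochs


# 100 samples as in this issue; batch_size=4 is an assumed value.
for epochs in (1, 10, 20):
    print(epochs, optimizer_steps(100, 4, epochs))
```

With so few samples, each epoch contributes only a handful of updates, so raising the epoch count is the main lever for giving the adapter enough training signal.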

hongyix commented 1 year ago

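Thank you. I tried it: with 20 epochs, the two outputs are now the same. If both peft name and output_dir are configured, the model gets loaded twice, and the LoRA then seems to stop taking effect.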


shibing624 commented 1 year ago

The double loading is fixed.