shibing624 / textgen

TextGen: Implementation of text generation models, including LLaMA, ChatGLM, BLOOM, GPT2, Seq2Seq, BART, T5, SongNet, UDA, and more. Training and prediction for these text generation models, ready to use out of the box.
Apache License 2.0

ChatGLM trained with LoRA: predictions made right after training differ greatly from predictions made after reloading the model and LoRA #27

Closed hongyix closed 1 year ago

hongyix commented 1 year ago

According to https://github.com/THUDM/ChatGLM-6B/tree/main/improve, 100 samples are enough to train the model, so I ran an experiment on that basis. A training sample looks like this: {"content": "请给我的美妆店起一个名字。", "summary": "店铺名:柏雅诗BAYEAS。 说明:balmy(温和的)+yearn(思念)+sanguine(快乐的);雅,优雅。 请问您对于名称还有什么特殊要求吗?例如,是否希望名称具有特定的含义或特点?"} The whole training set has 100 samples.
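
For illustration, here is a minimal sketch of writing training records in this content/summary format as JSON lines; the file name train.json and the truncated sample list are assumptions, not files from the repository.

```python
# Sketch: write content/summary records as JSON lines (one object per line).
# "train.json" and the example records are illustrative placeholders.
import json

samples = [
    {
        "content": "请给我的美妆店起一个名字。",
        "summary": "店铺名:柏雅诗BAYEAS。 说明:balmy(温和的)+yearn(思念)+sanguine(快乐的);雅,优雅。",
    },
    # ... 99 more records of the same shape
]

with open("train.json", "w", encoding="utf-8") as f:
    for sample in samples:
        f.write(json.dumps(sample, ensure_ascii=False) + "\n")
```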

I made minor changes to the end of training_chatglm_adgen_demo.py as follows: [screenshot: modified script] The output is as follows: [screenshot: prediction results]

The first prediction clearly shows that the model has picked up some characteristics of the training samples, so why does none of that show up after the model is reloaded?

shibing624 commented 1 year ago

I don't understand why you run it twice. For evaluation/prediction, just run python training_chatglm_adgen_demo.py --do_predict.

hongyix commented 1 year ago

I don't understand why you run it twice. For evaluation/prediction, just run python training_chatglm_adgen_demo.py --do_predict.

I evaluated twice above: the first run predicts with the model immediately after training finishes, the second reloads the model first. The problem is that the two evaluation results differ a lot. What could be causing this?

shibing624 commented 1 year ago
  1. Judging by the case, it looks like the LoRA was not loaded successfully the second time; try running the second pass with python training_chatglm_adgen_demo.py --do_predict;
  2. Update to the latest code and check again;
  3. Use the example data to test whether the LoRA loading logic works correctly;
shibing624 commented 1 year ago

In addition, some suggestions for tuning the prediction parameters of generation models:

Try adjusting top_p, num_beams, repetition_penalty, temperature, and do_sample=True;

If the generated text contains repetitions, increase repetition_penalty;

For a simple task like CSC (Chinese spelling correction) with few training samples, lower the temperature. These are empirical values; the right settings depend on the task and are not fixed.

top_p=0.9,

Moderately raise the nucleus-sampling probability threshold to enlarge the candidate token set and improve generation diversity.

temperature=1.0,

The previously low temperature sharpened the probability distribution over generated tokens so much that the strategy effectively degenerated into greedy decoding.

do_sample=True,

The do_sample parameter is False by default. Setting it to True switches generation from greedy/beam search to a (beam-search) multinomial sampling strategy.

no_repeat_ngram_size=6,

Set the probability of any n-gram of this size that has already appeared to 0, so that no 6-gram occurs twice. This value is an empirical starting point.

repetition_penalty=1.8,

Penalize tokens that have already appeared so they are less likely to be generated again. This value is an empirical starting point.
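
As a reference for how these settings fit together, here is a minimal sketch of a Hugging Face transformers-style generate() call using the suggested values; the checkpoint name, prompt, and max_new_tokens are placeholder assumptions rather than values taken from training_chatglm_adgen_demo.py.

```python
# Sketch: sampling-based decoding with the suggested parameters (GPU assumed).
from transformers import AutoModel, AutoTokenizer

name = "THUDM/chatglm-6b"  # assumption: the ChatGLM-6B base checkpoint
tokenizer = AutoTokenizer.from_pretrained(name, trust_remote_code=True)
model = AutoModel.from_pretrained(name, trust_remote_code=True).half().cuda()

inputs = tokenizer("请给我的美妆店起一个名字。", return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=128,      # placeholder length limit
    do_sample=True,          # sample instead of greedy decoding
    top_p=0.9,               # nucleus-sampling threshold
    temperature=1.0,         # avoid collapsing the token distribution
    no_repeat_ngram_size=6,  # forbid any 6-gram from repeating
    repetition_penalty=1.8,  # penalize tokens that already appeared
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```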

hongyix commented 1 year ago
  1. Judging by the case, it looks like the LoRA was not loaded successfully the second time; try running the second pass with python training_chatglm_adgen_demo.py --do_predict;
  2. Update to the latest code and check again;
  3. Use the example data to test whether the LoRA loading logic works correctly;
  1. Running python training_chatglm_adgen_demo.py --do_predict directly gives the same result as the second run: the model has not learned the format of the training samples.
  2. I am using the latest code.
  3. I tried another LoRA (trained on more than 10,000 samples); its loading logic works fine and the model answers in the fine-tuned format.

My understanding is that it is normal for the model not to reproduce the format from only 100 training samples; the LoRA is underfitting, and the claim in the link above that 100 samples are enough is probably based on P-Tuning. The question now is why predicting right after training produces that kind of result.

hongyix commented 1 year ago

(Quoting the decoding-parameter suggestions above: top_p, num_beams, repetition_penalty, temperature, do_sample, no_repeat_ngram_size.)

I set do_sample to False, and the two predictions still differ.

shibing624 commented 1 year ago
  1. Repeated names are, in my experience, a sign of underfitting; to fix this, raise num_train_epochs to 10 or more (see the sketch after this list).
  2. My understanding is that P-Tuning, LoRA, prompt tuning and other delta-tuning methods behave the same here, and 100 samples are enough. The reason is not that P-Tuning v2 is special, but that the ADGEN e-commerce advertising data has the same distribution as ChatGLM's pre-training data; put bluntly, ChatGLM's training has already used the ADGEN fine-tuning data.
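
As a sketch of point 1, the epoch setting might look like the args dict below, in the style of training_chatglm_adgen_demo.py; only num_train_epochs comes from this thread, the other keys and values are illustrative assumptions.

```python
# Sketch: raise the epoch count so a 100-sample LoRA run can fit the target format.
model_args = {
    "num_train_epochs": 10,                   # 10+ suggested here; 20 is used later in the thread
    "per_device_train_batch_size": 4,         # placeholder
    "output_dir": "./outputs-chatglm-lora/",  # placeholder path for the saved adapter
}
```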
hongyix commented 1 year ago

线上店铺名训练集.txt (the online shop-name training set) — this is the training set I used; you can give it a try.

shibing624 commented 1 year ago

(Quoting the decoding-parameter suggestions above.)

I set do_sample to False, and the two predictions still differ.

If you want them to be identical, you can set inference_mode to false in adapter_config.json.

shibing624 commented 1 year ago

(Quoting the decoding-parameter suggestions above.)

I set do_sample to False, and the two predictions still differ.

If you want them to be identical, you can set inference_mode to false in adapter_config.json.

I tried again: inference_mode=false causes the LoRA not to be applied at all. You still need to change the decoding parameters instead: do_sample=False, temperature=0.01, top_p=0.01, num_beams=1, batch_size=1. Reference: https://github.com/THUDM/ChatGLM-6B/issues/841
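
For clarity, here are those deterministic settings expressed as generate() keyword arguments for a transformers-style model, as a sketch; batch_size belongs to the evaluation loop rather than to generate().

```python
# Sketch: deterministic decoding kwargs so the two prediction runs can be compared.
deterministic_gen_kwargs = dict(
    do_sample=False,     # greedy decoding, no sampling randomness
    num_beams=1,         # single beam
    temperature=0.01,    # kept as in the linked ChatGLM-6B issue,
    top_p=0.01,          # though these two have no effect once do_sample=False
    max_new_tokens=128,  # placeholder length limit
)
# usage (sketch): model.generate(**inputs, **deterministic_gen_kwargs)
```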

hongyix commented 1 year ago

I re-ran with do_sample=False, temperature=0.01, top_p=0.01, num_beams=1, batch_size=1, and epoch = 10. The results are as follows: [screenshot: results]

The first answer is from predicting right after training; the second is from the reloaded model.

Running python training_chatglm_adgen_demo.py --do_predict directly gives the following answer: [screenshot: output]

The parameters look fine now, and the latter two answers are identical. The question is: where does the first answer come from?

shibing624 commented 1 year ago

Judging by the screenshots, I believe the first prediction is correct and the second prediction loads the LoRA incorrectly. Since I don't know the exact code you tested with, I wrote a unit test https://github.com/shibing624/textgen/commit/13f41e1646a39808b4884c8b243c8e70c40afc2f#diff-c16315637707e318e1a2a59bab06f4de05f6ab1a6edc9afb460f2bcfbe23e83d and the LoRA model loads correctly there.

Prediction results. Predicting immediately after training: [screenshot: Xnip2023-05-11_15-42-38]

Reloading the model and predicting a second time:

[screenshot: Xnip2023-05-11_15-42-50]

As training data I used the 100 samples you provided, normalized the format, and added an empty input field. To reproduce (a sketch of the reload-and-attach-LoRA flow follows the steps below):

  1. Update the code
  2. cd textgen/tests; python test_chatglm_training.py
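
Below is a minimal sketch of the reload path the test exercises, i.e. loading the base checkpoint and then attaching the saved LoRA adapter with peft; the checkpoint name and adapter directory are placeholders, and the actual test is the linked test_chatglm_training.py.

```python
# Sketch: reload the base model, attach the trained LoRA adapter, then predict.
from peft import PeftModel
from transformers import AutoModel, AutoTokenizer

base_name = "THUDM/chatglm-6b"        # assumption: base checkpoint
lora_dir = "./outputs-chatglm-lora/"  # assumption: directory holding the saved adapter

tokenizer = AutoTokenizer.from_pretrained(base_name, trust_remote_code=True)
base_model = AutoModel.from_pretrained(base_name, trust_remote_code=True).half().cuda()
model = PeftModel.from_pretrained(base_model, lora_dir)  # applies the LoRA weights
model.eval()

inputs = tokenizer("请给我的美妆店起一个名字。", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128, do_sample=False, num_beams=1)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```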
shibing624 commented 1 year ago

One more note: because there are so few samples, setting "num_train_epochs": 20 lets the model reliably learn the shop-naming semantics, so the predictions match expectations every time.

hongyix commented 1 year ago

One more note: because there are so few samples, setting "num_train_epochs": 20 lets the model reliably learn the shop-naming semantics, so the predictions match expectations every time.

Thanks! I tried it: with 20 epochs the two outputs are now identical. However, if both peft_name and output_dir are configured, the model gets loaded twice and the LoRA seems to stop taking effect.

[screenshot]

shibing624 commented 1 year ago

One more note: because there are so few samples, setting "num_train_epochs": 20 lets the model reliably learn the shop-naming semantics, so the predictions match expectations every time.

Thanks! I tried it: with 20 epochs the two outputs are now identical. However, if both peft_name and output_dir are configured, the model gets loaded twice and the LoRA seems to stop taking effect.

[screenshot]

The double-loading issue is fixed.