shibing624 / textgen

TextGen: implementation of text generation models, including LLaMA, ChatGLM, BLOOM, GPT2, Seq2Seq, BART, T5, SongNet, UDA, and so on; training and inference for these models, ready to use out of the box.
Apache License 2.0

After LoRA training of ChatGLM, the results predicted immediately differ greatly from the outputs produced after reloading the model and the LoRA weights #27

Closed hongyix closed 1 year ago

hongyix commented 1 year ago

According to https://github.com/THUDM/ChatGLM-6B/tree/main/improve, 100 samples are enough to train the model, so I experimented on that basis. A training example looks like this: {"content": "请给我的美妆店起一个名字。", "summary": "店铺名:柏雅诗BAYEAS。 说明:balmy(温和的)+yearn(思念)+sanguine(快乐的);雅,优雅。 请问您对于名称还有什么特殊要求吗?例如,是否希望名称具有特定的含义或特点?"} The whole training set has 100 samples.

I made minor changes to the end of training_chatglm_adgen_demo.py, as follows: (screenshot) The output is as follows: (screenshot)

The first run clearly picked up some characteristics of the training samples; why does none of that show up after the model is reloaded?

shibing624 commented 1 year ago

I don't understand why you wrote it twice. For evaluation/prediction, just run python training_chatglm_adgen_demo.py --do_predict.

hongyix commented 1 year ago

I don't understand why you wrote it twice. For evaluation/prediction, just run python training_chatglm_adgen_demo.py --do_predict.

The two evaluations above: the first is prediction right after training finishes, the second reloads the model. The problem is that the two evaluations give very different results. What could cause this?

shibing624 commented 1 year ago
  1. Judging by the case, it looks like the LoRA was not loaded successfully on the second run; for the second run, try python training_chatglm_adgen_demo.py --do_predict;
  2. Update the code and check again;
  3. Use the sample data to test whether the LoRA loading logic works; a sketch of such a check follows below;
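
A minimal sketch of such a check (not the repo's test script), assuming a plain transformers + peft setup; the adapter directory "outputs/" and the prompt are illustrative:

```python
# Hedged sketch: load the base ChatGLM model, attach the trained LoRA adapter,
# and run one prompt to confirm the adapter actually changes the output.
from transformers import AutoModel, AutoTokenizer
from peft import PeftModel

base_name = "THUDM/chatglm-6b"
lora_dir = "outputs/"  # assumed directory holding adapter_config.json / adapter_model.bin

tokenizer = AutoTokenizer.from_pretrained(base_name, trust_remote_code=True)
base = AutoModel.from_pretrained(base_name, trust_remote_code=True).half().cuda()
model = PeftModel.from_pretrained(base, lora_dir)  # applies the LoRA weights on top of the base
model.eval()

inputs = tokenizer("请给我的美妆店起一个名字。", return_tensors="pt").to(base.device)
out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```
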
shibing624 commented 1 year ago

To add, suggested decoding-parameter tuning for generation-model prediction:

Try adjusting top_p, num_beams, repetition_penalty, temperature, and do_sample=True;

If the generated text repeats itself, raise repetition_penalty;

For a simple task like CSC with few training samples, lower the temperature. These are empirical values; the exact settings depend on the task and are not fixed.

top_p=0.9,  # moderately raise the nucleus-sampling probability threshold to widen the pool of candidate tokens and increase generation diversity.

temperature=1.0,  # the previously low temperature sharpened the token probability distribution so much that generation effectively degenerated into greedy decoding.

do_sample=True,  # do_sample defaults to False; setting it to True switches generation to (beam-search) multinomial sampling.

no_repeat_ngram_size=6,  # force the probability of any token that would repeat an already-seen 6-gram to 0, so no 6-gram appears twice; this value is an empirical first guess.

repetition_penalty=1.8,  # penalize tokens that have already appeared so they are less likely to be generated again; this value is an empirical first guess.
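
For concreteness, a minimal sketch (not from the repo) of passing these decoding parameters to Hugging Face generate(); `model` and `tokenizer` stand for an already-loaded, LoRA-patched ChatGLM model and its tokenizer, and the prompt is illustrative:

```python
# Hedged sketch: the keyword arguments below are standard transformers generate() options.
inputs = tokenizer("请给我的美妆店起一个名字。", return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,           # sample instead of pure greedy/beam decoding
    top_p=0.9,                # nucleus-sampling threshold
    temperature=1.0,          # keep the distribution from collapsing toward greedy decoding
    repetition_penalty=1.8,   # down-weight tokens that have already appeared
    no_repeat_ngram_size=6,   # never emit the same 6-gram twice
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```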

hongyix commented 1 year ago
  1. Judging by the case, it looks like the LoRA was not loaded successfully on the second run; for the second run, try python training_chatglm_adgen_demo.py --do_predict;
  2. Update the code and check again;
  3. Use the sample data to test whether the LoRA loading logic works;
  1. Running python training_chatglm_adgen_demo.py --do_predict directly gives the same result as the second run: the model has not learned the format of the training samples.
  2. I am using the latest code.
  3. I already tried another LoRA (training set > 10k samples); the loading logic works there, and the model answers in the fine-tuned format.

My understanding is that it is normal for the model not to reproduce the format from only 100 training samples: the LoRA is underfitting, and the claim in the link above that 100 samples are enough is probably based on P-Tuning. The question now is why predicting right after training produces that first result.

hongyix commented 1 year ago

I changed do_sample to False, and the two predictions are still different.

shibing624 commented 1 year ago
  1. Repeated names are, in my experience, a sign of underfitting; to fix this, raise num_epoch to 10 or more.
  2. My understanding is that P-Tuning, LoRA, prompt tuning and other delta-tuning methods behave alike here, and 100 samples are enough. The reason is not that P-Tuning v2 is special; it is that the ADGEN e-commerce advertising fine-tuning data has the same distribution as ChatGLM's pre-training data, or, to put it bluntly, ChatGLM's training already used the ADGEN fine-tuning data.
hongyix commented 1 year ago

线上店铺名训练集.txt — this is the training set I used; you can give it a try.

shibing624 commented 1 year ago

I changed do_sample to False, and the two predictions are still different.

If you want them to be identical, you can set inference_mode to false in adapter_config.json.
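
For reference, inference_mode is a field that peft's LoraConfig writes into adapter_config.json; a minimal sketch of flipping it (the path is illustrative, and the follow-up comment below revises this advice):

```python
import json
from pathlib import Path

# Hedged sketch: point this at your actual LoRA output directory.
cfg_path = Path("outputs/adapter_config.json")
cfg = json.loads(cfg_path.read_text())
cfg["inference_mode"] = False  # flag serialized from peft's LoraConfig
cfg_path.write_text(json.dumps(cfg, indent=2, ensure_ascii=False))
```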

shibing624 commented 1 year ago

I changed do_sample to False, and the two predictions are still different.

If you want them to be identical, you can set inference_mode to false in adapter_config.json.

I tried it again: inference_mode=false causes the LoRA not to be applied at all. You still have to change the generation parameters instead: do_sample=False, temperature=0.01, top_p=0.01, num_beams=1, batch_size=1. Reference: https://github.com/THUDM/ChatGLM-6B/issues/841
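
A minimal sketch of those deterministic settings with generate(), using the same assumed `model`/`tokenizer` as above (batch_size is a predict-script option rather than a generate() argument, so it is omitted):

```python
# Hedged sketch: with sampling disabled and a single beam, repeated runs over the same
# checkpoint should produce identical text.
inputs = tokenizer("请给我的美妆店起一个名字。", return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=False,   # greedy decoding, no randomness
    num_beams=1,
    temperature=0.01,  # effectively unused when do_sample=False; kept to mirror the thread
    top_p=0.01,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```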

hongyix commented 1 year ago

do_sample=False, temperature=0.01, top_p=0.01, num_beams=1, batch_size=1, epochs=10. I reran with these parameters; the results are as follows: (screenshot)

The first answer is from predicting directly after training; the second is the answer after reloading the model.

The answer from running python training_chatglm_adgen_demo.py --do_predict directly is as follows: (screenshot)

The parameters look fine now, and the latter two answers are identical. The question is: where did the first answer come from?

shibing624 commented 1 year ago

Judging from the screenshots, I think the first prediction is the correct one and the second (reloaded-LoRA) prediction is the wrong one. Since I don't know the code you used for testing, I wrote a unit test here: https://github.com/shibing624/textgen/commit/13f41e1646a39808b4884c8b243c8e70c40afc2f#diff-c16315637707e318e1a2a59bab06f4de05f6ab1a6edc9afb460f2bcfbe23e83d — it loads the LoRA model correctly.

Prediction results. Predicting immediately after training: (screenshot Xnip2023-05-11_15-42-38)

Loading the model a second time and predicting:

(screenshot Xnip2023-05-11_15-42-50)

The training data is the 100 samples you provided, with the format normalized and an empty input field added. To reproduce:

  1. Update the code
  2. cd textgen/tests; python test_chatglm_training.py
shibing624 commented 1 year ago

One more note: because there are so few training samples, setting "num_train_epochs": 20 lets the model reliably learn the shop-naming semantics, so the predictions match expectations every time.
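
A rough sketch of a train-then-reload run with more epochs, assuming the ChatGlmModel interface used by the repo's demo scripts; apart from "num_train_epochs" (taken from the comment above), the argument names and paths are guesses and should be checked against textgen's ChatGlmArgs:

```python
from textgen import ChatGlmModel  # assumed import path, as in the repo's examples

train_args = {
    "use_lora": True,          # assumed flag name for LoRA fine-tuning
    "num_train_epochs": 20,    # enough passes over the 100 samples to learn the format
    "output_dir": "outputs/",  # illustrative checkpoint / adapter directory
}

# Train on the 100-sample shop-name file (illustrative path), then predict immediately.
model = ChatGlmModel("chatglm", "THUDM/chatglm-6b", args=train_args)
model.train_model("train.json")
print(model.predict(["请给我的美妆店起一个名字。"]))

# Reload in a fresh object and predict again; with 20 epochs the two outputs should match.
# peft_name follows the later comment in this thread; the exact kwarg may differ.
reloaded = ChatGlmModel("chatglm", "THUDM/chatglm-6b",
                        args={"output_dir": "outputs/"}, peft_name="outputs/")
print(reloaded.predict(["请给我的美妆店起一个名字。"]))
```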

hongyix commented 1 year ago

One more note: because there are so few training samples, setting "num_train_epochs": 20 lets the model reliably learn the shop-naming semantics, so the predictions match expectations every time.

Thanks! I tried it, and with 20 epochs the two outputs are identical. However, if both peft_name and output_dir are configured, the model gets loaded twice and the LoRA seems to stop taking effect.

(screenshot)

shibing624 commented 1 year ago

The double-loading issue is fixed.