zjunlp / EasyEdit

[ACL 2024] An Easy-to-use Knowledge Editing Framework for LLMs.
https://zjunlp.github.io/project/KnowEdit

llama2 SERAC #276

Closed SXxinxiaosong closed 3 months ago

SXxinxiaosong commented 3 months ago

05/30/2024 23:22:17 - INFO - easyeditor.trainer.EditTrainer - Beginning evaluation for 19009 steps...
05/30/2024 23:25:00 - INFO - easyeditor.trainer.EditTrainer - Step 500/19009 edit: 1.00000 acc_pre: 0.00833 acc_post: 0.00833 acc_delta: 0.00000 it_time: 0.3260
05/30/2024 23:27:43 - INFO - easyeditor.trainer.EditTrainer - Step 1000/19009 edit: 0.99800 acc_pre: 0.01125 acc_post: 0.01125 acc_delta: 0.00000 it_time: 0.3262
05/30/2024 23:30:26 - INFO - easyeditor.trainer.EditTrainer - Step 1500/19009 edit: 0.99867 acc_pre: 0.01166 acc_post: 0.01166 acc_delta: 0.00000 it_time: 0.3261
05/30/2024 23:33:09 - INFO - easyeditor.trainer.EditTrainer - Step 2000/19009 edit: 0.99900 acc_pre: 0.01147 acc_post: 0.01147 acc_delta: 0.00000 it_time: 0.3264
05/30/2024 23:35:53 - INFO - easyeditor.trainer.EditTrainer - Step 2500/19009 edit: 0.99920 acc_pre: 0.01158 acc_post: 0.01158 acc_delta: 0.00000 it_time: 0.3265
05/30/2024 23:38:37 - INFO - easyeditor.trainer.EditTrainer - Step 3000/19009 edit: 0.99893 acc_pre: 0.01158 acc_post: 0.01158 acc_delta: 0.00000 it_time: 0.3266

Why is the resulting accuracy so low? Did something go wrong somewhere?

XeeKee commented 3 months ago

Hello, we believe there is no issue here. You can try using the trained checkpoint (ckpt) to test whether the edit results are abnormal.

pengzju commented 3 months ago

You need to pay attention to the acc_post on the train set. Could you provide more logs/information regarding the train set?

SXxinxiaosong commented 3 months ago

05/30/2024 20:21:20 - INFO - easyeditor.trainer.BaseTrainer - Step 500:
05/30/2024 20:21:20 - INFO - easyeditor.trainer.BaseTrainer - loss/edit_train: 4.25928; loss/loc_train: 0.00346; edit/acc_train: 0.69607; edit/log_prob_train: -4.25928; edit/prob_train: 0.60101; acc/pre_train: 0.01928; acc/post_train: 0.01928; nll/pre_train: 11.79155; perplexity/pre_train: 132131.03125; nll/post_train: 10.67658; perplexity/post_train: 43329.19531; n_tokens/pre_train: 7.86000; n_tokens/post_train: 7.86000; time/edit_train: 0.00542; loss/total_train: 0.42939; loss/total_edit_train: 0.42939; memory/alloc_max_train: 0.00000; memory/res_max_train: 0.00000
05/30/2024 20:30:50 - INFO - easyeditor.trainer.BaseTrainer - Step 1000:
05/30/2024 20:30:50 - INFO - easyeditor.trainer.BaseTrainer - loss/edit_train: 4.27985; loss/loc_train: 0.00235; edit/acc_train: 0.72839; edit/log_prob_train: -4.27985; edit/prob_train: 0.69608; acc/pre_train: 0.01301; acc/post_train: 0.01301; nll/pre_train: 11.88149; perplexity/pre_train: 144565.98438; nll/post_train: 10.75037; perplexity/post_train: 46647.09375; n_tokens/pre_train: 7.65000; n_tokens/post_train: 7.65000; time/edit_train: 0.00549; loss/total_train: 0.43033; loss/total_edit_train: 0.43033; memory/alloc_max_train: 0.00000; memory/res_max_train: 0.00000
05/30/2024 20:40:21 - INFO - easyeditor.trainer.BaseTrainer - Step 1500:
05/30/2024 20:40:21 - INFO - easyeditor.trainer.BaseTrainer - loss/edit_train: 0.13670; loss/loc_train: 0.00225; edit/acc_train: 0.98767; edit/log_prob_train: -0.13670; edit/prob_train: 0.94624; acc/pre_train: 0.01445; acc/post_train: 0.01445; nll/pre_train: 12.15079; perplexity/pre_train: 189243.32812; nll/post_train: 11.03521; perplexity/post_train: 62020.11328; n_tokens/pre_train: 7.71600; n_tokens/post_train: 7.71600; time/edit_train: 0.00537; loss/total_train: 0.01592; loss/total_edit_train: 0.01592; memory/alloc_max_train: 0.00000; memory/res_max_train: 0.00000
05/30/2024 20:49:52 - INFO - easyeditor.trainer.BaseTrainer - Step 2000:
05/30/2024 20:49:52 - INFO - easyeditor.trainer.BaseTrainer - loss/edit_train: 0.02482; loss/loc_train: 0.00228; edit/acc_train: 0.99850; edit/log_prob_train: -0.02482; edit/prob_train: 0.98341; acc/pre_train: 0.01396; acc/post_train: 0.01396; nll/pre_train: 12.11003; perplexity/pre_train: 181685.89062; nll/post_train: 10.95400; perplexity/post_train: 57182.48828; n_tokens/pre_train: 7.31800; n_tokens/post_train: 7.31800; time/edit_train: 0.00538; loss/total_train: 0.00476; loss/total_edit_train: 0.00476; memory/alloc_max_train: 0.00000; memory/res_max_train: 0.00000
....
05/30/2024 23:12:50 - INFO - easyeditor.trainer.BaseTrainer - Step 9500:
05/30/2024 23:12:50 - INFO - easyeditor.trainer.BaseTrainer - loss/edit_train: 0.00538; loss/loc_train: 0.00229; edit/acc_train: 1.00000; edit/log_prob_train: -0.00538; edit/prob_train: 0.99467; acc/pre_train: 0.01471; acc/post_train: 0.01471; nll/pre_train: 12.00088; perplexity/pre_train: 162897.73438; nll/post_train: 10.92346; perplexity/post_train: 55462.23828; n_tokens/pre_train: 8.11400; n_tokens/post_train: 8.11400; time/edit_train: 0.00538; loss/total_train: 0.00283; loss/total_edit_train: 0.00283; memory/alloc_max_train: 0.00000; memory/res_max_train: 0.00000

Is there something wrong with these training results? Thanks!

pengzju commented 3 months ago

There is no issue with the training results. Model editing is different from traditional ML tasks; it does not need to generalize on unseen validation data but only needs to successfully correct the edited examples.

So you should focus on metrics like edit/acc_train: 0.99850. You can directly use the checkpoint to perform the edit.
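
For reference, here is a minimal sketch of applying a trained checkpoint with EasyEdit's BaseEditor, following the usage shown in the repository README. The hparams path and the example edit below are illustrative assumptions, and the trained SERAC checkpoint is assumed to be referenced from the hparams YAML (e.g. its archive field):

```python
# A minimal sketch, assuming the trained SERAC checkpoint is pointed to
# from the hparams YAML; the path and the example edit are illustrative.
from easyeditor import BaseEditor, SERACHparams

hparams = SERACHparams.from_hparams('./hparams/SERAC/llama-7b')  # hypothetical path
editor = BaseEditor.from_hparams(hparams)

# One illustrative edit request: rewrite a fact and inspect the metrics.
metrics, edited_model, _ = editor.edit(
    prompts=['Who is the architect of the Eiffel Tower?'],
    ground_truth=['Gustave Eiffel'],
    target_new=['Stephen Sauvestre'],
)
print(metrics)
```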

SXxinxiaosong commented 3 months ago

So, does model editing generally have only a training set and no test set?

SXxinxiaosong commented 3 months ago

There is another thing I don't understand: for SERAC, isn't the easyeditor.trainer.BaseTrainer step training the classifier and the counterfactual model?

pengzju commented 3 months ago

> So, does model editing generally have only a training set and no test set?

Yes; more precisely, it is called the edit set.

SXxinxiaosong commented 3 months ago

There is another thing I don't understand: for SERAC, isn't the easyeditor.trainer.BaseTrainer step training the classifier and the counterfactual model?

Training the classifier on the train set and then also editing on the same train set doesn't seem to make much sense. I hope you can clarify this; thank you!

pengzju commented 3 months ago

Why doesn't it make sense?

As long as it fits during the training phase (the scope classifier learns to identify the editing scope, and the counterfactual model can output the new knowledge), shouldn't it successfully edit during the edit phase?
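
To make the routing concrete, here is a hypothetical pseudocode sketch of SERAC-style inference-time behavior (the function and variable names are illustrative, not EasyEdit internals): if the scope classifier matches a query to a stored edit, the small counterfactual model answers; otherwise the frozen base model does.

```python
# Hypothetical sketch of SERAC-style routing; all names are illustrative.
def serac_generate(query, edit_memory, scope_classifier, counterfact_model, base_model):
    # Find the stored edit (if any) whose scope the query falls into.
    best_edit, best_score = None, 0.0
    for edit in edit_memory:
        score = scope_classifier(query, edit)  # P(query is in this edit's scope)
        if score > best_score:
            best_edit, best_score = edit, score

    if best_edit is not None and best_score > 0.5:
        # In scope: answer with the counterfactual model, conditioned on the edit.
        return counterfact_model(best_edit, query)
    # Out of scope: fall back to the frozen base model.
    return base_model(query)
```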

zxlzr commented 3 months ago

Hi, do you have any further questions?

SXxinxiaosong commented 3 months ago

> Why doesn't it make sense?
>
> As long as it fits during the training phase (the scope classifier learns to identify the editing scope, and the counterfactual model can output the new knowledge), shouldn't it successfully edit during the edit phase?

During the edit phase, is there no need to test performance on a val set?

pengzju commented 3 months ago

Yes, there is no need. You only need to test the performance on the edit set.
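
For instance, edit-set performance can be summarized from the metrics returned by editor.edit (see the sketch earlier in the thread); the post/rewrite_acc layout below is an assumption about EasyEdit's metrics schema, not something confirmed in this thread:

```python
# Hypothetical summary, assuming each entry of `metrics` carries
# post-edit rewrite accuracy under metrics[i]["post"]["rewrite_acc"].
import numpy as np

mean_rewrite_acc = np.mean([np.mean(m["post"]["rewrite_acc"]) for m in metrics])
print(f"Edit-set reliability (mean rewrite accuracy): {mean_rewrite_acc:.4f}")
```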

zxlzr commented 3 months ago

Hi, do you have any other questions?