zjunlp / EasyEdit

[Knowledge Editing] [ACL 2024] An Easy-to-use Knowledge Editing Framework for LLMs.
https://zjunlp.github.io/project/KnowEdit
MIT License

GRACE sequential edit result #325

Open SXxinxiaosong opened 2 weeks ago

SXxinxiaosong commented 2 weeks ago

Hello, I have recently been experimenting with GRACE on llama2, and I've noticed a significant difference between the results under the default settings and those reported in the WISE paper. Could you please help me check the issue? Thank you very much!

hparams:

alg_name: "GRACE"
model_name: "/home/xsong/llama/llama-2-7b-chat"
device: 1

inner_params:
- model.layers[27].mlp.down_proj.weight

edit_lr: 1.0
n_iter: 50
eps: 1.0
dist_fn: euc # euc, mmd, cos
val_init: cold # cold, warm
val_train: sgd # sgd, pert
val_reg: None # early
reg: early_stop # early_stop
replacement: replace_last # replace_last, replace_all, replace_prompt
eps_expand: coverage # , moving_avg, decay
num_pert: 8 # only matters when using perturbation training
dropout: 0.0

code:

    import json

    from easyeditor import BaseEditor, GraceHyperParams

    edit_data = json.load(open('/home/xsong/EditRAG/data/benchmark/ZsRE/ZsRE-test-all.json', 'r', encoding='utf-8'))
    prompts = [edit_data_['prompt'] for edit_data_ in edit_data if edit_data_['target_new'].strip()]
    rephrase_prompts = [edit_data_['rephrase_prompt'] for edit_data_ in edit_data if edit_data_['target_new'].strip()]
    target_new = [edit_data_['target_new'] for edit_data_ in edit_data if edit_data_['target_new'].strip()]
    ground_truth = [edit_data_['ground_truth'] for edit_data_ in edit_data if edit_data_['target_new'].strip()]
    subject = [edit_data_['subject'] for edit_data_ in edit_data if edit_data_['target_new'].strip()]
    hparams = GraceHyperParams.from_hparams('/home/xsong/EasyEdit_old/hparams/GRACE/llama-7B.yaml')
    print(hparams)

    editor = BaseEditor.from_hparams(hparams)
    metrics, edited_model, _ = editor.edit(
        prompts=prompts,
        rephrase_prompts=rephrase_prompts,
        target_new=target_new,
        subject=subject,
        ground_truth=ground_truth,
        #locality_inputs=locality_inputs,
        #portability_inputs=portability_inputs,
        #train_ds=train_ds,
        sequential_edit=False
    )

result: Metrics Summary: {'pre': {'rewrite_acc': 0.002814684674792284, 'rephrase_acc': 0.003032677033445673}, 'post': {'rewrite_acc': 0.3871004187971905, 'rephrase_acc': 0.005825400579435937}}

XeeKee commented 2 weeks ago

Thank you for your interest in EasyEdit. We will reproduce this result and resolve the issue in the near future.

SXxinxiaosong commented 1 week ago

Hello, I ran the experiment again, and 'post': {'rewrite_acc'} still stays around 0.38. Meanwhile, while reading the code, I noticed that in GRACE's evaluate function, test_prediction_acc(model, tok, hparams, prompt, target_new, device, vanilla_generation=True) sets vanilla_generation to True. Other methods are scored directly against target_new, whereas GRACE decodes with model.generate(). After changing vanilla_generation to False, GRACE's results are: Metrics Summary: {'pre': {'rewrite_acc': 0.37488205598313207, 'rephrase_acc': 0.37201854107119287, 'portability': {'one_hop_acc': 0.4772354551500425}}, 'post': {'rewrite_acc': 0.39502925654924115, 'rephrase_acc': 0.37201854107119287, 'locality': {'Relation_Specificity_acc': 1.0}, 'portability': {'one_hop_acc': 0.4772354551500425}}} rewrite_acc stays around 0.39, but rephrase_acc improves significantly. Why does GRACE alone set vanilla_generation to True?

XeeKee commented 1 week ago

vanilla_generation is set to True because GRACE uses an adapter; calling model.generate() sets key_id to -1, which ensures that token_to_edit is set correctly.
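For readers unfamiliar with GRACE's mechanism, here is a toy sketch (illustrative only, not EasyEdit's actual code) of the codebook-style adapter being described: each edit stores a key (a cached activation) and a value; at inference time, an activation that lands within `eps` of a stored key returns the edited value instead of the layer's original output. This per-token bookkeeping is why evaluation has to route through the adapter via model.generate().

```python
import math

# Toy sketch of a GRACE-style codebook adapter (illustrative, not EasyEdit's
# actual implementation). Names like ToyGraceAdapter are hypothetical.
class ToyGraceAdapter:
    def __init__(self, eps=1.0):
        self.eps = eps        # radius of each edit's influence (hparam `eps`)
        self.codebook = []    # list of (key, value) pairs, one per edit

    def add_edit(self, key, value):
        # Cache the activation (key) and the edited output (value).
        self.codebook.append((key, value))

    def forward(self, activation, original_output):
        # If the activation falls inside any key's eps-ball (dist_fn: euc),
        # return the stored edited value; otherwise pass the layer's
        # original output through unchanged.
        for key, value in self.codebook:
            if math.dist(activation, key) < self.eps:
                return value
        return original_output
```

With this picture, the eps_expand / replacement hyperparameters above control how the eps-balls grow and which token positions the replacement applies to.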

zxlzr commented 1 week ago

Hi, do you have any further questions?

SXxinxiaosong commented 1 week ago

Understood. The first reproduction issue is still unresolved, though, so please don't close this yet~

SXxinxiaosong commented 1 week ago

> vanilla_generation is set to True because GRACE uses an adapter; calling model.generate() sets key_id to -1, which ensures that token_to_edit is set correctly.

def compute_portability_quality(
    model,
    model_name,
    hparams: HyperParams,
    tok: AutoTokenizer,
    portability_key: str,
    prompt: typing.Union[str, List[str]],
    ground_truth: typing.Union[str, List[str]],
    device,
) -> typing.Dict:

    if 't5' in model_name.lower():
        portability_correct = test_seq2seq_batch_prediction_acc(model, tok, hparams, prompt, ground_truth, device)
    else:
        portability_correct = test_prediction_acc(model, tok, hparams, prompt, ground_truth, device)

    ret = {
        f"{portability_key}_acc": portability_correct
    }
    return ret

Why isn't GRACE handled separately when computing portability?

littlefive5 commented 1 week ago

It's a bug here, and we will fix it.

pengzju commented 1 week ago

I have fixed it

SXxinxiaosong commented 2 days ago

Could you reopen this? My reproduced results for GRACE's sequential editing still differ considerably from the paper~

pengzju commented 1 day ago

#339. They ran into a similar problem, but in their experiments they reproduced GRACE's high edit success. You can refer to the scripts/discussion there to check whether there is a problem with how you are using GRACE.

pengzju commented 1 day ago

In my experimental environment, I can easily reproduce GRACE's results.

[screenshot: reproduced GRACE results]

If you still cannot reproduce the results, or have deeper questions about the method itself, I suggest contacting the original GRACE authors.