domenicrosati opened 2 weeks ago
I evaluated a random sample of 500 edits from Counterfact and am getting the results below. FT for llama2-7b-chat seems off, but the T5 results in particular look very bad. Can you check that the T5 evaluation code is working?
```json
[
  { "model": "llama2-7b-chat", "method": "ft",    "edit_success": 0.306047197640118,   "rephrase_acc": 0.41076696165191745 },
  { "model": "llama2-7b-chat", "method": "serac", "edit_success": 0.995575221238938,   "rephrase_acc": 0.6342182890855457 },
  { "model": "t5-small",       "method": "ft",    "edit_success": 0.052254428610133664, "rephrase_acc": 0.052254428754106234 },
  { "model": "t5-small",       "method": "serac", "edit_success": 0.01774461055869487,  "rephrase_acc": 0.010779436399687582 },
  { "model": "gpt2-xl",        "method": "ft",    "edit_success": 0.9652509652509652,   "rephrase_acc": 0.4362934362934363 },
  { "model": "gpt2-xl",        "method": "serac", "edit_success": 0.9388489208633094,   "rephrase_acc": 0.38489208633093525 },
  { "model": "gpt2-xl",        "method": "memit", "edit_success": 0.8115942028985508,   "rephrase_acc": 0.5181159420289855 }
]
```
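For context, here is a minimal sketch of how I compute the aggregate numbers above. The record structure and field names (`edit_success`, `rephrase_acc` as 0/1 per-edit outcomes) are my own assumptions, not EasyEdit's internal format:

```python
import random

def sample_edits(dataset, n=500, seed=42):
    """Draw a reproducible random sample of n edit requests (hypothetical helper)."""
    rng = random.Random(seed)
    return rng.sample(dataset, n)

def aggregate(records, key):
    """Mean of a per-edit 0/1 metric over the sampled edits."""
    return sum(r[key] for r in records) / len(records)

# Toy per-edit outcomes: 1 = edited fact produced, 0 = not.
records = [
    {"edit_success": 1, "rephrase_acc": 1},
    {"edit_success": 1, "rephrase_acc": 0},
    {"edit_success": 0, "rephrase_acc": 0},
    {"edit_success": 1, "rephrase_acc": 1},
]
print(aggregate(records, "edit_success"))  # 0.75
print(aggregate(records, "rephrase_acc"))  # 0.5
```

If the T5 evaluation aggregates the same way, edit_success around 0.05 would mean almost no edit is ever reproduced, which is why I suspect the evaluation rather than the method.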
Thank you for your attention to EasyEdit. We will address this issue soon.