Closed MichaelRipa closed 3 months ago
A higher locality metric is better, and I believe there are no issues with your training process.
If you observe that locality is performing well, that is normal: in the early stages of editing (when the number of edits is 1), MEND exhibits good locality. However, it collapses after many sequential edits. (https://arxiv.org/abs/2311.04661, https://arxiv.org/abs/2402.10987)
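For reference, locality is usually computed as the fraction of out-of-scope prompts whose prediction is unchanged after the edit, which is why a higher score is better. Below is a minimal sketch of that definition; the prediction lists are hypothetical stand-ins for real model outputs, not your actual evaluation data.

```python
def locality_score(pre_edit_preds, post_edit_preds):
    """Fraction of out-of-scope predictions preserved after editing.

    1.0 means no out-of-scope fact was modified (perfect locality);
    0.0 means every out-of-scope prediction changed.
    """
    assert len(pre_edit_preds) == len(post_edit_preds)
    preserved = sum(
        1 for before, after in zip(pre_edit_preds, post_edit_preds)
        if before == after
    )
    return preserved / len(pre_edit_preds)

# Hypothetical example: 3 of 4 out-of-scope predictions survive the edit.
pre = ["Paris", "Ottawa", "Tokyo", "Berlin"]
post = ["Paris", "Ottawa", "Rome", "Berlin"]
print(locality_score(pre, post))  # 0.75
```

If your harness instead reports the fraction of *changed* predictions, a score of 0 would mean the same thing as 1.0 here, which may explain the discrepancy you are seeing.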
If all your issues have been resolved, please help close this issue.
Hi,
I have been evaluating different model editing methods on a custom-made benchmark, and so far the results have been consistent. However, when I run MEND, I always get locality scores of 0, indicating it is near perfect at avoiding modifying out-of-scope facts. Other editing techniques (e.g. ROME) do not score this drastically well on the dataset, so I worry that something is incorrect in either how I trained the meta-learner or how I set up the edits.
Is there a particular reason why MEND specifically would have such good locality scores compared to other editing techniques, or do you reckon there is a fault in my setup? I tried downloading your pretrained MEND meta-learner weights, and they showed the same behaviour as the meta-learner I trained on CounterFact. Any thoughts or suggestions would be appreciated.
I can provide more explicit details where needed.
Thanks!