Open ShuoZhangXJTU opened 1 year ago
Hi @ShuoZhangXJTU !
Sorry for the evaluation issue! The bug is that the MQuAKE-T dataset we released earlier did not contain the extended pre-edit gold answers (Appendix E in our updated version). This causes much lower base-model performance because of the time mismatch between the model's training corpus and our Wikidata dump.
I have updated the dataset, and MQuAKE-T now includes a new field, answer_extended, which we used in our experiments. You should also use this field when evaluating the base model before editing.
For FT results: we use the same hyperparameters as MEMIT did.
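A minimal sketch of how the new field might be folded into the pre-edit check, not the exact evaluation script. It assumes each dataset entry is a dict with `answer` (string), `answer_alias` (list of strings) and `answer_extended` (list of strings), and it uses case-insensitive exact match as the criterion; the helper names and the sample entry are made up for illustration.

```python
def pre_edit_gold_answers(entry):
    """Collect every string accepted as a correct pre-edit answer."""
    golds = [entry["answer"]]
    golds += entry.get("answer_alias", [])
    # Assumption: answer_extended is a list of strings. Older copies of
    # the dataset do not have this field, so fall back to an empty list.
    golds += entry.get("answer_extended", [])
    return {g.strip().lower() for g in golds}


def base_model_correct(prediction, entry):
    """Case-insensitive exact match against any accepted gold answer."""
    return prediction.strip().lower() in pre_edit_gold_answers(entry)


# Hypothetical entry, not taken from MQuAKE-T.
example = {
    "answer": "Alice",
    "answer_alias": ["A. Example"],
    "answer_extended": ["Bob"],
}

print(base_model_correct("bob", example))
```

If the base model was trained on a corpus older than the Wikidata dump, an answer such as "Bob" above would be counted as wrong without `answer_extended`, which is the kind of mismatch described above.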
Hi Zexuan,
Thank you for the update! I will use the latest version then.
Best regards,
Shuo
-----Original Message----- From: "Zexuan Zhong" @.> Sent: 2023-11-27 05:53:06 (Monday) To: princeton-nlp/MQuAKE @.> Cc: "Shuo Zhang" @.>, Mention @.> Subject: Re: [princeton-nlp/MQuAKE] Can you release the codes for evaluation and training hyperparameters? (Issue #6)
I am trying to reproduce the results on MQuAKE-T and found that the multi-hop results for "Base" are much lower (16.22 for multi-hop and 22.59 for CoT) than those reported in Table 4. I cannot reproduce the FT results either.
Could you release your evaluation code and your training hyperparameters so the results can be reproduced?