Closed jiqimaoke closed 2 months ago
After replacing MEND's key files with the corresponding files in the rome repository, the results returned to normal.
Hello, we apologize for any inconvenience. We are currently busy with a paper deadline and have not had the manpower to handle this in the past few days; we will address it in a few days. In the meantime, you could try adjusting the hyperparameters for the GPT-J-6B model, as our yaml file is not optimized for GPT-J-6B.
Additionally, have you reproduced the n-gram entropy on llama-2-7B? It is possible that MEND itself causes repeated tokens (like "We We We We We"), leading to reduced diversity.
- Could you please specify which scripts you mean by "key files"?
I list below the key files I replaced previously.
You can find the corresponding files in the following folders.
> Additionally, have you reproduced the n-gram entropy on llama-2-7B? It is possible that MEND itself causes repeated tokens (like "We We We We We"), leading to reduced diversity.
I have encountered this situation, but it is a rare case. The n-gram entropy of Llama-2-7b is still between 550 and 600. Judging from the n-gram entropy reported in other papers, it may be an anomalous phenomenon.
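The repetition failure mode discussed above can be made concrete with a toy n-gram entropy check. This is a hypothetical sketch, not the repository's actual metric (the function name `ngram_entropy`, the n range 2–4, and base-2 logs are my assumptions; the real metric clearly uses a different scale, given the 550–600 values quoted above), but it shows why degenerate repetition lowers the score:

```python
from collections import Counter
import math

def ngram_entropy(text, ns=(2, 3, 4)):
    """Average Shannon entropy (bits) over n-gram distributions of the
    whitespace-tokenized text. Illustrative only; not EasyEdit's metric."""
    tokens = text.split()
    entropies = []
    for n in ns:
        grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        if not grams:
            continue
        counts = Counter(grams)
        total = sum(counts.values())
        entropies.append(
            -sum((c / total) * math.log2(c / total) for c in counts.values())
        )
    return sum(entropies) / len(entropies)

# Degenerate repetition ("We We We ...") collapses the n-gram distribution
# to a single n-gram, so its entropy drops to zero, while varied text stays high.
diverse = ngram_entropy("the quick brown fox jumps over the lazy dog near the river bank")
repetitive = ngram_entropy("We We We We We We We We We We We We We")
```

A model edit that makes repeated tokens more likely therefore shows up directly as a drop in this score, which matches the diversity concern raised in the thread.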
I think I understand your issue. I'll reproduce it on GPT-J as soon as possible.
Currently, my guess is that the cause might be too many iterations, leading the MEND hypernetwork to overfit the token-level cross-entropy (producing very high probabilities for certain token IDs), which degrades generation quality. You could try setting `max_iters` to 20000. I'm also following up on this issue.
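For reference, that suggestion amounts to a one-line change in the training yaml. Only the `max_iters` key and the value 20000 come from this thread; the surrounding file layout is assumed:

```yaml
# Hypothetical fragment of the MEND training config for GPT-J-6B;
# only the max_iters value is taken from this thread.
max_iters: 20000  # fewer iterations to avoid overfitting the hypernetwork
```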
We tried setting `max_iters` to 20000, and the average n-gram entropy of gpt-j-6b is around 450.
Hi, do you have any further questions?
No further questions. Thx!
I tried to reproduce the MEND results on GPT-J-6B and Llama-2-7b, but the n-gram entropy of GPT-J-6B is far below that of Llama-2-7b (around 350 vs. around 550). Do you have any ideas?
Here is my training code:
My training yaml:
My eval script: