sycny / RAE

Implementation of our CIKM'2024 paper "Retrieval-enhanced Knowledge Editing in Language Models for Multi-Hop Question Answering"

Issue when testing Llama-3-8B-Instruct model #1

Closed · bebr2 closed this issue 1 week ago

bebr2 commented 3 weeks ago

Hello @sycny ,

Thank you for your valuable contribution and excellent work!

I am currently trying to test the Llama-3-8B-Instruct model. I have successfully implemented and tested the Llama-2-7b-chat model, and the results are consistent with those reported in the paper. However, when I attempted to test the Llama-3-8B-Instruct model, I encountered some issues.

Here are the steps I have taken:

  1. Modified the model loading code in main.py to load the Llama-3-8B-Instruct model (roughly as sketched below).
  2. Ran the testing script.
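
For reference, this is roughly what my change looks like; the checkpoint name and loading arguments here are my own and may not match exactly what main.py does:

```python
# Rough sketch of the change I made (checkpoint name and loading args are mine,
# not necessarily what main.py uses; device_map="auto" requires `accelerate`).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Meta-Llama-3-8B-Instruct"  # replaces the Llama-2-7b-chat checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)
```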

The problem I am facing is that I get "Fail to Extract" in all items, and the multihop accuracy rate is 0.

I was wondering if there are any other parts of the code that need to be adjusted when switching to the Llama-3-8B-Instruct model, or if this is a common issue that needs to be addressed in a different way.

Any guidance or suggestions would be greatly appreciated.

Once again, thank you for your contribution. Looking forward to your response.

sycny commented 3 weeks ago

Hi, thank you for your interest in our project!

The big difference between Llama 2 and Llama 3 is that the latter uses an updated tokenizer. In that case, if you want to estimate the relation probability, you need to figure out which tokens actually represent the relation (a string of words) you want to estimate.

This is because in some tokenizers, ' citizen of' and 'citizen of' are encoded into different tokens (ids). We tested our method and carefully checked it with the Llama 1 and 2 series. You can find our design here: relation_prob

If you want to update to Llama 3, I suggest you take a close look at this part and revise it accordingly.
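
As a quick illustration (not code from this repo), you can compare how the same relation string tokenizes with and without a leading space under the two tokenizers; the Hugging Face checkpoints below are gated, so loading them may require authentication, and the exact ids depend on the tokenizer version:

```python
# Illustrative only: show how the same relation string tokenizes with and without
# a leading space under the Llama-2 and Llama-3 tokenizers.
from transformers import AutoTokenizer

relation = "citizen of"
for name in ["meta-llama/Llama-2-7b-chat-hf", "meta-llama/Meta-Llama-3-8B-Instruct"]:
    tok = AutoTokenizer.from_pretrained(name)
    # add_special_tokens=False keeps BOS/special tokens out of the comparison
    with_space = tok.encode(" " + relation, add_special_tokens=False)
    no_space = tok.encode(relation, add_special_tokens=False)
    print(name, "| with leading space:", with_space, "| without:", no_space)
```

The ids that the new tokenizer actually assigns to the relation inside the full prompt are the ones whose probabilities should be aggregated, so the token indexing in relation_prob has to be adjusted to match.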

bebr2 commented 1 week ago

Hi @sycny, thanks for the help! I've managed to resolve the issue. Everything's working as expected now. Thank you!