szxiangjn / world-model-for-language-model


Hyperparameters for LLaMA finetuning #8

Open chunhuizng opened 5 months ago

chunhuizng commented 5 months ago

May I ask for the hyperparameters used for LLaMA fine-tuning? The learning rate, batch size, EWC coefficient (λ), and the LoRA rank and scaling coefficient would be helpful.

Thank you!
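(For reference, below is a rough sketch of where these hyperparameters typically enter an EWC + LoRA fine-tuning setup. The LoRA configuration and the `ewc_lambda` value are illustrative placeholders, not the values used in the paper, and the helper assumes a precomputed diagonal Fisher matrix and a saved copy of the pretrained parameters.)

```python
import torch
from peft import LoraConfig, get_peft_model

# LoRA rank and scaling coefficient (placeholders, not the paper's values).
lora_config = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"])
# model = get_peft_model(base_model, lora_config)

ewc_lambda = 1.0  # EWC coefficient λ (placeholder)

def ewc_penalty(model, fisher, old_params, lam=ewc_lambda):
    """(λ/2) · Σ_i F_i (θ_i − θ*_i)², summed over parameters with Fisher entries."""
    penalty = torch.zeros((), device=next(model.parameters()).device)
    for n, p in model.named_parameters():
        if n in fisher:
            penalty = penalty + (fisher[n] * (p - old_params[n]) ** 2).sum()
    return 0.5 * lam * penalty

# Total training objective: the usual LM loss plus the EWC regularizer.
# loss = lm_loss + ewc_penalty(model, fisher, old_params)
```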

szxiangjn commented 5 months ago

Hi,

They are the same as for GPT-J-6B; you can refer to Appendix A.4.

chunhuizng commented 4 months ago

> Hi,
>
> They are the same as for GPT-J-6B; you can refer to Appendix A.4.

Thank you very much, this is helpful!

By the way, would you be willing to release the Fisher matrix from the LLaMA fine-tuning?

Best, Chunhui

chunhuizng commented 4 months ago

The Fisher matrix of LLaMA is needed to reproduce the paper's EWC fine-tuning of LLaMA, so it would be very helpful if you could release it. Thanks!

szxiangjn commented 4 months ago

Sorry, it's been a while and I cannot find the Fisher matrix for LLaMA at this point... But I have uploaded the script for computing the Fisher matrix, so you can try computing it yourself.
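(For anyone reproducing this, here is a minimal sketch of a diagonal, empirical Fisher computation for a causal LM, assuming a Hugging Face-style model and a dataloader of tokenized batches. The function name and details are illustrative and may differ from the repo's actual script.)

```python
import torch

def compute_diagonal_fisher(model, dataloader, device="cuda"):
    """Estimate the diagonal empirical Fisher as the mean squared gradient
    of the LM loss over the given batches."""
    model.to(device)
    model.eval()
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters() if p.requires_grad}
    num_batches = 0
    for batch in dataloader:
        input_ids = batch["input_ids"].to(device)
        attention_mask = batch["attention_mask"].to(device)
        model.zero_grad()
        # Negative log-likelihood of the observed tokens (standard causal LM loss).
        loss = model(input_ids=input_ids, attention_mask=attention_mask, labels=input_ids).loss
        loss.backward()
        # Accumulate squared gradients -- the diagonal empirical Fisher estimate.
        for n, p in model.named_parameters():
            if p.requires_grad and p.grad is not None:
                fisher[n] += p.grad.detach() ** 2
        num_batches += 1
    return {n: f / max(num_batches, 1) for n, f in fisher.items()}
```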