Closed: zhmzm closed this issue 3 months ago
Hi @zhmzm,
Thanks for your attention! I recently used LLaMA-Factory to train LLaMA3-8B with gradient ascent (GA), and I also encountered an OOM error even with batch_size=1. This issue might be caused by the training framework itself, since I made only minimal changes to LLaMA-Factory for the GA method.
I recommend using two A100 GPUs for training GA and RT, and three to four A100 GPUs for training DPO and NPO.
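For context, the GA objective is essentially the standard causal-LM loss with its sign flipped on the forget set. Here is a minimal sketch, assuming a plain Hugging Face setup; the model name, optimizer, and forget_batch below are placeholders, not the exact code in this repo:

```python
# Minimal sketch of a gradient-ascent (GA) step on the forget set, assuming a
# standard Hugging Face causal-LM setup; model name, optimizer, and forget_batch
# are placeholders, not the exact code used in this repo.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B", torch_dtype=torch.bfloat16
).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def ga_step(forget_batch):
    # forget_batch holds input_ids, attention_mask, labels for forget-set texts
    outputs = model(**forget_batch)
    loss = -outputs.loss  # flip the sign of the LM loss to ascend on the forget set
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```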
Thanks for your reply. The problem I encountered is that I can SFT a larger model with LLaMA-Factory, but I fail to train it with GA in this repo. I will check the code.
What is the average length of your SFT data?
In LLaMA-Factory I set the maximum length to 1024 with batch size 1, while in this repo I set the maximum length to 128 with the same batch size. Are there other hyper-parameters I should consider as well? Also, do you know whether the repo includes retain sets for training, or does it only compute the loss on the forget sets?
That's strange; I didn't change the original LLaMA-Factory codebase. I just uploaded a script that injects the knowledge via the pretraining loss in the original LLaMA-Factory; you can try it and see if it works.
I didn't provide a retain corpus for training, but I plan to add a retain loss and corpus in the future. You can choose other famous people outside RWKU as retain targets, or sample some texts from WikiText.
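Roughly, the retain term would just be a plain LM loss on the retain texts added to the negated forget loss. The sketch below shows one common formulation; lambda_retain is an illustrative weight, not something this repo ships:

```python
# Sketch of a GA objective with an added retain term, assuming forget_batch and
# retain_batch are tokenized samples from the forget set and a retain corpus
# (e.g. other celebrities or WikiText); lambda_retain is only an illustrative weight.
def ga_with_retain_step(model, optimizer, forget_batch, retain_batch, lambda_retain=1.0):
    forget_loss = model(**forget_batch).loss  # LM loss on texts to forget
    retain_loss = model(**retain_batch).loss  # LM loss on texts to keep
    loss = -forget_loss + lambda_retain * retain_loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return forget_loss.item(), retain_loss.item()
```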
Thanks for your assistance!
Hi, thanks for sharing the impressive code!
The computation cost of this repo is higher than expected. As LLaMA-Factory suggests, a 7B model should only require about 60 GB of GPU memory for a full fine-tune. However, it takes about 160 GB when I run full/run_ga.sh.
Which step increases the cost?
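For reference, here is my back-of-the-envelope estimate of the static memory for a full fine-tune with AdamW in bf16 mixed precision (weights, gradients, fp32 master copy, and the two optimizer moments; activations excluded):

```python
# Back-of-the-envelope memory estimate for fully fine-tuning an ~8B model with
# AdamW in bf16 mixed precision (activations and CUDA overhead excluded).
params = 8e9
bytes_total = params * (2      # bf16 weights
                        + 2    # bf16 gradients
                        + 4    # fp32 master weights
                        + 4    # Adam first moment (fp32)
                        + 4)   # Adam second moment (fp32)
print(f"~{bytes_total / 1024**3:.0f} GB before activations")  # ~119 GB
```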