Closed Xiaohui9607 closed 5 months ago
Hi, I am fine-tuning llama3-8b on the dataset you provide, with 4x8 A100s. How long does it take on your side to finish the experiment? Fine-tuning vicuna-7b took me only 12 hours, but llama3-8b took 42 hours. Is this the same on your end? Thanks!
For LLaMA3, I set the max_length to 6144. You can set this arg back to 4096 to reduce the training time.
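For reference, a minimal sketch of how the override might look on the command line. The entrypoint name (`train.py`), launcher flags, and model path here are assumptions for illustration; only the `max_length` values (6144 vs. 4096) come from this thread, so adapt the rest to the repo's actual training script.

```shell
# Hypothetical launch command -- script name and flag spellings are
# assumptions; check the repo's training entrypoint for the real ones.
torchrun --nproc_per_node=8 train.py \
    --model_name_or_path meta-llama/Meta-Llama-3-8B \
    --max_length 4096   # 6144 was used for the reported LLaMA3 run;
                        # shorter sequences roughly shrink per-step cost
```

Attention cost grows superlinearly with sequence length, so dropping max_length from 6144 to 4096 can noticeably cut wall-clock time, at the cost of truncating longer training samples.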