philschmid / sagemaker-huggingface-llama-2-samples


Fine Tuning LLAMA2-13b on ml.g4dn.12xlarge is taking too much time. #15

Closed: hz-nm closed this issue 1 year ago

hz-nm commented 1 year ago

I was fine-tuning LLaMA 2 13B on a different dataset. I used the code from this repo, tweaked it a little just to preprocess that specific dataset, and then ran it as a SageMaker training job. Training ran fine but was very slow: even after 24 hours it had only reached 7% progress on an ml.g4dn.12xlarge instance. Can anyone guide me on how to speed up training? Unfortunately I cannot use "ml.g5.4xlarge", since that instance type is not available in the region I am working in right now. Thanks.

philschmid commented 1 year ago

g4 instances are quite old and not built for training. You should try p3 instances instead.
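For reference, switching instance types only requires changing the estimator configuration. A minimal sketch, assuming the `HuggingFace` estimator setup from this repo's training notebook; `role`, `hyperparameters`, and `training_input_path` are placeholders for your own values:

```python
from sagemaker.huggingface import HuggingFace

# Same training script as before; only the instance type changes.
huggingface_estimator = HuggingFace(
    entry_point="run_clm.py",         # training script from this repo
    source_dir="./scripts",
    instance_type="ml.p3.8xlarge",    # 4x V100 (16 GB each) instead of ml.g4dn.12xlarge (4x T4)
    instance_count=1,
    role=role,                        # placeholder: your SageMaker execution role
    transformers_version="4.28",
    pytorch_version="2.0",
    py_version="py310",
    hyperparameters=hyperparameters,  # placeholder: same hyperparameters as before
)

# Kick off the training job with the preprocessed dataset on S3
huggingface_estimator.fit({"training": training_input_path})
```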

hz-nm commented 1 year ago

Thank you for replying. I will try p3 instances and report back on the results so that you can update the chart as well. Again, many thanks.

hz-nm commented 1 year ago

Will the p3.2xlarge be sufficient for training and merging, or is its GPU memory too little? Here are the specs:

- vCPUs: 8
- Instance memory: 61 GiB
- Total GPU memory: 16 GB

The p3.8xlarge has 4 GPUs and a similar increase in instance memory.

philschmid commented 1 year ago

Not sure. With PEFT and int-4 quantization it might work, but using an instance with more GPU memory would be better.
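For a rough sense of the memory math: 13B parameters in fp16 take ~26 GB for the weights alone, which already exceeds the single 16 GB V100 in a p3.2xlarge, whereas quantized to 4 bits they shrink to roughly 7 GB, leaving headroom for the LoRA adapters, optimizer state, and activations. A minimal sketch of what "PEFT and int-4" (QLoRA-style) loading could look like; the model ID and LoRA hyperparameters here are illustrative, not taken from this repo:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "meta-llama/Llama-2-13b-hf"  # assumed model; use whichever checkpoint you fine-tune

# Load the 13B weights in 4-bit (NF4) so they fit in ~7 GB of GPU memory
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,  # V100s (p3) do not support bfloat16
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# Attach small trainable LoRA adapters; only these are updated during training
peft_config = LoraConfig(
    r=64,                                  # illustrative values
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()         # sanity check: a tiny fraction of 13B is trainable
```

Note that merging the adapters back into the base model afterwards requires loading the full-precision weights, which is where the 61 GiB of instance memory matters more than the 16 GB of GPU memory.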

hz-nm commented 1 year ago

Thanks again. I will update here soon.