tloen / alpaca-lora

Instruct-tune LLaMA on consumer hardware

GPU UTIL fluctuates wildly #584

Open ssocean opened 1 year ago

ssocean commented 1 year ago

Hi, thanks for your great work~

I ran the demo successfully, but I found that GPU utilization fluctuates rapidly between 0% and 90%, and only one GPU is active at a time. This significantly increases total training time (currently about 8 hours on 4×V100).
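For reference, this is how I checked whether the run is actually doing data-parallel training across the four cards (a minimal sketch; `WORLD_SIZE` and `LOCAL_RANK` are the environment variables that `torchrun` sets, so treat the single-GPU diagnosis as my assumption):

```python
import os
import torch
import torch.distributed as dist

# torchrun sets WORLD_SIZE/LOCAL_RANK; a plain `python finetune.py` launch does not.
world_size = int(os.environ.get("WORLD_SIZE", 1))
print(f"WORLD_SIZE={world_size}")   # 1 => no DDP; the model may instead be sharded
                                    # across GPUs, so they take turns computing
print(f"visible GPUs: {torch.cuda.device_count()}")
if dist.is_available() and dist.is_initialized():
    print(f"DDP active: rank {dist.get_rank()} of {dist.get_world_size()}")
```

If `WORLD_SIZE` comes back as 1, launching with something like `torchrun --nproc_per_node=4 finetune.py ...` instead of plain `python finetune.py` might be what engages all four GPUs, though I haven't confirmed that.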

I'm new to this, but based on past experience I suspected the batch size was too small, so I increased batch_size from 128 to 256 and micro_batch_size from 4 to 64. That didn't help. (screenshots attached)
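In case it matters, here is my understanding of how the two flags interact (a sketch only; the formula mirrors what finetune.py seems to do internally, so treat the DDP division as an assumption):

```python
def grad_accum_steps(batch_size: int, micro_batch_size: int, world_size: int = 1) -> int:
    """How many micro-batches are accumulated before each optimizer step
    to reach the effective batch size `batch_size`."""
    steps = batch_size // micro_batch_size
    if world_size > 1:
        # under DDP every rank already contributes micro_batch_size samples per step
        steps //= world_size
    return steps

print(grad_accum_steps(128, 4))    # 32 accumulation steps (my original setting)
print(grad_accum_steps(256, 64))   # 4 steps; raising micro_batch_size mainly
                                   # changes this ratio, not how many GPUs work
```

So increasing micro_batch_size raises the per-step load on the active GPU but doesn't by itself bring the idle GPUs into play, which may be why the change made no difference.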

Is this normal, or what can I do next?

Shwai-He commented 1 year ago

I'm hitting the same problem. Have you solved it?