Hi, thanks for your great work~
I ran the demo successfully, but I noticed that GPU utilization fluctuates rapidly between 0% and 90%, and only one GPU is active at a time... This significantly increases the total training time. (It currently takes approximately 8 hours on 4x V100.)
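To illustrate what I mean by "only one GPU working": here is a minimal check from inside the training process, assuming the demo is PyTorch-based (a sketch, not code from the repo):

```python
import torch
import torch.distributed as dist

# How many GPUs does this process see?
print(f"visible GPUs: {torch.cuda.device_count()}")

# Is a distributed (data-parallel) process group actually running?
print(f"distributed initialized: {dist.is_available() and dist.is_initialized()}")

# Which GPUs hold allocations from this process?
for i in range(torch.cuda.device_count()):
    mib = torch.cuda.memory_allocated(i) / 1024 ** 2
    print(f"cuda:{i}: {mib:.0f} MiB allocated")
```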
I'm new to this, but based on my previous experience, this may be caused by a batch size that is too small. So I tried increasing the batch size from 128 to 256 and the micro batch size from 4 to 64, but that didn't help...
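For reference, here is how I understand the two settings to interact, assuming the common gradient-accumulation scheme (the variable names are mine, not necessarily the demo's):

```python
# Assumed relationship (common gradient-accumulation setup):
#   one optimizer step = batch_size samples,
#   processed micro_batch_size samples at a time.
batch_size = 256       # raised from 128
micro_batch_size = 64  # raised from 4

grad_accum_steps = batch_size // micro_batch_size
print(grad_accum_steps)  # 4 accumulation steps per optimizer step (was 128 // 4 = 32)
```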
Is this normal? If not, what should I try next?