Hi, I’m fine-tuning Whisper-Large-V3 on an A100-80G, but the GPU looks underutilized in both memory and compute. During training, the process allocates only 41 GB of the 80 GB of GPU RAM, and utilization is spiky: it hits 100% for a second or two, then drops to near zero as if waiting on something, and the cycle repeats.
Why isn't the GPU being fully utilized, both in terms of memory and processing power?
What steps can I take so that training makes full use of the GPU?
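For reference, I'm assuming the common Hugging Face Seq2SeqTrainer recipe for Whisper fine-tuning; below is a minimal sketch of the training arguments I understand to govern memory footprint and GPU utilization. The values (batch size, worker count, output path) are illustrative placeholders, not my exact configuration:

```python
from transformers import Seq2SeqTrainingArguments

# Minimal sketch (illustrative values, not my exact run) of the knobs
# that typically govern memory use and how busy the GPU stays.
training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-large-v3-finetuned",  # hypothetical path
    per_device_train_batch_size=32,   # larger batches fill more of the 80 GB
    gradient_accumulation_steps=1,    # raise only if memory becomes the limit
    dataloader_num_workers=8,         # more CPU workers keep the GPU fed
    dataloader_pin_memory=True,       # faster host-to-device transfers
    bf16=True,                        # mixed precision on A100
)
```

In particular, is the spike-then-idle pattern a sign that CPU-side data loading and preprocessing are the bottleneck, and would raising `dataloader_num_workers` (or pre-computing the log-mel features) be the right fix?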