Hi, I’m fine-tuning Whisper-Large-V3 on an A100-80G, but the GPU looks underutilized in both memory and compute. During training, the process allocates only 41 GB of the 80 GB of GPU RAM, and utilization is spiky: it hits 100% for a second or two, then drops to near zero as if waiting on something, and the cycle repeats.
Why isn't the GPU being fully utilized, both in terms of memory and processing power?
What steps can I take so that training makes full use of the GPU?
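For reference, I'm assuming the common Hugging Face Seq2SeqTrainer recipe for Whisper fine-tuning; below is a minimal sketch of the training arguments I understand to govern memory footprint and GPU utilization. The values (batch size, worker count, output path) are illustrative placeholders, not my exact configuration:

```python
from transformers import Seq2SeqTrainingArguments

# Minimal sketch (illustrative values, not my exact run) of the knobs
# that typically govern memory use and how busy the GPU stays.
training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-large-v3-finetuned",  # hypothetical path
    per_device_train_batch_size=32,   # larger batches fill more of the 80 GB
    gradient_accumulation_steps=1,    # raise only if memory becomes the limit
    dataloader_num_workers=8,         # more CPU workers keep the GPU fed
    dataloader_pin_memory=True,       # faster host-to-device transfers
    bf16=True,                        # mixed precision on A100
)
```

In particular, is the spike-then-idle pattern a sign that CPU-side data loading and preprocessing are the bottleneck, and would raising `dataloader_num_workers` (or pre-computing the log-mel features) be the right fix?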