vasistalodagala / whisper-finetune

Fine-tune and evaluate Whisper models for Automatic Speech Recognition (ASR) on custom datasets or datasets from huggingface.

Optimizing GPU Resource Utilization and Memory Allocation During Fine-Tuning on A100-80G #20

Open cod3r0k opened 3 weeks ago

cod3r0k commented 3 weeks ago


Hi, I’m fine-tuning Whisper-Large-V3 on an A100-80G, but the GPU memory and compute are not being fully used. During training the process allocates only 41 GB of the 80 GB of GPU RAM, and utilization fluctuates: it spikes to 100% for one or two seconds, then drops to near zero as if it is waiting on something, and the cycle repeats.
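
For reference, this is roughly how I am watching the spike/idle pattern. It is a minimal monitoring sketch using the `pynvml` package (not part of this repository), just to show how the numbers above were observed:

```python
# Minimal GPU monitoring sketch (assumes the `pynvml` package is installed).
# Logs memory use and utilization once per second so the spike/idle cycle is visible.
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first (and only) GPU

for _ in range(60):
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
    util = pynvml.nvmlDeviceGetUtilizationRates(handle)
    print(f"mem used: {mem.used / 1e9:.1f} GB, GPU util: {util.gpu}%")
    time.sleep(1)

pynvml.nvmlShutdown()
```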

Why isn't the GPU being fully utilized, both in terms of memory and processing power? What steps can I take to optimize the training process to ensure maximum efficiency?
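
To make the question concrete, below is a rough sketch of the `Seq2SeqTrainingArguments` knobs I assume are the relevant ones for GPU utilization (batch size, gradient accumulation, dataloader workers, mixed precision). The specific values are illustrative guesses for an A100-80G, not settings taken from this repository, and I would appreciate guidance on which of them actually matter here:

```python
# Illustrative sketch only: typical Trainer settings that affect GPU utilization when
# fine-tuning Whisper with Hugging Face transformers. All values are assumptions for an
# A100-80G, not the configuration shipped with this repository.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-large-v3-finetuned",  # hypothetical output path
    per_device_train_batch_size=32,             # raise until GPU memory is close to full
    gradient_accumulation_steps=1,              # lower if the larger batch fits directly
    dataloader_num_workers=8,                   # more workers keep the GPU fed with batches
    dataloader_pin_memory=True,                 # faster host-to-device transfers
    bf16=True,                                  # mixed precision on A100
    gradient_checkpointing=False,               # trades compute for memory; off if memory allows
    learning_rate=1e-5,
    warmup_steps=500,
    max_steps=5000,
)
```

My rough understanding is that a utilization pattern of brief 100% spikes followed by idle periods usually points to the GPU waiting on data loading or preprocessing rather than on the model itself, which is why the dataloader settings are included above; please correct me if that reasoning is off for this training script.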