yl4579 / StyleTTS2

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
MIT License

During training, the graphics memory has been continuously increasing #242

Open Wentao795 opened 4 months ago

Wentao795 commented 4 months ago

Hi, thank you very much for your work. May I ask how to solve the problem of GPU memory growing continuously during training?

martinambrus commented 4 weeks ago

GPU memory (VRAM) should not increase past a certain point. You can control how much VRAM is used with 2 settings in the config.yml file (Configs folder):

1) batch_size - this is the main control of memory consumption and must never go below 2. The higher this number, the faster the training, but the more VRAM will be used (since each training pass processes that many samples at once).

2) max_len - this setting tells the script how much of each WAV file's data to process. The value is in frames and is calculated as YOUR_MAX_WAV_FILE_LENGTH_IN_SECONDS / 0.0125 (e.g. if your longest WAV file is 10 seconds long, max_len will be 10 / 0.0125 = 800). Bear in mind that if you set it lower, only that portion of each longer audio file will be considered for training. This may result in the model learning partial words or sentences, so it's generally advised to set max_len to cover the full duration of the longest WAV file you're using for training.
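As a rough illustration of the max_len formula above, here is a small Python sketch. The helper name `max_len_for` and the constant `FRAME_SECONDS` are made up for this example and are not part of the StyleTTS2 code; the 0.0125 seconds-per-frame figure is simply the one quoted above.

```python
import math

# Seconds of audio per frame, taken from the formula in the comment above.
FRAME_SECONDS = 0.0125


def max_len_for(longest_clip_seconds: float) -> int:
    """Return a max_len (in frames) that covers a clip of the given duration.

    Hypothetical helper for illustration only.
    """
    frames = longest_clip_seconds / FRAME_SECONDS
    # Round away floating-point noise first, then round up to a whole frame.
    return math.ceil(round(frames, 6))


if __name__ == "__main__":
    # A 10-second clip, as in the example above.
    print(max_len_for(10))
```

The result would then be written as the max_len value in config.yml. Rounding up rather than down is a choice made here so that the longest clip is not cut short by a fraction of a frame.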

If, after these changes, your VRAM is still being consumed continuously, then there might be a bug somewhere that's doing that.
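One possible way to tell "still being consumed continuously" apart from "grew and then levelled off" is to log a memory reading at every training step and compare the recent peak with the earlier peak. The sketch below is only an assumption about how such a check could look: `still_growing`, `window` and `tolerance_mb` are hypothetical names, and the readings are assumed to be collected by you (for example from PyTorch's `torch.cuda.memory_allocated()`, which reports bytes), not by anything in the StyleTTS2 training script.

```python
from typing import Sequence


def still_growing(readings_mb: Sequence[float],
                  window: int = 50,
                  tolerance_mb: float = 1.0) -> bool:
    """Return True if the peak of the last `window` readings is higher than
    the peak of all earlier readings by more than `tolerance_mb`.

    Hypothetical helper for illustration only.
    """
    if len(readings_mb) <= window:
        # Not enough history to compare against yet.
        return False
    earlier_peak = max(readings_mb[:-window])
    recent_peak = max(readings_mb[-window:])
    return recent_peak - earlier_peak > tolerance_mb


if __name__ == "__main__":
    # Synthetic data: usage that rises for 20 steps and then stays flat.
    plateau = [1000 + min(step, 20) * 10 for step in range(200)]
    # Synthetic data: usage that rises by 5 MB on every step.
    leak = [1000 + step * 5 for step in range(200)]
    print(still_growing(plateau), still_growing(leak))
```

A True result early in training is not necessarily a problem, since usage can rise until the largest batch has been seen; it is a result that stays True over many windows that would point to the kind of bug mentioned above.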