yl4579 / StyleTTS

Official Implementation of StyleTTS
MIT License

batch size and number of epochs for large dataset #54

Closed · auspicious3000 closed this issue 1 year ago

auspicious3000 commented 1 year ago

Hi,

Thanks for releasing the code for such great work.

Could you kindly clarify the specific number of epochs and batch size you employed during the LibriTTS model training?

The paper says a batch size of 64, with 200 epochs for first-stage training and 100 epochs for second-stage training. The config.yml in the repository indicates the same, except for a batch size of 32. The config.yml under the pretrained "Models" directory indicates 80 epochs for the first stage and 50 epochs for the second stage for LibriTTS. These discrepancies raise questions about the exact training parameters used to achieve the LibriTTS results presented in the paper. Training for 200 epochs with a batch size of 64 on LibriTTS is quite time-consuming without access to a powerful GPU, so it would be very helpful to know whether a reduced number of epochs and a smaller batch size can achieve comparable results.

Looking forward to your reply! @yl4579

Warm regards

yl4579 commented 1 year ago

I haven't updated the paper to the camera-ready version, so this is indeed quite unclear. Those parameters are for the LJSpeech dataset, not LibriTTS; as you point out, training LibriTTS for 200 epochs is too time-consuming (and no improvement is seen after around 80 epochs). You should refer to the parameters of the released checkpoints instead (though they aren't the exact checkpoints I used for the paper).
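
For reference, a minimal sketch of the two-stage settings described in this thread for the released LibriTTS checkpoints. The key names (`batch_size`, `epochs_1st`, `epochs_2nd`) are assumptions in the style of the repository's config.yml; check the config shipped with the pretrained "Models" directory for the exact fields.

```yaml
# Sketch of the LibriTTS training settings discussed above.
# Key names are assumed; values come from this thread, not from the paper.
batch_size: 32    # repository config uses 32 (the paper text says 64)

epochs_1st: 80    # first-stage training; little improvement seen beyond ~80 epochs
epochs_2nd: 50    # second-stage training, per the released checkpoint config
```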