rhasspy / piper

A fast, local neural text to speech system
https://rhasspy.github.io/piper-samples/
MIT License

High VRAM Usage with Larger Datasets in Piper Compared to Coqui TTS #571

Closed · mniiinm closed this issue 3 months ago

mniiinm commented 3 months ago

Hello, and thanks for this great project. I have a relatively large dataset with over 300,000 audio files. While training a model with Piper, I noticed that VRAM consumption grows with dataset size. For example, with a 20,000-file dataset, `--batch-size 32`, and `--max-phoneme-ids 400`, VRAM usage stays under 24 GB. With the 300,000-file dataset and the same settings, it climbs past 40 GB. I didn't see this behavior with Coqui TTS, where VRAM usage was independent of dataset size. Am I doing something wrong here?
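One thing worth checking before blaming the trainer: a handful of abnormally long utterances in a large dataset can inflate the longest padded batch and with it peak VRAM. The sketch below is a hypothetical helper (not part of Piper) that scans LJSpeech-style `filename|text` metadata, the format Piper's preprocessing accepts, and flags transcripts whose length is a statistical outlier; the 3-standard-deviation cutoff is an arbitrary assumption.

```python
# Hypothetical sketch: flag outlier-length transcripts in LJSpeech-style
# metadata ("filename|text" lines). A few extreme utterances can blow up
# padded-batch size and VRAM; threshold choice here is an assumption.
import csv
import io
import statistics

def find_outliers(metadata_text, stdevs=3.0):
    """Return (name, length) for rows whose transcript length exceeds
    mean + stdevs * population-stdev of all transcript lengths."""
    rows = list(csv.reader(io.StringIO(metadata_text), delimiter="|"))
    lengths = [len(text) for _, text in rows]
    cutoff = statistics.mean(lengths) + stdevs * statistics.pstdev(lengths)
    return [(name, len(text)) for name, text in rows if len(text) > cutoff]

# Demo: 50 ordinary lines plus one pathologically long transcript.
sample = "\n".join(
    ["utt_%03d|short line of text" % i for i in range(50)]
    + ["utt_bad|" + "very long transcript " * 200]
)
print(find_outliers(sample))  # → [('utt_bad', 4200)]
```

Dropping or splitting the flagged utterances (or tightening `--max-phoneme-ids`) keeps the worst-case batch, and therefore VRAM, bounded regardless of total dataset size.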

mniiinm commented 3 months ago

This turned out to be a mistake on my end; the problem was with my dataset.