rhasspy / piper

A fast, local neural text to speech system
https://rhasspy.github.io/piper-samples/
MIT License

High VRAM Usage with Larger Datasets in Piper Compared to Coqui TTS #571

Closed · mniiinm closed this issue 3 months ago

mniiinm commented 3 months ago

Hello, and thanks for this great project. I have a relatively large dataset with over 300,000 audio files. While training a model with Piper, I noticed that VRAM consumption grows with dataset size. For example, with a 20,000-file dataset, `--batch-size 32`, and `--max-phoneme-ids 400`, VRAM usage stays under 24 GB. With the 300,000-file dataset and the same settings, it climbs past 40 GB. I didn't see this behavior with Coqui TTS, where VRAM usage was independent of dataset size. Am I doing something wrong here?
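One thing worth checking before blaming the trainer: a handful of abnormally long utterances in a large dataset can inflate the longest padded batch and with it peak VRAM. The sketch below is a hypothetical helper (not part of Piper) that scans LJSpeech-style `filename|text` metadata, the format Piper's preprocessing accepts, and flags transcripts whose length is a statistical outlier; the 3-standard-deviation cutoff is an arbitrary assumption.

```python
# Hypothetical sketch: flag outlier-length transcripts in LJSpeech-style
# metadata ("filename|text" lines). A few extreme utterances can blow up
# padded-batch size and VRAM; threshold choice here is an assumption.
import csv
import io
import statistics

def find_outliers(metadata_text, stdevs=3.0):
    """Return (name, length) for rows whose transcript length exceeds
    mean + stdevs * population-stdev of all transcript lengths."""
    rows = list(csv.reader(io.StringIO(metadata_text), delimiter="|"))
    lengths = [len(text) for _, text in rows]
    cutoff = statistics.mean(lengths) + stdevs * statistics.pstdev(lengths)
    return [(name, len(text)) for name, text in rows if len(text) > cutoff]

# Demo: 50 ordinary lines plus one pathologically long transcript.
sample = "\n".join(
    ["utt_%03d|short line of text" % i for i in range(50)]
    + ["utt_bad|" + "very long transcript " * 200]
)
print(find_outliers(sample))  # → [('utt_bad', 4200)]
```

Dropping or splitting the flagged utterances (or tightening `--max-phoneme-ids`) keeps the worst-case batch, and therefore VRAM, bounded regardless of total dataset size.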

mniiinm commented 3 months ago

This turned out to be a mistake on my end; the problem was with my dataset.