lmnt-com / wavegrad

A fast, high-quality neural vocoder.
Apache License 2.0

Dataset and train test split for pretrained model #6

Closed twoleggedeye closed 3 years ago

twoleggedeye commented 3 years ago

Hi! What dataset did you use for training the 24kHz model? Also, what train/test split did you use for training?

I am getting poor sample quality when running inference on LJSpeech with the model (I resampled the dataset beforehand), so I assume the model was not trained on it?

sharvil commented 3 years ago

The 24kHz model was trained on the VCTK corpus with a random (but stable) 87.5% train/12.5% test split. The audio samples include a few examples from the test set. You may also be interested in DiffWave, which follows a similar architecture to WaveGrad but, in my experience, produces better results.

twoleggedeye commented 3 years ago

Great, thank you! Could you please provide the exact file list for your split of VCTK? I want to run some experiments with WaveGrad, and it would be very convenient to compare against your implementation as a reference.

sharvil commented 3 years ago

I can provide the procedure for generating the train/test split:

Consider each (text_file, wav_file) pair as a record. For each record, read the text file as a UTF-8 string. Remove leading and trailing whitespace from the text. Generate a SHA1 hash of the text, and consider the 3 least-significant bits of the most-significant byte of the hash. If they're all 0, this record belongs in the test set; otherwise it belongs in the train set.
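
The procedure above could be sketched in Python roughly as follows. This is a minimal sketch, not the code used to train the released model. It assumes that "most-significant byte" means the first byte of the raw SHA1 digest, and that Python's `str.strip()` matches the whitespace trimming used originally. The names `is_test_record` and `split_records` are hypothetical. Since three bits are all zero for 1 in 8 hash values, this yields roughly the 12.5% test share mentioned earlier.

```python
import hashlib
from pathlib import Path


def is_test_record(text: str) -> bool:
    """Return True if a transcript falls in the test set.

    Assumption: the "most-significant byte" is the first byte of the
    raw SHA1 digest (big-endian reading of the hash).
    """
    digest = hashlib.sha1(text.strip().encode("utf-8")).digest()
    # Keep only the 3 least-significant bits of that byte.
    return (digest[0] & 0b111) == 0


def split_records(records):
    """Split (text_file, wav_file) pairs into (train, test) lists."""
    train, test = [], []
    for text_file, wav_file in records:
        # Read the transcript as UTF-8; trimming happens in is_test_record.
        text = Path(text_file).read_text(encoding="utf-8")
        target = test if is_test_record(text) else train
        target.append((text_file, wav_file))
    return train, test
```

Because the assignment depends only on the transcript text, the split is stable across runs and machines, and records that share identical text always land in the same set.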