twoleggedeye closed this issue 3 years ago
The 24kHz model was trained on the VCTK corpus with a random (but stable) 87.5% train / 12.5% test split. The audio samples include a few examples from the test set. You may also be interested in DiffWave, which follows a similar architecture to WaveGrad but, in my experience, produces better results.
Great, thank you! Could you please provide the exact file list for your split of VCTK? I want to perform some experiments with WaveGrad, and it would be very convenient to compare against your implementation as a reference.
I can provide the procedure for generating the train/test split:
Consider each (text_file, wav_file) pair as a record.
For each record, read the text file as a UTF-8 string.
Remove leading and trailing whitespace from the text.
Generate a SHA1 hash of the text, and consider the 3 least-significant bits of the most-significant byte of the hash.
If they're all 0, this record belongs in the test set; otherwise it belongs in the train set.
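The procedure above can be sketched as a small Python function (a minimal sketch; the function name `assign_split` is mine, not from the original repo). Since 3 bits are zero with probability 1/8, this yields the ~12.5% test fraction mentioned earlier:

```python
import hashlib

def assign_split(text: str) -> str:
    """Deterministically assign a record to 'train' or 'test'
    based on the SHA1 hash of its transcript text."""
    # Strip leading/trailing whitespace from the UTF-8 text.
    normalized = text.strip()
    # SHA1 digest of the UTF-8 bytes; digest[0] is the most-significant byte.
    digest = hashlib.sha1(normalized.encode("utf-8")).digest()
    # Take the 3 least-significant bits of the most-significant byte.
    bits = digest[0] & 0b111
    # All zero -> test set (probability 1/8 = 12.5%); otherwise train.
    return "test" if bits == 0 else "train"
```

Because the assignment depends only on the transcript text, the split is stable across machines and reruns, with no random seed to keep track of.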
Hi! What dataset did you use for training the 24kHz model? Also, what train/test split did you use for training?
I am getting poor sample quality on LJSpeech inference with the model (I resampled the dataset beforehand), so I assume this model was not trained on it?