Align text to audio with https://github.com/lowerquality/gentle or https://github.com/readbeyond/aeneas and use the result for training.
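If it helps, here is a minimal sketch of running the alignment with aeneas through its Python API (the paths are placeholders, and the text file is assumed to be plain text with one sentence/fragment per line; gentle works similarly but through its own server/CLI):

```python
# Minimal aeneas forced-alignment sketch (paths are placeholders).
# The text file is assumed to be plain text, one sentence/fragment per line.
from aeneas.executetask import ExecuteTask
from aeneas.task import Task

config = u"task_language=eng|is_text_type=plain|os_task_file_format=json"
task = Task(config_string=config)
task.audio_file_path_absolute = u"/path/to/audiobook.mp3"
task.text_file_path_absolute = u"/path/to/ebook_sentences.txt"
task.sync_map_file_path_absolute = u"/path/to/syncmap.json"

# Run the alignment and write a JSON sync map with begin/end times per fragment.
ExecuteTask(task).execute()
task.output_sync_map_file()
```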
@nshmyrev after the alignment (once I have the timing table), what should I do next to make this work?
After alignment, create a preprocessor like https://github.com/r9y9/deepvoice3_pytorch/blob/master/ljspeech.py and start training.
As @nshmyrev said, you should format your data in the same folder layout as LJSpeech: a wavs folder plus a metadata.csv listing each wav file name and its transcript.
Then you can run the preprocessing and training steps as if you had LJSpeech, passing your ebook data folder instead.
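As a rough sketch, turning the aeneas JSON sync map plus the full audiobook into that layout could look something like this (the file names, the soundfile dependency, and the ebook_%05d ids are my own placeholders, not anything the repo requires):

```python
# Hypothetical helper: slice the aligned audiobook into an LJSpeech-style
# layout (wavs/ + metadata.csv), using fragment times from an aeneas sync map.
import json
import os

import soundfile as sf  # pip install soundfile

AUDIO_PATH = "audiobook.wav"    # full audiobook, e.g. resampled to 22050 Hz mono
SYNCMAP_PATH = "syncmap.json"   # output of the aeneas alignment step above
OUT_DIR = "my_ebook_dataset"

os.makedirs(os.path.join(OUT_DIR, "wavs"), exist_ok=True)

audio, sr = sf.read(AUDIO_PATH)
with open(SYNCMAP_PATH, encoding="utf-8") as f:
    fragments = json.load(f)["fragments"]

lines = []
for i, frag in enumerate(fragments):
    begin, end = float(frag["begin"]), float(frag["end"])
    text = " ".join(frag["lines"]).strip()
    if not text:
        continue
    clip = audio[int(begin * sr):int(end * sr)]
    wav_id = "ebook_%05d" % i
    sf.write(os.path.join(OUT_DIR, "wavs", wav_id + ".wav"), clip, sr)
    # LJSpeech metadata.csv is pipe-separated: id|raw transcript|normalized transcript
    lines.append("%s|%s|%s" % (wav_id, text, text))

with open(os.path.join(OUT_DIR, "metadata.csv"), "w", encoding="utf-8") as f:
    f.write("\n".join(lines) + "\n")
```

After that, the usual LJSpeech commands should work on this folder, e.g. something like `python preprocess.py ljspeech my_ebook_dataset ./processed` followed by `python train.py --data-root=./processed` (check the repository README for presets and the exact flags).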
For reference, here is gentle-based preprocessing for the VCTK dataset: https://github.com/r9y9/deepvoice3_pytorch/tree/master/vctk_preprocess
What is specific about the VCTK dataset that keeps gentle from working on it?
It seems the answer is here: https://github.com/r9y9/deepvoice3_pytorch/issues/16#issuecomment-354480069
After aligning the files, what should the format or file structure be for preprocessing?
Assume a person has already recorded a 10-hour reading of an e-book, and I have both the e-book text and the audiobook. How do I create a dataset to train deepvoice3?