r9y9 / deepvoice3_pytorch

PyTorch implementation of convolutional neural networks-based text-to-speech synthesis models
https://r9y9.github.io/deepvoice3_pytorch/

Questions regarding dataset creation #111

Closed DonaldTsang closed 5 years ago

DonaldTsang commented 6 years ago

Suppose I have a 10-hour audiobook read by one person, along with the text of the e-book. How do I create a dataset from them to train deepvoice3?

nshmyrev commented 6 years ago

Align text to audio with https://github.com/lowerquality/gentle or https://github.com/readbeyond/aeneas and use the result for training.

DonaldTsang commented 6 years ago

@nshmyrev And after the alignment (once I have the timing table), what should I do to make this work?

nshmyrev commented 6 years ago

After alignment you create a preprocessor like https://github.com/r9y9/deepvoice3_pytorch/blob/master/ljspeech.py and start the training.
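As a rough sketch of the step between alignment and preprocessing, the timing table can be read into a list of (begin, end, text) segments. This assumes the aligner was asked for a JSON sync map shaped like the one aeneas writes (a top-level `fragments` list whose entries carry `begin`, `end`, and `lines`); gentle's output is structured differently and would need its own reader. The function name `load_segments` and the duration limits are illustrative choices, not part of deepvoice3_pytorch.

```python
import json


def load_segments(sync_map_json, min_sec=1.0, max_sec=15.0):
    """Return (begin, end, text) tuples from an aeneas-style JSON sync map.

    Assumed input shape (not verified against every aeneas version):
    {"fragments": [{"begin": "0.000", "end": "2.500", "lines": ["..."]}, ...]}
    """
    data = json.loads(sync_map_json)
    segments = []
    for frag in data.get("fragments", []):
        begin, end = float(frag["begin"]), float(frag["end"])
        text = " ".join(frag.get("lines", [])).strip()
        # Skip empty fragments and clips that are too short or too long
        # to be useful as single training utterances.
        if text and min_sec <= end - begin <= max_sec:
            segments.append((begin, end, text))
    return segments
```

Filtering by duration here is a design choice: very short or very long fragments are usually alignment noise or whole paragraphs, and dropping them early keeps the later steps simple.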

G-Wang commented 6 years ago

As @nshmyrev said, you should arrange your data in the same folder layout as LJSpeech: a wavs folder, plus a metadata.csv that lists each wav file name with its text.

Then you can run the preprocessing and training steps as if you had LJSpeech, passing the e-book data folder instead.
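A minimal sketch of producing that layout, assuming the audiobook has already been converted to a single uncompressed WAV file and the alignment is available as (begin, end, text) tuples in seconds. The function name `write_ljspeech_layout` and the `book-0001` style ids are hypothetical. The metadata.csv rows follow the pipe-delimited LJSpeech convention (`id|text|normalized text`); here the same text is written to both text columns, so any number or abbreviation expansion would have to be added separately.

```python
import os
import wave


def write_ljspeech_layout(src_wav, segments, out_dir, prefix="book"):
    """Cut src_wav into one clip per segment; write wavs/ and metadata.csv."""
    wav_dir = os.path.join(out_dir, "wavs")
    os.makedirs(wav_dir, exist_ok=True)
    rows = []
    with wave.open(src_wav, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        for i, (begin, end, text) in enumerate(segments, start=1):
            clip_id = f"{prefix}-{i:04d}"
            # Seek to the segment start and read its frames.
            src.setpos(int(begin * rate))
            frames = src.readframes(int((end - begin) * rate))
            with wave.open(os.path.join(wav_dir, clip_id + ".wav"), "wb") as dst:
                dst.setparams(params)  # frame count is corrected on close
                dst.writeframes(frames)
            # "|" is the column separator, so it must not appear in the text.
            clean = text.replace("|", " ").strip()
            rows.append(f"{clip_id}|{clean}|{clean}")
    with open(os.path.join(out_dir, "metadata.csv"), "w", encoding="utf-8") as f:
        f.write("\n".join(rows) + "\n")
    return rows
```

The resulting folder can then be passed to the preprocessing step in place of the LJSpeech path. The clips keep the source sample rate, so the audio may need resampling first if the training preset expects a different rate.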

mrgloom commented 5 years ago

There is a note about gentle and the VCTK dataset here: https://github.com/r9y9/deepvoice3_pytorch/tree/master/vctk_preprocess

What is specific to the VCTK dataset that keeps gentle from working on it?

mrgloom commented 5 years ago

It seems the answer is here: https://github.com/r9y9/deepvoice3_pytorch/issues/16#issuecomment-354480069

AbuKashan commented 1 year ago

Align text to audio with https://github.com/lowerquality/gentle or https://github.com/readbeyond/aeneas and use the result for training.

After aligning the files, what format or file structure should the data have for preprocessing?