Open rezame opened 5 years ago
Sir,
I have abandoned the multi-speaker model. You need a 10-15 hour single-speaker corpus to get good evaluation results.
rezame notifications@github.com wrote on Tue, Jan 1, 2019, 11:21 PM:
Hi, I want to implement a TTS for the Farsi (Persian) language. Could you describe the properties of a suitable TTS dataset? I have a speech recognition dataset with 200 speakers and about 100 hours. Is it a good dataset for TTS? I think I could train a model on that dataset and then adapt it to a single speaker; how many hours are needed for adaptation? Best regards
Hi, thanks. I used Kaldi to segment an audiobook and align the phones, so I now have waves, text, and phone alignments.
1. My dataset contains 15 hours of audio in 3500 waves with text transcripts and phone alignments, meaning the waves average about 15 seconds in length. Is that good, or should I segment them further? (The dataset is growing to 30 hours.)
1-1. My dataset doesn't have stress labels or similar annotations. Can it achieve good results without them, or must I add these labels?
2. Could you describe how to use this toolkit?
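On question 2: the keithito/tacotron repo trains from LJSpeech-style data, i.e. a `wavs/` directory plus a pipe-delimited `metadata.csv` with one line per clip in the form `id|raw text|normalized text`. A minimal sketch for producing that file, assuming you already hold your Kaldi transcripts in a Python dict keyed by wav id (the `transcripts` mapping is hypothetical, and raw and normalized text are assumed identical here):

```python
import os


def write_metadata(transcripts, out_dir):
    """Write an LJSpeech-style metadata.csv: id|raw text|normalized text.

    `transcripts` maps a wav file id (without the .wav extension) to its
    transcript string. Returns the path of the file written.
    """
    path = os.path.join(out_dir, "metadata.csv")
    with open(path, "w", encoding="utf-8") as f:
        for wav_id, text in sorted(transcripts.items()):
            # Raw and normalized columns are the same in this sketch;
            # real normalization (numbers, abbreviations) would go here.
            f.write(f"{wav_id}|{text}|{text}\n")
    return path
```

Each `id` must match a file `wavs/<id>.wav`; from there the repo's own preprocessing and training scripts take over.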