marytts / marytts

MARY TTS -- an open-source, multilingual text-to-speech synthesis system written in pure java
https://marytts.github.io/

Training Bases #440

Closed winman3000 closed 4 years ago

winman3000 commented 8 years ago

I would like to have the training bases for the German voices and, if possible, the wave files too. I need the marked, finished wave files that serve as the training bases, so that I have my own example of how to create a German HTS-based voice.

nshmyrev commented 8 years ago

Pavoque data is available here:

https://github.com/marytts/pavoque-data

You can also consider Voxforge data for training:

http://www.voxforge.org/de/Downloads

There are three voices with more than 10 hours of data each: ralfherzog, manu, and guenter. Each can be the source of a good German voice.

winman3000 commented 8 years ago

Thanks!

Are these voices already marked or should I cut the files?

nshmyrev commented 8 years ago

There is no such thing as "marked" in TTS training. The linked data is ready for training: it is segmented into sentences and has transcriptions. You just need to follow the training process.

abitrolly commented 8 years ago

@nshmyrev that transcription - is it the mapping between audio and text that I called "marks"?

nshmyrev commented 8 years ago

Yes

abitrolly commented 8 years ago

@nshmyrev which of the files in http://www.repository.voxforge1.org/downloads/de/Trunk/Audio/ contain the transcription? Is it somehow embedded in the FLAC files?

nshmyrev commented 8 years ago

The transcription for every utterance in every archive is in the etc/PROMPTS file.

abitrolly commented 8 years ago

@nshmyrev but etc/PROMPTS contains only phrases; there are no annotations of word boundaries. Is that intentional? I thought that HMM-based synthesis needs more fine-grained annotations of the audio than just separate phrases.

ralfherzog-20080131-de71/mfc/de71-67 DAS IST EINE REINE KATASTROPHE
ralfherzog-20080131-de71/mfc/de71-68 DADURCH WIRD GELD IN DEN AKTIENMARKT GELENKT
ralfherzog-20080131-de71/mfc/de71-69 DIE GROßE KOALITION GIBT ES SCHON LANGE

nshmyrev commented 8 years ago

No, this transcription is sufficient to train an HMM voice. The phonetic segmentation is created automatically in stage 2, "Run the EHMMlabeler to label automatically the wav files using the corresponding transcriptions", from https://github.com/marytts/marytts/wiki/HMMVoiceCreation
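
For anyone who wants to turn etc/PROMPTS into per-utterance transcriptions before running the labeling stage, here is a minimal sketch in Python. It assumes every line has the shape shown in the excerpt above (a path, whitespace, then the sentence) and that the last path component is the utterance id. The function names `parse_prompts` and `write_text_files`, and the idea of writing one UTF-8 `.txt` file per utterance, are illustrative assumptions and not part of MaryTTS or VoxForge; check the voice-building wiki for the layout your tools actually expect.

```python
from pathlib import Path


def parse_prompts(lines):
    """Map each utterance id (last path component) to its transcription.

    Assumes lines look like: '<speaker-dir>/mfc/<utt-id> SENTENCE TEXT'.
    """
    utterances = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split(None, 1)  # split on the first run of whitespace
        utt_path = parts[0]
        text = parts[1].strip() if len(parts) > 1 else ""
        utt_id = utt_path.rsplit("/", 1)[-1]
        utterances[utt_id] = text
    return utterances


def write_text_files(utterances, out_dir):
    """Write one UTF-8 text file per utterance (hypothetical layout)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for utt_id, text in utterances.items():
        (out / f"{utt_id}.txt").write_text(text + "\n", encoding="utf-8")


# Two lines copied from the PROMPTS excerpt in this thread.
sample = [
    "ralfherzog-20080131-de71/mfc/de71-67 DAS IST EINE REINE KATASTROPHE",
    "ralfherzog-20080131-de71/mfc/de71-69 DIE GROßE KOALITION GIBT ES SCHON LANGE",
]
prompts = parse_prompts(sample)
```

To process a real archive, something like `parse_prompts(Path("etc/PROMPTS").read_text(encoding="utf-8").splitlines())` should work, assuming the file is UTF-8 encoded. The sentence-level text is all that is needed here; the phone-level alignment is produced later by the automatic labeling step.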