Pavoque data is available here:
https://github.com/marytts/pavoque-data
You can also consider Voxforge data for training:
http://www.voxforge.org/de/Downloads
Three voices there have more than 10 hours of data each: ralfherzog, manu and guenter. Any of them can be the source of a good German voice.
Thanks!
Are these voices already marked or should I cut the files?
There is no such thing as "marked" in TTS training. The linked data is ready for training: it is segmented into sentences and has transcriptions, so you just need to follow the training process.
@nshmyrev that transcription - is it the mapping between audio and text that I called "marks"?
Yes
@nshmyrev which of the files in http://www.repository.voxforge1.org/downloads/de/Trunk/Audio/ contain the transcription? Is it somehow embedded in the FLAC files?
The transcription for every utterance in every archive is in the etc/PROMPTS file.
@nshmyrev but etc/PROMPTS contains only phrases - there are no annotations of word boundaries. Is that intentional? I thought that HMM-based synthesis needs more fine-grained annotations for the audio than just separate phrases.
ralfherzog-20080131-de71/mfc/de71-67 DAS IST EINE REINE KATASTROPHE
ralfherzog-20080131-de71/mfc/de71-68 DADURCH WIRD GELD IN DEN AKTIENMARKT GELENKT
ralfherzog-20080131-de71/mfc/de71-69 DIE GROßE KOALITION GIBT ES SCHON LANGE
No, this transcription is sufficient to train an HMM voice. The phonetic segmentation is created automatically; this is stage 2, "Run the EHMMlabeler to label automatically the wav files using the corresponding transcriptions", from https://github.com/marytts/marytts/wiki/HMMVoiceCreation
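To make the PROMPTS layout concrete, here is a minimal Python sketch that splits lines like the ones quoted above into an utterance name and its text. It assumes the format is exactly what those three sample lines show (a path-like identifier, a space, then the sentence); the function names parse_prompts_line and parse_prompts are hypothetical helpers, not part of VoxForge or MaryTTS.

```python
def parse_prompts_line(line):
    """Split one PROMPTS line into (utterance_name, text).

    Assumption: the first whitespace-separated field is a path-like
    identifier and the rest of the line is the transcription.
    """
    utt_path, _, text = line.strip().partition(" ")
    # Keep only the last path component, e.g. "de71-67".
    utt_name = utt_path.rsplit("/", 1)[-1]
    return utt_name, text.strip()


def parse_prompts(lines):
    """Build a {utterance_name: text} mapping, skipping blank lines."""
    return dict(parse_prompts_line(line) for line in lines if line.strip())


# The three lines quoted earlier in this thread.
sample = [
    "ralfherzog-20080131-de71/mfc/de71-67 DAS IST EINE REINE KATASTROPHE",
    "ralfherzog-20080131-de71/mfc/de71-68 DADURCH WIRD GELD IN DEN AKTIENMARKT GELENKT",
    "ralfherzog-20080131-de71/mfc/de71-69 DIE GROßE KOALITION GIBT ES SCHON LANGE",
]

prompts = parse_prompts(sample)
for name, text in prompts.items():
    print(name, "->", text)
```

The resulting mapping only pairs each recording with its sentence. The phone-level alignment still comes from the automatic labeling step mentioned above.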
I would like to have the training data for German voices and, if possible, the wave files too. I need the marked, finished wave files as the training basis, so that I have my own example of how to create a German HTS-based voice.