rhasspy / larynx

End to end text to speech system using gruut and onnx
MIT License

Siwis good training on bad prompts #19

Open ddavout opened 3 years ago

ddavout commented 3 years ago

In Siwis, the voice talent rarely pronounces verbs in the conditional mood as written. For example, she says "il tirait" instead of "il tirerait".

So, despite the correct phonemes

`DEBUG:larynx:Words for 'il tirerait le premier.': ['il', 'tirerait', 'le', 'premier', '.']
DEBUG:larynx:Phonemes for 'il tirerait le premier.': ['#', 'i', 'l', '#', 't', 'i', 'ʁ', 'ə', 'ʁ', 'ɛ', '#', 'l', 'ə', '#', 'p', 'ʁ', 'ə', 'm', 'j`

I can hear "il tirait le premier".

synesthesiam commented 3 years ago

Is there enough of a pattern that we could automate some prompt corrections and re-train?

ddavout commented 3 years ago

I have to compare the prompts I use with the originals. How many prompts do you think you need?

ddavout commented 3 years ago

In parl, there are 4 occurrences of "rerai"; the first 3 are affected:

text/part1/neut_parl_s01_0429.txt: A défaut, je suggérerai à l’Assemblée de le rejeter.

text/part1/neut_parl_s02_0531.txt: Cela représente, pour ceux qui l’ignoreraient, plus de deux fois le salaire moyen.

text/part1/neut_parl_s02_0589.txt: Si le travail continue de cette manière, je me retirerai moi aussi.

text/part1/neut_parl_s03_0372.txt: S’il nous rejoint, je retirerai mon amendement.

The only correct one is:

text/part1/neut_parl_s04_0597.txt: Je les rencontrerai prochainement, probablement
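For anyone who wants to reproduce this kind of search, here is a minimal sketch. It assumes the prompts are stored one sentence per UTF-8 `.txt` file under a `text/` directory, as the paths above suggest; the function name `find_prompts` is hypothetical and not part of larynx or gruut. It only lists candidates: deciding whether the talent actually misread a prompt still requires listening to the recording.

```python
from pathlib import Path


def find_prompts(root, fragment):
    """Return sorted (path, text) pairs for prompt files containing `fragment`.

    Assumes one prompt per UTF-8 .txt file somewhere under `root`.
    The match is a case-insensitive substring test.
    """
    hits = []
    for path in sorted(Path(root).rglob("*.txt")):
        text = path.read_text(encoding="utf-8").strip()
        if fragment.lower() in text.lower():
            hits.append((path.as_posix(), text))
    return hits


if __name__ == "__main__":
    # "text" is the assumed location of the Siwis prompt files.
    for path, text in find_prompts("text", "rerai"):
        print(f"{path}: {text}")
```

The same call with a different fragment (for example `"erion"`) gives the list of prompts to check by ear for another verb ending.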

synesthesiam commented 3 years ago

I use Siwis as the "base" model for French, since it's one where I had the most data available. So any corrections to the transcripts will improve it and all of the downstream models when I re-train.

Should I create a repo to share the corrected transcripts, or would you like to do that?

Also, thanks for your effort :)

ddavout commented 3 years ago

I have noticed quite a lot of "reading" problems. For my voice I simply changed the prompts, and yes, it improved my voice, particularly where the defects are repeated, of course. Another example is "erion": of the 9 occurrences I found, 3 are wrong.

text/part1/neut_parl_s01_0633.txt (gagnerions vs gagnerons): Nous gagnerions beaucoup à examiner ce qui est pratiqué là-bas.

text/part1/neut_parl_s03_0462.txt (oserions vs oserons): Nous n’oserions pas, quant à nous, porter de telles accusations.

text/part1/neut_parl_s03_0622.txt: Sans eux, nous ne serions pas là aujourd’hui, quoi que l’on pense, quoi que l’on dise.

text/part1/neut_parl_s04_0310.txt: Nous souhaiterions savoir comment on peut faire.

text/part1/neut_parl_s04_0378.txt: Je ne vois d’ailleurs pas comment nous le ferions…

text/part1/neut_parl_s06_0096.txt: Certes, notre pays ne va pas aussi bien que nous le souhaiterions.

text/part1/neut_parl_s06_0666.txt (y is read as e, in SAMPA): Je crois que nous y gagnerions tous.

text/part2/neut_book_s06_0092.txt: – Pourquoi serions-nous malades, puisqu’il n’y a pas de médecins dans l’île ? répondit très sérieusement Pencroff.

text/part3/emph_parl_s01_0633.txt (gagnerions vs gagnerons): Nous gagnerions BEAUCOUP à examiner ce qui est pratiqué là-bas.

A repo is a good idea. Right now I am putting a lot of effort into chasing all these imperfections. Unlike you, who are helped by larynx, I have to track more subtle differences (with a Festival lexicon, one (word, POS) pair corresponds to one lexicon entry), and I need to take into account every liaison she makes, whether compulsory, optional, or completely wrong.

Some parts are not read at all (at least one part between parentheses), and the wave files are not good enough, in my opinion, for Festival. In particular, I would say some are badly truncated, by a script not suited to the French phoneset. That is my "feeling", but one fact stands: the sound i + k is very weak. I look at the waves with Praat, and over time I am becoming more and more selective. But that's another problem.

synesthesiam commented 3 years ago

If it would help you out, I have the prompt alignments too. I trained a French Kaldi model on these same IPA phonemes, and used the alignments in the training labels and to trim the WAV files.
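The trimming step can be sketched as follows. This is only an illustration under stated assumptions: the thread does not say how the alignments are stored, so the sketch assumes that each recording's alignment has already been reduced to a start and end time in seconds, and the helper name `trim_wav` is hypothetical. It uses only Python's standard `wave` module and is not the script used for the Siwis voices.

```python
import wave


def trim_wav(src, dst, start_s, end_s):
    """Copy the audio between start_s and end_s (seconds) from src to dst.

    Assumes an uncompressed PCM WAV file; the times are assumed to come
    from the first and last aligned phonemes of the prompt.
    """
    with wave.open(src, "rb") as reader:
        params = reader.getparams()
        rate = reader.getframerate()
        total = reader.getnframes()
        # Clamp the requested window to the frames that actually exist.
        first = min(total, max(0, int(start_s * rate)))
        last = min(total, max(first, int(end_s * rate)))
        reader.setpos(first)
        frames = reader.readframes(last - first)

    with wave.open(dst, "wb") as writer:
        # Same channels, sample width and rate; the frame count in the
        # header is corrected when the file is closed.
        writer.setparams(params)
        writer.writeframes(frames)
```

In practice a small margin before the first phoneme and after the last one is probably worth keeping, since a cut placed exactly on the alignment boundary can clip weak onsets or releases.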