marytts / marytts

MARY TTS: an open-source, multilingual text-to-speech synthesis system written in pure Java
https://marytts.github.io/

How to improve letter to phoneme transcription #434

aWelker closed this issue 4 years ago.

aWelker commented 8 years ago

Hi all,

I have a lexicon of about 170,000 transcribed words and ran LTSLexiconPOSBuilder on it. However, the transcriptions predicted for both unknown and known words are wrong. For example, for the word "aanvoeren" ('anvur@n in the lexicon), marytts-client gives me the following output:

<t g2p_method="rules" ph="? a - ' a n - f o: 6 - R @ n" pos="APPR"> aanvoeren </t>

And this is what I get for "dichtgespijkerde" (d'Ixtx@spEk@rd@, also in the lexicon):

<t g2p_method="rules" ph="' d I C t - g @ S - p I j - k 6 - d @" pos="VAFIN"> dichtgespijkerde </t>
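Both tokens carry g2p_method="rules", which suggests they were transcribed by the letter-to-sound rules rather than taken from the lexicon, even though both words are in it. To inspect such output programmatically, here is a minimal sketch that uses only the JDK's DOM parser. TokenInspector and its methods are hypothetical names, and treating ' and - as stress and syllable markers is an assumption based on the output above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

/** Hypothetical helper: reads one MaryTTS <t> token element and lists its phones. */
public class TokenInspector {

    /** Returns the g2p_method attribute, e.g. "rules" as in the output above. */
    public static String g2pMethod(String tokenXml) {
        return parse(tokenXml).getAttribute("g2p_method");
    }

    /** Splits the ph attribute into phones, dropping stress (') and syllable (-) markers. */
    public static List<String> phones(String tokenXml) {
        List<String> result = new ArrayList<>();
        for (String symbol : parse(tokenXml).getAttribute("ph").trim().split("\\s+")) {
            if (!symbol.isEmpty() && !symbol.equals("'") && !symbol.equals("-")) {
                result.add(symbol);
            }
        }
        return result;
    }

    /** Parses a single XML element; wraps checked parser exceptions. */
    private static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML: " + xml, e);
        }
    }

    public static void main(String[] args) {
        String t = "<t g2p_method=\"rules\" ph=\"? a - ' a n - f o: 6 - R @ n\" pos=\"APPR\"> aanvoeren </t>";
        System.out.println(g2pMethod(t) + " " + phones(t));
    }
}
```

For the "aanvoeren" token, phones() yields [?, a, a, n, f, o:, 6, R, @, n], which can then be compared with the lexicon entry 'anvur@n.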

Is there any way to inspect the prediction files and change some things manually? For example, "ij" should always be transcribed as "E", but it almost never is; in the recordings it is even pronounced more like [aI]. Also, I couldn't get transcript.sh to run properly because the "save" option is not selectable. Could that be the reason for my problem?

Thanks in advance for any help!

Rootex commented 8 years ago

Hello, I guess you are training a Dutch voice. Have you checked that everything is OK in your phone set file? Could you please post it so we can check it?

aWelker commented 8 years ago

Hi, yes, it's supposed to be a Dutch voice. I assume you mean the allophone file? It looks like this:

<allophones name="sampa" xml:lang="nl"
        features="vlng vheight vfront vrnd ctype cplace cvox">
    <silence ph="_"/>

    <vowel ph="i" vlng="l" vheight="1" vfront="1" vrnd="-"/>
    <vowel ph="e" vlng="l" vheight="2" vfront="1" vrnd="-"/>
    <vowel ph="E" vlng="s" vheight="2" vfront="1" vrnd="-"/>
    <vowel ph="E:" vlng="l" vheight="2" vfront="1" vrnd="-"/>
    <vowel ph="a" vlng="l" vheight="3" vfront="2" vrnd="-"/>
    <vowel ph="O" vlng="s" vheight="2" vfront="3" vrnd="+"/>
    <vowel ph="O:" vlng="l" vheight="2" vfront="3" vrnd="+"/>
    <vowel ph="o" vlng="l" vheight="2" vfront="3" vrnd="+"/>
    <vowel ph="u" vlng="s" vheight="1" vfront="3" vrnd="+"/>
    <vowel ph="y" vlng="l" vheight="1" vfront="2" vrnd="+"/>
    <vowel ph="2" vlng="s" vheight="2" vfront="2" vrnd="+"/>
    <vowel ph="@" vlng="a" vheight="2" vfront="2" vrnd="-"/>

    <vowel ph="I" vlng="s" vheight="1" vfront="1" vrnd="-"/>
    <vowel ph="Y" vlng="s" vheight="1" vfront="2" vrnd="+"/>
    <vowel ph="A" vlng="s" vheight="3" vfront="3" vrnd="-"/>

    <vowel ph="EI" vlng="d" vheight="2" vfront="1" vrnd="-"/>
    <vowel ph="aU" vlng="d" vheight="3" vfront="2" vrnd="+"/>

    <vowel ph="3" vlng="l" vheight="1" vfront="1" vrnd="-"/>
    <vowel ph="1" vlng="s" vheight="3" vfront="2" vrnd="-"/>

    <consonant ph="p" ctype="s" cplace="l" cvox="-"/>
    <consonant ph="t" ctype="s" cplace="a" cvox="-"/>
    <consonant ph="k" ctype="s" cplace="v" cvox="-"/>
    <consonant ph="b" ctype="s" cplace="l" cvox="+"/>
    <consonant ph="d" ctype="s" cplace="a" cvox="+"/>
    <consonant ph="g" ctype="s" cplace="v" cvox="+"/>
    <consonant ph="f" ctype="f" cplace="b" cvox="-"/>
    <consonant ph="s" ctype="f" cplace="a" cvox="-"/>
    <consonant ph="S" ctype="f" cplace="p" cvox="-"/>
    <consonant ph="v" ctype="f" cplace="b" cvox="+"/>
    <consonant ph="z" ctype="f" cplace="a" cvox="+"/>
    <consonant ph="Z" ctype="f" cplace="p" cvox="-"/>
    <consonant ph="l" ctype="l" cplace="a" cvox="+"/>
    <consonant ph="m" ctype="n" cplace="l" cvox="+"/>
    <consonant ph="n" ctype="n" cplace="a" cvox="+"/>
    <consonant ph="J" ctype="n" cplace="p" cvox="+"/>
    <consonant ph="j" ctype="r" cplace="p" cvox="+"/>
    <consonant ph="h" ctype="f" cplace="g" cvox="-"/>
    <consonant ph="H" ctype="r" cplace="l" cvox="+"/>

    <consonant ph="N" ctype="n" cplace="v" cvox="+"/>
    <consonant ph="x" ctype="f" cplace="u" cvox="-"/>
    <consonant ph="r" ctype="r" cplace="a" cvox="+"/>

    <consonant ph="P" ctype="f" cplace="p" cvox="+"/>
    <consonant ph="G" ctype="r" cplace="v" cvox="+"/>
    <consonant ph="w" ctype="r" cplace="l" cvox="+"/>

    <consonant ph="?" ctype="s" cplace="g" cvox="-"/>
</allophones>
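Several symbols in the predicted transcriptions above (o:, 6, R, C) are not defined in this allophone set, so a quick sanity check is to compare every predicted phone against the inventory. A minimal sketch using the JDK's DOM parser; PhoneSetCheck is a hypothetical name, and the inventory in main is trimmed to a few phones for brevity (in practice you would load the full file).

```java
import java.io.StringReader;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical checker: which predicted phones are missing from an allophones file? */
public class PhoneSetCheck {

    /** Collects the ph attribute of every direct child (silence, vowel, consonant). */
    public static Set<String> inventory(String allophonesXml) {
        Set<String> phones = new LinkedHashSet<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(allophonesXml)));
            NodeList children = doc.getDocumentElement().getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node node = children.item(i);
                if (node instanceof Element && ((Element) node).hasAttribute("ph")) {
                    phones.add(((Element) node).getAttribute("ph"));
                }
            }
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse allophones XML", e);
        }
        return phones;
    }

    /** Returns the phones of a ph string (stress and syllable marks skipped) not in the inventory. */
    public static Set<String> undefined(Set<String> inventory, String ph) {
        Set<String> missing = new LinkedHashSet<>();
        for (String symbol : ph.trim().split("\\s+")) {
            if (symbol.isEmpty() || symbol.equals("'") || symbol.equals("-")) {
                continue;
            }
            if (!inventory.contains(symbol)) {
                missing.add(symbol);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String xml = "<allophones name=\"sampa\" xml:lang=\"nl\">"
                + "<vowel ph=\"a\"/><vowel ph=\"o\"/><vowel ph=\"@\"/>"
                + "<consonant ph=\"n\"/><consonant ph=\"f\"/><consonant ph=\"?\"/></allophones>";
        System.out.println(undefined(inventory(xml), "? a - ' a n - f o: 6 - R @ n"));
    }
}
```

Any non-empty result means the rules are emitting symbols the voice cannot synthesize, which is worth fixing before tuning the LTS model itself.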
AstonBraham commented 8 years ago

Hi everyone,

Which tool can I use to generate a new allophone.xml file like the one above?

aWelker commented 8 years ago

Hi, there is no such tool as far as I know. I just copied the French allophone file, since it already had most of the phones I needed, and added the missing ones manually. You can either take them from other existing allophone files or "invent" them yourself: you just need to put the phone symbol and its phonetic features inside one element (one pair of <>).
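The copy-and-extend approach can also be scripted instead of edited by hand. A minimal sketch, assuming the copied allophones file is held as a string; AddPhone and addPhone are hypothetical names, and the phone and feature values in main are only illustrative.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

/** Hypothetical helper: appends one phone element to an allophones document. */
public class AddPhone {

    /**
     * kind is the element name ("vowel" or "consonant"); attrs alternates
     * attribute names and values, e.g. "ph", "x", "ctype", "f".
     */
    public static String addPhone(String allophonesXml, String kind, String... attrs) {
        if (attrs.length % 2 != 0) {
            throw new IllegalArgumentException("attrs must be name/value pairs");
        }
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(allophonesXml)));
            Element phone = doc.createElement(kind);
            for (int i = 0; i < attrs.length; i += 2) {
                phone.setAttribute(attrs[i], attrs[i + 1]);
            }
            doc.getDocumentElement().appendChild(phone);

            // Serialize back to a string without the <?xml ...?> declaration.
            Transformer out = TransformerFactory.newInstance().newTransformer();
            out.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter writer = new StringWriter();
            out.transform(new DOMSource(doc), new StreamResult(writer));
            return writer.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not add phone", e);
        }
    }

    public static void main(String[] args) {
        String copied = "<allophones name=\"sampa\" xml:lang=\"nl\"><silence ph=\"_\"/></allophones>";
        System.out.println(addPhone(copied, "consonant", "ph", "x", "ctype", "f", "cplace", "v", "cvox", "-"));
    }
}
```

The serializer may reorder attributes, which does not matter for XML parsing.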

AstonBraham commented 8 years ago

Thank you a lot, @aWelker! Is there any documentation that explains the meaning of ctype, cplace and cvox? I can't seem to understand them.

aWelker commented 8 years ago

You're welcome! Imagination and phonetics class ;-)

ctype: fricative (f), nasal (n), plosive (s), ... (just try to find phonemes of the same articulation type in the existing files)

cplace: labial (l), labiodental (b), alveolar (a), ...

cvox: voiced (+), not voiced (-)

It's basically the IPA chart put into XML. I don't know whether there is any documentation somewhere that sums up the abbreviations.
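A small lookup table makes the abbreviations easier to read. The codes named in this thread (f, n, s for ctype; a for cplace; + and - for cvox) are included as described; the remaining codes follow the Festival-style phoneset convention these files appear to share, which is an assumption here. In the allophone file posted above, p, b and m use cplace="l" while f and v use cplace="b", which fits labial and labiodental respectively. PhoneFeatures is a hypothetical name.

```java
import java.util.Map;

/** Hypothetical decoder for consonant feature codes in an allophones file. */
public class PhoneFeatures {

    // f, n, s, a, +, - are from this thread; the other codes assume the
    // Festival-style phoneset convention.
    private static final Map<String, Map<String, String>> CODES = Map.of(
            "ctype", Map.of("s", "plosive (stop)", "f", "fricative", "a", "affricate",
                            "n", "nasal", "l", "lateral (liquid)", "r", "approximant"),
            "cplace", Map.of("l", "labial", "b", "labiodental", "d", "dental", "a", "alveolar",
                             "p", "palatal", "v", "velar", "g", "glottal"),
            "cvox", Map.of("+", "voiced", "-", "voiceless"));

    /** Describes one feature value, or reports it as unknown if it is not in the table. */
    public static String describe(String feature, String code) {
        Map<String, String> values = CODES.get(feature);
        if (values == null || !values.containsKey(code)) {
            return "unknown (" + feature + "=" + code + ")";
        }
        return values.get(code);
    }

    public static void main(String[] args) {
        // <consonant ph="f" ctype="f" cplace="b" cvox="-"/> from the file above
        System.out.println(describe("ctype", "f") + ", " + describe("cplace", "b") + ", " + describe("cvox", "-"));
    }
}
```

Codes outside the table, such as the cplace="u" used for x in the posted file, are reported as unknown rather than guessed.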

AstonBraham commented 8 years ago

Thank you a lot, Sir @aWelker! :+1: I spent almost 3 days trying to figure it out. :-)

nberisha commented 8 years ago

Hello aWelker!

Just wondering whether you found a solution for this issue?

Regards,
Naim

aWelker commented 8 years ago

Hi nberisha,

No, sadly this is still an issue.

nshmyrev commented 8 years ago

There are open-source G2P toolkits that provide state-of-the-art accuracy for grapheme-to-phoneme conversion. You'd be better off using one of them instead of the OpenMary implementation, which is pretty simplistic.
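Whichever toolkit you pick, it helps to measure accuracy on held-out lexicon entries before and after switching. A minimal sketch of phone error rate (Levenshtein distance over phone symbols, divided by the number of reference phones); G2pEval is a hypothetical name, and it assumes transcriptions have already been split into phone lists with stress and syllable marks removed.

```java
import java.util.List;

/** Hypothetical evaluation helper for G2P output against lexicon transcriptions. */
public class G2pEval {

    /** Levenshtein distance over phone symbols; substitution, insertion and deletion each cost 1. */
    public static int editDistance(List<String> ref, List<String> hyp) {
        int[][] d = new int[ref.size() + 1][hyp.size() + 1];
        for (int i = 0; i <= ref.size(); i++) d[i][0] = i;
        for (int j = 0; j <= hyp.size(); j++) d[0][j] = j;
        for (int i = 1; i <= ref.size(); i++) {
            for (int j = 1; j <= hyp.size(); j++) {
                int sub = ref.get(i - 1).equals(hyp.get(j - 1)) ? 0 : 1;
                d[i][j] = Math.min(d[i - 1][j - 1] + sub,
                        Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1));
            }
        }
        return d[ref.size()][hyp.size()];
    }

    /** Phone error rate: total edits divided by total reference phones. */
    public static double phoneErrorRate(List<List<String>> refs, List<List<String>> hyps) {
        int edits = 0;
        int total = 0;
        for (int k = 0; k < refs.size(); k++) {
            edits += editDistance(refs.get(k), hyps.get(k));
            total += refs.get(k).size();
        }
        return total == 0 ? 0.0 : (double) edits / total;
    }

    public static void main(String[] args) {
        // Lexicon entry 'anvur@n vs. the rules output for "aanvoeren", markers removed
        List<String> ref = List.of("a", "n", "v", "u", "r", "@", "n");
        List<String> hyp = List.of("?", "a", "a", "n", "f", "o:", "6", "R", "@", "n");
        System.out.println(editDistance(ref, hyp) + " edits, PER = " + phoneErrorRate(List.of(ref), List.of(hyp)));
    }
}
```

For the "aanvoeren" example this gives 6 edits over 7 reference phones. Averaging over a few thousand held-out words gives a fair basis for comparing toolkits.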

nberisha commented 8 years ago

Hi nshmyrev

Thanks for the hint! Could you point us to an open-source G2P toolkit that you would recommend?

Thanks in advance!

tibistibi commented 8 years ago

Sorry for this blunt post, but @aWelker, I would like to get in touch with you. I'm working on a Dutch voice myself; maybe it would be nice to share. :smile:

tibor@strausz.nl

nberisha commented 8 years ago

Hi,

I made the following modification in the predictPronunciation method of the TrainedLTS class, and it fixed the issue for my language:

                                            *  *  *
StringAndFloatLeafNode leaf = (StringAndFloatLeafNode) tree.interpretToNode(fv, 0);
String prediction = leaf.mostProbableString(featureDefinition, indexPredictedFeature);

String letter = (i < 0 || i >= graphemes.length()) ? "null" : graphemes.substring(i, i + 1);

// Language-specific rule: the digraph "dh" is always transcribed as the single phone "D".
// regionMatches() returns false (instead of throwing) when "d" is the last letter of the word.
if (letter.equalsIgnoreCase("d") && graphemes.regionMatches(true, i, "dh", 0, 2)) {
    returnStr += "D";
    i++; // skip the "h", which is already covered by "D"
} else {
    // Default: keep the tree's prediction, minus its first and last character,
    // as in the original code.
    returnStr += prediction.substring(1, prediction.length() - 1);
}

                                 * * *
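The same idea can be generalized from the single "dh" rule to a small table of forced grapheme-to-phone mappings, for example the "ij" → "E" rule requested at the top of this issue. A minimal, self-contained sketch: DigraphOverrides is a hypothetical name, the fallback function stands in for the trained tree's per-letter prediction (in main it just echoes the letter), and the "ij" → "E" mapping is only illustrative.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntFunction;

/** Hypothetical generalization of the "dh" override to a table of forced mappings. */
public class DigraphOverrides {

    /**
     * Walks the word left to right. Where an override key starts at position i,
     * it emits the forced phone and skips the whole key; otherwise it asks the
     * fallback (e.g. the trained tree) for letter i. Keys are tried in the map's
     * iteration order, so put longer keys first.
     */
    public static String transcribe(String graphemes, Map<String, String> overrides,
                                    IntFunction<String> fallback) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < graphemes.length()) {
            String matched = null;
            for (String key : overrides.keySet()) {
                // Case-insensitive match; false (not an exception) near the word end.
                if (graphemes.regionMatches(true, i, key, 0, key.length())) {
                    matched = key;
                    break;
                }
            }
            if (matched != null) {
                out.append(overrides.get(matched));
                i += matched.length();
            } else {
                out.append(fallback.apply(i));
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> overrides = new LinkedHashMap<>();
        overrides.put("dh", "D"); // the rule from the TrainedLTS snippet
        overrides.put("ij", "E"); // illustrative Dutch rule (assumption)
        String word = "spijker";
        // Stand-in for the tree: echoes the letter at position i.
        System.out.println(transcribe(word, overrides, i -> word.substring(i, i + 1)));
    }
}
```

Using a LinkedHashMap keeps the override order explicit, which matters once one key is a prefix of another.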

I hope it will be of use to others as well.

Best,
Naim

entenbein commented 8 years ago

@nberisha For training LTS/G2P rules, I'd recommend https://www-i6.informatik.rwth-aachen.de/web/Software/g2p.html (it works really well!)