marytts-it / marytts

MARY TTS -- an open-source, multilingual text-to-speech synthesis system written in pure Java
http://mary.dfki.de

Some English tokens tokenized too much #15

Closed ftesser closed 11 years ago

ftesser commented 12 years ago

There is a difference between marytts-it and the upstream version for words like wouldn't:

Old version:

<mtu orig="wouldn't">
<t g2p_method="lexicon" ph="' w U d n t" pos="MD">
wouldnt
<syllable ph="w U d n t" stress="1">
<ph p="w"/>
<ph p="U"/>
<ph p="d"/>
<ph p="n"/>
<ph p="t"/>
</syllable>
</t>
</mtu>

New version:

<t g2p_method="lexicon" ph="' w U d" pos="MD">
would
<syllable ph="w U d" stress="1">
<ph p="w"/>
<ph p="U"/>
<ph p="d"/>
</syllable>
</t>
<mtu orig="n't">
<t g2p_method="lexicon" ph="' E n" pos="RB">
n
<syllable ph="E n" stress="1">
<ph p="E"/>
<ph p="n"/>
</syllable>
</t>
<t g2p_method="lexicon" ph="' t i" pos="RB">
t
<syllable ph="t i" stress="1">
<ph p="t"/>
<ph p="i"/>
</syllable>
</t>
</mtu>

Moreover, it seems that the same also happens with other contractions.

An example below:

Old version:

<mtu orig="Don't">
<t accent="L+H*" g2p_method="lexicon" ph="' d @U n t" pos="NN">
dont
<syllable accent="L+H*" ph="d @U n t" stress="1">
<ph p="d"/>
<ph p="@U"/>
<ph p="n"/>
<ph p="t"/>
</syllable>
</t>
</mtu>

New version:

<t g2p_method="lexicon" ph="' d u" pos="VBP">
Do
<syllable ph="d u" stress="1">
<ph p="d"/>
<ph p="u"/>
</syllable>
</t>
<mtu orig="n't">
<t g2p_method="lexicon" ph="' E n" pos="RB">
n
<syllable ph="E n" stress="1">
<ph p="E"/>
<ph p="n"/>
</syllable>
</t>
<t accent="L+H*" g2p_method="lexicon" ph="' t i" pos="VB">
t
<syllable accent="L+H*" ph="t i" stress="1">
<ph p="t"/>
<ph p="i"/>
</syllable>
</t>
</mtu>

The origin of this difference is probably the new lexicon method, which looks up in the dictionary the segments of a word split on certain characters (e.g. ').
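
To make the suspected mechanism concrete, here is a minimal sketch of a segment-wise lookup. It is not the actual MaryTTS code: the class name, the method names and the toy lexicon are hypothetical, and the transcriptions are simply copied from the ph attributes in the outputs above. It assumes the word is split on the apostrophe and each piece is looked up on its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class SegmentLookupSketch {
    // Toy lexicon (hypothetical); values copied from the ph attributes shown above.
    static final Map<String, String> LEXICON = Map.of(
            "would", "' w U d",
            "wouldnt", "' w U d n t",
            "n", "' E n",
            "t", "' t i");

    /** Splits the word on apostrophes and looks up every non-empty segment separately. */
    static List<String> lookupSegments(String word) {
        List<String> result = new ArrayList<>();
        for (String segment : word.toLowerCase(Locale.ROOT).split("'")) {
            if (segment.isEmpty()) {
                continue; // skip the empty piece produced by a leading apostrophe
            }
            result.add(LEXICON.getOrDefault(segment, "?")); // "?" marks a missing entry
        }
        return result;
    }

    /** Drops the apostrophe and looks up the word as a single entry. */
    static String lookupWhole(String word) {
        return LEXICON.get(word.toLowerCase(Locale.ROOT).replace("'", ""));
    }

    public static void main(String[] args) {
        System.out.println(lookupSegments("n't"));
        System.out.println(lookupWhole("wouldn't"));
    }
}
```

With this toy lexicon, the segment-wise lookup of n't yields the letter names for n and t, as in the new output, while the whole-word lookup of wouldn't yields the single transcription from the old output.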

ftesser commented 12 years ago

Attention! It seems to be a difference between versions 5.0 and 4.3.

ftesser commented 12 years ago

Attention! It seems to be a problem of classpath order.

See commit c40f0a77967bd277debc35cac181843bbe6b00a8

The jtok override in /marytts-lang-en/src/main/resources/jtok/en/ takes effect only if marytts-lang-.jar comes before marytts-server.jar on the classpath.
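
One way to check which copy wins is to list every classpath location that provides a given resource. The sketch below uses only the standard ClassLoader.getResources call; the class name and the default resource name jtok/en/example.cfg are hypothetical placeholders, so pass the real file name as an argument. With a plain classpath the first URL listed is typically the one that a single-resource lookup would return.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

public class ResourceOrderSketch {
    /** Returns every classpath location providing the resource, in class-loader search order. */
    static List<URL> findAll(String resourceName) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        try {
            return Collections.list(loader.getResources(resourceName));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical default; replace with an actual file under jtok/en/.
        String name = args.length > 0 ? args[0] : "jtok/en/example.cfg";
        List<URL> copies = findAll(name);
        if (copies.isEmpty()) {
            System.out.println("Not found on the classpath: " + name);
        }
        for (int i = 0; i < copies.size(); i++) {
            System.out.println(i + ": " + copies.get(i));
        }
    }
}
```

If the copy from marytts-server.jar is listed before the one from the language jar, the override is being shadowed.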