marytts-it / marytts

MARY TTS -- an open-source, multilingual text-to-speech synthesis system written in pure Java
http://mary.dfki.de

POS tagger CheckUp for Italian #20

Open ftesser opened 11 years ago

ftesser commented 11 years ago

The Italian POS tagger shows some strange behavior that needs checking:

The following two sentences:

S#1: quanti sono?

S#2: Quanti sono?

differ only in the capitalization of the first character, yet they produce completely different results:

S#1:

<?xml version="1.0" encoding="UTF-8"?>
<maryxml xmlns="http://mary.dfki.de/2002/MaryXML" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="0.5" xml:lang="it">
<p>
<s>
<phrase>
<t accent="H+L*" g2p_method="lexicon" ph="' k w a1 n - t i" pos="B" pos_full="B">
quanti
</t>
<t accent="H+L*" g2p_method="lexicon" ph="' s O1 - n o" pos="V" pos_full="Vip3p">
sono
</t>
<t pos="FS" pos_full="FS">
?
</t>
<boundary breakindex="5" tone="L-H%"/>
</phrase>
</s>
</p>
</maryxml>

S#2:

<?xml version="1.0" encoding="UTF-8"?>
<maryxml xmlns="http://mary.dfki.de/2002/MaryXML" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="0.5" xml:lang="it">
<p>
<s>
<phrase>
<t g2p_method="lexicon" ph="' k w a1 n - t i" pos="DQ" pos_full="DQmp">
Quanti
</t>
<t g2p_method="lexicon" ph="' s O1 - n o" pos="VA" pos_full="VAip3p">
sono
</t>
<t pos="FS" pos_full="FS">
?
</t>
<boundary breakindex="5" tone="L-H%"/>
</phrase>
</s>
</p>
</maryxml>
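
To make the mismatch easier to see, here is a minimal sketch that compares the `<t>` tokens of the two outputs attribute by attribute. It is not part of MaryTTS: the helper names `tokens` and `diff_tokens` are hypothetical, it uses only Python's standard library, and the `<p>`, `<s>`, `<phrase>` and `<boundary>` elements are trimmed from the samples for brevity.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the xmlns attribute of the outputs above.
MARY_NS = {"m": "http://mary.dfki.de/2002/MaryXML"}

# Token elements copied from the S#1 output (wrapper elements trimmed).
S1 = """
<maryxml xmlns="http://mary.dfki.de/2002/MaryXML" version="0.5" xml:lang="it">
<t accent="H+L*" g2p_method="lexicon" ph="' k w a1 n - t i" pos="B" pos_full="B">quanti</t>
<t accent="H+L*" g2p_method="lexicon" ph="' s O1 - n o" pos="V" pos_full="Vip3p">sono</t>
<t pos="FS" pos_full="FS">?</t>
</maryxml>
"""

# Token elements copied from the S#2 output (wrapper elements trimmed).
S2 = """
<maryxml xmlns="http://mary.dfki.de/2002/MaryXML" version="0.5" xml:lang="it">
<t g2p_method="lexicon" ph="' k w a1 n - t i" pos="DQ" pos_full="DQmp">Quanti</t>
<t g2p_method="lexicon" ph="' s O1 - n o" pos="VA" pos_full="VAip3p">sono</t>
<t pos="FS" pos_full="FS">?</t>
</maryxml>
"""


def tokens(maryxml):
    """Return a list of (token text, attribute dict) for every <t> element."""
    root = ET.fromstring(maryxml.strip())
    return [((t.text or "").strip(), dict(t.attrib))
            for t in root.iterfind(".//m:t", MARY_NS)]


def diff_tokens(a, b):
    """Pair tokens by position and list every attribute whose value differs.

    Each entry is (text_a, text_b, attribute, value_a, value_b); a missing
    attribute is reported as None.
    """
    diffs = []
    for (text_a, attrs_a), (text_b, attrs_b) in zip(tokens(a), tokens(b)):
        for name in sorted(attrs_a.keys() | attrs_b.keys()):
            if attrs_a.get(name) != attrs_b.get(name):
                diffs.append((text_a, text_b, name,
                              attrs_a.get(name), attrs_b.get(name)))
    return diffs


for text_a, text_b, name, value_a, value_b in diff_tokens(S1, S2):
    print(f"{text_a} / {text_b}: {name} = {value_a!r} vs {value_b!r}")
```

Besides the differing `pos` and `pos_full` values, this comparison also shows that the `accent` attribute is present on both words in S#1 and absent in S#2, while `ph` and `g2p_method` are identical.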