lvapeab / m4loc

Automatically exported from code.google.com/p/m4loc
GNU Lesser General Public License v3.0

Differing translations between word-/phrase-alignment and tag-fixed methods even for continuous phrases #48

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Translating line 2 of Sample_AlmostEverything_1.2_strict.xlf.en-us results in 
two different translations:
Word-based:
<x id="1"/> Se interpone para <x id="2"/> .
Tag-fixed:
<x id="1"/> tribuna para <x id="2"/> .

Is this simply because the punctuation is included in the phrase translation in 
one case and not in the other?

Check by translating raw text.

Original issue reported on code.google.com by Achi...@gmail.com on 23 Sep 2013 at 8:09

GoogleCodeExporter commented 9 years ago
Source: <mrk mtype="protected"> XLIFF </mrk> stands for <mrk mtype="protected"> 
XML Localisation Interchange File Format </mrk> .

Why do the <mrk></mrk> tag pairs get converted to isolated tags?

echo "stands for" | moses -f binarized_model/moses.ini
Translation: se interpone para

echo "stands for ." | moses -f binarized_model/moses.ini
Translation: se interpone para .
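The raw-text check above can be scripted so that any phrase is translated with and without a trailing period and the two outputs compared. This is a minimal sketch, assuming a `moses` binary on the PATH and the `binarized_model/moses.ini` config from the commands above; the helper names `moses_translate` and `compare_punctuation` are hypothetical, and the decoder call is injectable so the comparison can be exercised without Moses.

```python
import subprocess
from typing import Callable, Tuple


def moses_translate(text: str, config: str = "binarized_model/moses.ini") -> str:
    """Pipe one line through Moses, like `echo "..." | moses -f <config>`."""
    result = subprocess.run(
        ["moses", "-f", config],
        input=text + "\n",
        capture_output=True,  # translation on stdout, decoder log on stderr
        text=True,
        check=True,
    )
    return result.stdout.strip()


def compare_punctuation(
    phrase: str, translate: Callable[[str], str] = moses_translate
) -> Tuple[str, str, bool]:
    """Translate `phrase` bare and with " ." appended; report whether the words match."""
    bare = translate(phrase)
    with_period = translate(phrase + " .")
    # Drop the translated trailing period (if any) before comparing the words
    stripped = with_period[:-2] if with_period.endswith(" .") else with_period
    return bare, with_period, stripped == bare
```

With the outputs reported above ("se interpone para" and "se interpone para ."), the third element would be `True`, which suggests the period alone does not change the choice of words.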

Original comment by Achi...@gmail.com on 24 Sep 2013 at 1:52

GoogleCodeExporter commented 9 years ago
Correction: the source produced by the latest Okapi extraction is:
<x id="1"/>stands for <x id="2"/>.

Original comment by Achi...@gmail.com on 24 Sep 2013 at 3:00
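In this extraction the placeholders are glued to the neighbouring word and to the final period, so a tokenization step has to split them into separate tokens. The following is a minimal, simplified stand-in, not the tokenizer M4Loc actually uses: `tokenize_segment` is a hypothetical name, it only recognises `<x id="..."/>` placeholders, and its punctuation rule would wrongly split abbreviations such as "e.g.".

```python
import re

# An isolated XLIFF placeholder such as <x id="1"/>
PLACEHOLDER = r'<x id="[^"]*"/>'


def tokenize_segment(segment: str) -> str:
    """Hypothetical sketch: make placeholders and final punctuation separate tokens."""
    # Surround every placeholder with spaces so it becomes its own token
    spaced = re.sub(f"({PLACEHOLDER})", r" \1 ", segment)
    # Split punctuation that ends a token (followed by whitespace or end of string)
    spaced = re.sub(r"([.,;:!?])(?=\s|$)", r" \1", spaced)
    # Collapse runs of whitespace and trim both ends
    return " ".join(spaced.split())


print(tokenize_segment('<x id="1"/>stands for <x id="2"/>.'))
```

For the extracted segment this prints `<x id="1"/> stands for <x id="2"/> .`, matching the tokenized source shown in the next comment.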

GoogleCodeExporter commented 9 years ago
Source after tokenization:
<x id="1"/> stands for <x id="2"/> .
Source after markup wrapping:
<wall/><np translation="<x id="1"/>"><x id="1"/></np><wall/> stands for 
<wall/><np translation="<x id="2"/>"><x id="2"/></np><wall/> .
Raw Moses translation:
<x id="1"/> tribuna para <x id="2"/> .
Unescaped Moses translation:
<x id="1"/> tribuna para <x id="2"/> .

The likely cause is that the decoder, even with walls inserted between the 
phrase "stands for" and the markup, chooses a different translation than it 
does when given the isolated phrase.

This is outside the control of the M4Loc project. Different translations from 
tag-fixed handling and from word-/phrase-based tag insertion methods are to be 
expected anyway.

Original comment by Achi...@gmail.com on 24 Sep 2013 at 3:17