Closed GoogleCodeExporter closed 9 years ago
Source: <mrk mtype="protected"> XLIFF </mrk> stands for <mrk mtype="protected">
XML Localisation Interchange File Format </mrk> .
Why do the <mrk></mrk> tag pairs get translated to isolated tags?
echo "stands for" | moses -f binarized_model/moses.ini
Translation: se interpone para
echo "stands for ." | moses -f binarized_model/moses.ini
Translation: se interpone para .
Original comment by Achi...@gmail.com
on 24 Sep 2013 at 1:52
Correction: the source with the latest Okapi extraction is:
<x id="1"/>stands for <x id="2"/>.
Original comment by Achi...@gmail.com
on 24 Sep 2013 at 3:00
Source after tokenization:
<x id="1"/> stands for <x id="2"/> .
Source after markup wrapping:
<wall/><np translation="<x id="1"/>"><x id="1"/></np><wall/> stands for
<wall/><np translation="<x id="2"/>"><x id="2"/></np><wall/> .
Raw Moses translation:
<x id="1"/> tribuna para <x id="2"/> .
Unescaped Moses translation:
<x id="1"/> tribuna para <x id="2"/> .
The likely cause is that the decoder, even with walls inserted between the
phrase "stands for" and the markup, picks a different translation ("tribuna
para") than it does when given the phrase in isolation ("se interpone para").
This is outside the control of the M4Loc project. Differences in translation
between fixed tag handling and word- or phrase-based tag insertion methods
are expected anyway.
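For illustration, the markup wrapping step shown above can be approximated
with the following minimal sketch. It is not M4Loc's actual wrapping script:
the function name wrap_markup, the regular expression for <x id="N"/>
placeholders, and the entity escaping of both the attribute value and the
element content are all assumptions made for this example.

```python
import re
from html import escape

# Assumed shape of the inline placeholders produced by the extraction step,
# e.g. <x id="1"/>. Other inline tag types are not handled in this sketch.
TAG_RE = re.compile(r'<x id="\d+"/>')


def wrap_markup(tokenized: str) -> str:
    """Wrap each placeholder in <wall/><np translation="...">...</np><wall/>.

    Hypothetical helper: the tag is entity-escaped (an assumption) so that
    the wrapper markup itself stays well-formed.
    """
    def wrap(match: re.Match) -> str:
        escaped = escape(match.group(0), quote=True)
        return f'<wall/><np translation="{escaped}">{escaped}</np><wall/>'

    return TAG_RE.sub(wrap, tokenized)


if __name__ == "__main__":
    print(wrap_markup('<x id="1"/> stands for <x id="2"/> .'))
```

The walls keep the decoder from reordering phrases across the tag
positions, but they do not fix which translation is chosen for the text
between them, which is consistent with the behaviour described above.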
Original comment by Achi...@gmail.com
on 24 Sep 2013 at 3:17
Original issue reported on code.google.com by
Achi...@gmail.com
on 23 Sep 2013 at 8:09