lvapeab / m4loc

Automatically exported from code.google.com/p/m4loc
GNU Lesser General Public License v3.0

reinserter fails #26

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Steps to reproduce (reinsert):
1. perl reinsert.pm f.en < f.moses > f.out

file f.en:
This is <g id="1"> bold and italic and then </g><g id="2"> only italic </g> text .

file f.moses:
Toto je |0-1| tučné písmo a kurzívu |2-4| a poté pouze |5-7| kurzívou . |8-10|

file f.out:
Toto je <g id="1"> tučné písmo a kurzívu <g id="2"> a poté pouze </g> </g> kurzívou .

However, the result should be:
Toto je <g id="1"> tučné písmo a kurzívu a poté </g><g id="2">  pouze kurzívou</g> .

Original issue reported on code.google.com by xhu...@gmail.com on 22 Aug 2011 at 1:37

GoogleCodeExporter commented 9 years ago
As the Moses phrase trace information shows, "a poté pouze" is the translation
of the source tokens "and then only" (phrase |5-7|). The algorithm requires
that <g id="2"> be placed before the phrase containing "only".

Likewise, the closing </g> for <g id="1"> has to be placed after the phrase
containing "and then". Both conditions refer to the same target phrase, which
explains the tag placement in f.out under the algorithm constraints described
in our EAMT 2011 paper.
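The phrase-level constraint can be illustrated with a small sketch that parses the trace line from f.moses and looks up which target phrase covers a given source token. This is not the reinsert.pm code; the helper names `parse_trace` and `phrase_for` are hypothetical, and the token indices assume that the <g> tags are not counted as tokens.

```python
import re

def parse_trace(line):
    """Split a Moses phrase-trace line into (target_words, src_start, src_end) tuples."""
    phrases = []
    pending = []
    for tok in line.split():
        m = re.fullmatch(r"\|(\d+)-(\d+)\|", tok)
        if m:
            # A |i-j| marker closes the target phrase collected so far.
            phrases.append((pending, int(m.group(1)), int(m.group(2))))
            pending = []
        else:
            pending.append(tok)
    return phrases

def phrase_for(phrases, src_index):
    """Return the target words of the phrase covering the given source token index."""
    for words, start, end in phrases:
        if start <= src_index <= end:
            return words
    return None

trace = "Toto je |0-1| tučné písmo a kurzívu |2-4| a poté pouze |5-7| kurzívou . |8-10|"
phrases = parse_trace(trace)

# Assumed source indices (tags not counted):
# This(0) is(1) bold(2) and(3) italic(4) and(5) then(6) only(7) italic(8) text(9) .(10)
then_phrase = phrase_for(phrases, 6)  # last token inside <g id="1">
only_phrase = phrase_for(phrases, 7)  # first token inside <g id="2">
print(then_phrase, only_phrase, then_phrase is only_phrase)
```

Because "then" and "only" fall into the same phrase |5-7|, a tag boundary between them cannot be reproduced when tags may only be placed at phrase boundaries.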

More precise tag placement would require a different approach, such as the
Moses zone/wall feature or word alignment information.

Original comment by Achi...@gmail.com on 26 Aug 2011 at 9:45