lvapeab / m4loc

Automatically exported from code.google.com/p/m4loc
GNU Lesser General Public License v3.0

Casing correctness affected when using recaser with traced Moses output #6

Closed: GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
To reinsert markup, Moses has to output phrase alignment information, which is 
enabled with the -t option. Example:
this is a |0-2| small |4-4| house |3-3| . |5-5|
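The Python sketch below shows one way to read traced output like this. It assumes the format in the example above: each target phrase is followed by a |start-end| trace giving the 0-based, inclusive span of the source words it translates. The function name parse_trace is hypothetical and is not part of Moses or m4loc.

```python
from __future__ import annotations

import re

# Assumed trace format (inferred from the example): a target phrase followed by
# |start-end|, the 0-based, inclusive span of the source words it covers.
TRACED_PHRASE = re.compile(r"(.*?)\s*\|(\d+)-(\d+)\|")


def parse_trace(line: str) -> list[tuple[str, int, int]]:
    """Split traced Moses output into (target phrase, source start, source end)."""
    return [
        (m.group(1).strip(), int(m.group(2)), int(m.group(3)))
        for m in TRACED_PHRASE.finditer(line)
    ]


if __name__ == "__main__":
    line = "this is a |0-2| small |4-4| house |3-3| . |5-5|"
    for phrase, start, end in parse_trace(line):
        print(f"{phrase!r} <- source words {start}..{end}")
```

Sorting the parsed pairs by source start index recovers the source order ("this is a", "house", "small", "."). That mapping from source spans to target phrases is what makes it possible to carry markup from the source segment to the translation.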

The traces (the source phrase spans between vertical bars) prevent the recaser 
from restoring correct upper/lowercase. The recaser relies on an n-gram model, 
and the trace tokens interleaved with the words break up the n-grams it would 
otherwise match.
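The effect can be illustrated by listing the word bigrams an n-gram model would see, with and without traces. This is only a sketch of that context and not the recaser itself. It assumes whitespace tokenization, so each trace such as |0-2| counts as a word.

```python
from __future__ import annotations


def bigrams(line: str) -> list[tuple[str, str]]:
    """Adjacent token pairs, splitting on whitespace (assumed tokenization)."""
    tokens = line.split()
    return list(zip(tokens, tokens[1:]))


traced = "this is a |0-2| small |4-4| house |3-3| . |5-5|"
plain = "this is a small house ."

# Bigrams present in the untraced sentence but missing once traces are interleaved.
print(sorted(set(bigrams(plain)) - set(bigrams(traced))))
# [('a', 'small'), ('house', '.'), ('small', 'house')]
```

In the traced line, pairs such as ("a", "small") are replaced by pairs involving trace tokens, like ("a", "|0-2|"), which a cased language model has never seen.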

Workarounds:
*  use a truecaser instead of the recaser
Possible fixes:
*  remove the traces before recasing and reinsert them afterwards

Original issue reported on code.google.com by Achi...@gmail.com on 2 Mar 2011 at 12:15

GoogleCodeExporter commented 9 years ago
Checked in the recase_preprocess.pl and recase_postprocess.pl scripts, which can be 
used to remove the traces before recasing and reinsert them afterwards.
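The Perl scripts are not shown in this issue. The Python sketch below illustrates the same strip-and-reinsert idea and is not a port of recase_preprocess.pl or recase_postprocess.pl. The functions strip_traces and reinsert_traces are hypothetical. The sketch assumes the recaser changes only letter case, so the recased line keeps the same tokens in the same order. Each trace is put back after the same number of words it originally followed.

```python
from __future__ import annotations

import re

# A trace token such as |0-2|. A target word that happened to look like this
# would be misread as a trace (accepted limitation of this sketch).
TRACE = re.compile(r"\|\d+-\d+\|")


def strip_traces(traced: str) -> tuple[str, list[tuple[int, str]]]:
    """Drop trace tokens, remembering how many words preceded each one."""
    words: list[str] = []
    traces: list[tuple[int, str]] = []
    for token in traced.split():
        if TRACE.fullmatch(token):
            traces.append((len(words), token))
        else:
            words.append(token)
    return " ".join(words), traces


def reinsert_traces(recased: str, traces: list[tuple[int, str]]) -> str:
    """Put each trace back after the same number of words as before.

    Assumes recasing kept the token count and order unchanged.
    """
    words = recased.split()
    if any(pos > len(words) for pos, _ in traces):
        raise ValueError("recased line has fewer tokens than the traced line")
    out: list[str] = []
    t = 0
    for i in range(len(words) + 1):
        # Emit every trace that originally followed exactly i words.
        while t < len(traces) and traces[t][0] == i:
            out.append(traces[t][1])
            t += 1
        if i < len(words):
            out.append(words[i])
    return " ".join(out)


if __name__ == "__main__":
    plain, traces = strip_traces("this is a |0-2| small |4-4| house |3-3| . |5-5|")
    print(plain)  # this is a small house .
    recased = "This is a small house ."  # stand-in for the recaser's output
    print(reinsert_traces(recased, traces))
    # This is a |0-2| small |4-4| house |3-3| . |5-5|
```

Trace positions are stored as word counts rather than character offsets. Because of this, the round trip does not depend on how the recaser spaces its output, only on the token sequence staying the same.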

Original comment by Achi...@gmail.com on 3 Mar 2011 at 1:23

GoogleCodeExporter commented 9 years ago

Original comment by Achi...@gmail.com on 9 Mar 2011 at 3:43