lvapeab / m4loc

Automatically exported from code.google.com/p/m4loc
GNU Lesser General Public License v3.0

mod_tokenizer fails on inline elements on Windows #1

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Environment: Windows 7 64-bit, Strawberry Perl 5.10.1.4 32-bit, Okapi Tools M10

What steps will reproduce the problem?
1. Download XLIFF test file http://okapi.googlecode.com/svn/trunk/okapi/filters/xliff/src/test/resources/RB-11-Test01.xlf
2. Open a command window and change directory to where the downloaded file is located
3. Extract translatable text with "tikal -xm RB-11-Test01.xlf"
4. Try to tokenize extracted text "perl mod_tokenizer.pl -l en < RB-11-Test01.xlf.en > RB-11-Test01.tok.en"

Result:
Error message: "The system cannot find the path specified."
The contents of the file RB-11-Test01.tok.en are (in 2 lines):
 -n "Paragraph. <g id="1">code</g>  <g id="2">bold</g> -n ". <x id="1"/> -n " and more text <g id="2"><x id="1"/></g> -n " and more text. 
 <x id="1"/> -n "

Expected:
No error message and the output file to contain properly tokenized text.

Remark:
Removing XLIFF inline elements from the input file removes the error message, 
but does not fix the corrupted output.
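
A possible reading of the result above, offered as an assumption since the thread does not show the script's source: the stray `-n "` fragments are what the cmd.exe built-in `echo` prints when handed `echo -n "..."`, because it does not treat `-n` as an option. If each segment is passed through a shell that way, the `<` and `>` of inline elements could also be taken as redirections, which would fit the path error. A minimal sketch of feeding text to a child process without going through the shell follows. The helper name `pipe_through` and the stand-in child command are hypothetical and not part of m4loc.

```perl
use strict;
use warnings;
use IPC::Open2;

# Hypothetical helper (not part of m4loc): send $text to an external
# command on its standard input and return everything the command prints.
# The command is given as a list, so no shell parses the text and
# characters such as <, > and " reach the child unchanged.
sub pipe_through {
    my ($text, @cmd) = @_;
    my $pid = open2(my $from_child, my $to_child, @cmd);
    # Writing everything before reading is fine for short segments;
    # very large inputs could block and would need interleaved I/O.
    print {$to_child} $text;
    close $to_child;
    my $result = do { local $/; <$from_child> };
    close $from_child;
    waitpid $pid, 0;
    return $result;
}

# Stand-in child process: the running perl itself, upper-casing its input.
# A real tokenizer command would take its place.
my $segment = qq{Paragraph. <g id="1">code</g>};
print pipe_through($segment, $^X, '-pe', '$_ = uc'), "\n";
```

The list form of the command is the relevant design choice here: the segment never appears on a command line, so quoting rules that differ between cmd.exe and POSIX shells do not come into play.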

Original issue reported on code.google.com by Achi...@gmail.com on 25 Jan 2011 at 7:33

GoogleCodeExporter commented 9 years ago
This looks like an OS-specific problem. Since Moses runs on Linux-based systems, the mod_tokenizer prototype was also developed and tested on Linux.

Please try it on Linux; it should work there.

On the next conference call, we can discuss whether and which OSs our project should support. In my opinion, Linux is enough for this prototype; later on, support can be broadened to Windows and other systems.

Original comment by xhu...@gmail.com on 26 Jan 2011 at 9:58

GoogleCodeExporter commented 9 years ago
With the latest check-ins the tokenizer works in the scenario described above.

Original comment by Achi...@gmail.com on 24 Feb 2011 at 10:11