lvapeab / m4loc

Automatically exported from code.google.com/p/m4loc
GNU Lesser General Public License v3.0

Sequence <g id="2"><x id="1"/></g> does not get tokenized #10

Closed by GoogleCodeExporter 9 years ago

GoogleCodeExporter commented 9 years ago
1. Open a command window and change into the .\xliff directory
2. Run "xliff2moses.bat .\t\RB-12-Test02.xlf.tok en" (or "./xliff2moses.bat ./t/RB-12-Test02.xlf.tok en" on Unix)
3. Open .\t\RB-12-Test02.xlf.tok.en in a text editor
4. View line 7:
"Text with <g id="2"><x id="1"/></g> and more text ."

Expected:
"Text with <g id="2"> <x id="1"/> </g> and more text ."

Original issue reported on code.google.com by Achi...@gmail.com on 3 Mar 2011 at 1:56
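
To make the expected output in the report above concrete, here is a minimal Python sketch of the requested behaviour: adjacent inline tags get separated by a single space. It is not m4loc code; the function name `space_adjacent_tags` is hypothetical, and the sketch assumes attribute values never contain a raw `<` or `>`.

```python
import re


def space_adjacent_tags(line: str) -> str:
    """Insert one space wherever a tag is immediately followed by another tag."""
    # Assumption: '>' directly followed by '<' only occurs where one tag ends
    # and the next one begins (no raw angle brackets inside attribute values).
    return re.sub(r">(?=<)", "> ", line)


# Line 7 as produced in the report.
actual = 'Text with <g id="2"><x id="1"/></g> and more text .'
print(space_adjacent_tags(actual))
```

For the line from the report this yields the string listed under "Expected". Text that already has whitespace between tags is left unchanged, because the pattern only matches tags that touch.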

GoogleCodeExporter commented 9 years ago
I believe this is the desired behaviour: no extra space is added anywhere, since there was none in the input file. This holds for all non-tokenizable parts (URLs, tags).

Original comment by xhu...@gmail.com on 3 Mar 2011 at 9:13
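
The behaviour described in the comment above (non-tokenizable parts are copied through exactly as they appear in the input) could look roughly like the following Python sketch. This is an illustration only, not the actual m4loc tokenizer: the `PROTECTED` pattern, the function names, and the toy punctuation rule are all assumptions made for the example.

```python
import re

# Assumption: tags and http(s) URLs are the "non-tokenizable" spans.
PROTECTED = re.compile(r"<[^<>]+>|https?://\S+")


def split_punct(text: str) -> str:
    """Toy tokenization: put a space before punctuation glued to a word."""
    return re.sub(r"(?<=\w)([.,!?;:])", r" \1", text)


def tokenize_preserving(line: str) -> str:
    """Tokenize plain text, copying protected spans through unchanged."""
    out = []
    pos = 0
    for m in PROTECTED.finditer(line):
        out.append(split_punct(line[pos:m.start()]))  # text before the span
        out.append(m.group(0))                        # protected span, verbatim
        pos = m.end()
    out.append(split_punct(line[pos:]))               # trailing text
    return "".join(out)


print(tokenize_preserving('Text with <g id="2"><x id="1"/></g> and more text.'))
```

Under these assumptions the tags stay glued together, as in the reported output, while the final period is still split off from the preceding word.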

GoogleCodeExporter commented 9 years ago

Original comment by Achi...@gmail.com on 3 Mar 2011 at 12:26

GoogleCodeExporter commented 9 years ago
The markup reinserter can deal with this, but it will insert spaces between the different tags in the output.

Original comment by Achi...@gmail.com on 3 Mar 2011 at 4:04

GoogleCodeExporter commented 9 years ago

Original comment by Achi...@gmail.com on 9 Mar 2011 at 3:47