Closed GoogleCodeExporter closed 9 years ago
Fixed in reinsert.pm r116 with result:
Tokenizer <g id="0"> programma have to be , the <g id="1"> <g id="2"> </g>
sadala </g> <g id="3"> </g> ievadīto & $ tekstu teikumos , and the teikumus
vārdos14 . </g>
Not exactly what was expected, but the current algorithm cannot:
1. Output opening and closing tags in a specific order before a phrase,
i.e. it cannot output "<g id="1"> </g> <g id="2">". It first outputs all opening
tags, then all closing tags that precede the phrase, then the phrase, then all
closing tags that follow it. Note that an order cannot necessarily be determined:
the combination of tag pairs around target phrases is different from that in the
source. If you need strict tag order, you can use an alternative mechanism with
wrap_markup.pm (this also prevents any phrase reordering across markup).
2. Close <g id="2"> after <g id="3">. The former is associated with the phrase
"sadala" only, so it needs to be closed after that phrase. <g id="3"> is
associated with the phrase starting with "ievadīto".
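
To illustrate point 1, here is a minimal Python sketch of the fixed emission order described above. It is not the code in reinsert.pm; the function name `emit_phrase` and its parameters are hypothetical and only model the four-step order (opening tags, closing tags before the phrase, the phrase, closing tags after the phrase).

```python
def emit_phrase(opening, closing_before, phrase, closing_after):
    """Join the tokens for one phrase in the fixed order described in point 1.

    opening, closing_before and closing_after are lists of tag ids
    (hypothetical representation, chosen for this sketch only).
    """
    tokens = []
    # Step 1: every opening tag attached to this phrase comes first.
    tokens.extend('<g id="%s">' % tag_id for tag_id in opening)
    # Step 2: closing tags that must appear before the phrase.
    tokens.extend("</g>" for _ in closing_before)
    # Step 3: the phrase itself.
    tokens.append(phrase)
    # Step 4: closing tags that must appear after the phrase.
    tokens.extend("</g>" for _ in closing_after)
    return " ".join(tokens)


# Tag 1 is empty in the source and tag 2 wraps "sadala".
result = emit_phrase([1, 2], [1], "sadala", [2])
print(result)
```

Because all opening tags are emitted in step 1, the two openers end up adjacent and the sequence '<g id="1"> </g> <g id="2">' cannot be produced, which matches the fragment around "sadala" in the result above.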
Fixing this further would be a feature request, but it runs into the problem
already described in 1. In any case, tag combinations '<g id="1"> </g>' without
a token in between should not really occur; these should be isolated
tags '<x id="1"/>'.
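
As a rough illustration of the last point, an empty '<g>' pair could be rewritten as an isolated tag before the text reaches the tokenizer. The Python sketch below is only an assumption about how such a preprocessing step might look; `collapse_empty_pairs` is a hypothetical helper and is not part of reinsert.pm or wrap_markup.pm.

```python
import re

# Matches an opening <g> tag followed only by whitespace and its closing tag.
EMPTY_G_PAIR = re.compile(r'<g id="(\d+)">\s*</g>')


def collapse_empty_pairs(text):
    """Replace each empty <g id="N"> </g> pair with an isolated <x id="N"/>."""
    return EMPTY_G_PAIR.sub(r'<x id="\1"/>', text)


source = '<g id="1"> </g> <g id="2"> sadala </g>'
print(collapse_empty_pairs(source))
```

Pairs that enclose a token, such as the one around "sadala", are left untouched because the pattern only allows whitespace between the opening and closing tag.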
Original comment by Achi...@gmail.com
on 31 Jan 2012 at 11:40
Correction: fixed with reinsert.pm r119
Original comment by Achi...@gmail.com
on 31 Jan 2012 at 11:42
Original issue reported on code.google.com by
Achi...@gmail.com
on 31 Jan 2012 at 11:35