Open unhammer opened 8 years ago
Ok I'll fix this
Superblanks merged!
You removed the line breaks though. In HTML, space vs. no space actually matters (although the number of spaces doesn't matter outside <pre> or <code>), while in TeX and other formats, a double line break works as a paragraph separator.
Looking at input.html, and comparing with how apertium-deshtml works, I'd expect something like
[<div id="someid">
<p class="some class" id="some id">
][{<i>}]hello brother[
][{<u style="italic">}]how[
][{<b>}]are you[
][{<u style="italic"><em>}]doing?[
<\/p>
<\/div>
]
What I do where structural whitespace matters is to put it into the tag, so that
x <p> word </p> y
becomes something like
x <p outer-space-before=" " outer-space-after=" " inner-space-before=" " inner-space-after=" "> word </p> y
Then the translation chain can freely mangle whitespace all it likes, because the post-processor can restore the exact spacing.
Dunno if that's at all relevant when superblanks can mostly do the same, but might help with some formats.
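For reference, here is a rough Python sketch of the whitespace-in-attributes idea described above. It is only an illustration, not code from any Apertium tool: the function names `stash_whitespace` and `restore_whitespace` are made up, it only handles plain, non-nested <p> elements, and it assumes the translation chain leaves the tag and its attributes untouched.

```python
import re

# Hypothetical helpers for illustration; handles only plain, non-nested <p> elements.
P_PLAIN = re.compile(r"(\s*)<p>(\s*)(.*?)(\s*)</p>(\s*)", re.DOTALL)

P_TAGGED = re.compile(
    r'\s*<p outer-space-before="([^"]*)" outer-space-after="([^"]*)" '
    r'inner-space-before="([^"]*)" inner-space-after="([^"]*)">'
    r"\s*(.*?)\s*</p>\s*",
    re.DOTALL,
)


def stash_whitespace(html: str) -> str:
    """Record the whitespace around and inside each <p> in attributes on the tag."""
    def repl(m: re.Match) -> str:
        outer_before, inner_before, body, inner_after, outer_after = m.groups()
        return (
            f'{outer_before}<p outer-space-before="{outer_before}" '
            f'outer-space-after="{outer_after}" '
            f'inner-space-before="{inner_before}" '
            f'inner-space-after="{inner_after}">'
            f"{inner_before}{body}{inner_after}</p>{outer_after}"
        )
    return P_PLAIN.sub(repl, html)


def restore_whitespace(html: str) -> str:
    """Discard whatever whitespace is there now and put back the recorded spacing."""
    def repl(m: re.Match) -> str:
        outer_before, outer_after, inner_before, inner_after, body = m.groups()
        return f"{outer_before}<p>{inner_before}{body}{inner_after}</p>{outer_after}"
    return P_TAGGED.sub(repl, html)


if __name__ == "__main__":
    tagged = stash_whitespace("x <p> word </p> y")
    print(tagged)
    # Simulate a pipeline that mangles spacing around and inside the tag.
    mangled = tagged.replace("x <p", "x<p").replace("> word </p> y", ">   word</p>    y")
    print(restore_whitespace(mangled))
```

The post-processor never looks at the whitespace that survives in the text; it relies only on the attributes, which is why the rest of the chain is free to change spacing.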
Yeah, that sounds like a safe way to do it, though I think we should also build the rest of the chain so it still doesn't change plain spaces outside superblanks unless it's meaningful for translation.
The current README example gives
This should be
I.e.
This may seem a bit arbitrary, but we want as few unnecessary differences as possible from the current apertium-deshtml, to make integration as easy as possible.