sbsdev / sbs-braille-tables

Auxiliary tables used at SBS to generate good German Braille using liblouis
GNU Lesser General Public License v3.0

Make tables more "white space aware" #1

Closed bertfrees closed 5 years ago

bertfrees commented 9 years ago

@egli The problem is the following. The mechanism I use for preserving white space is based on segmentation of the input, replacing significant white space segments with a NBSP character, tracing of these NBSP segments in the output back to the input, and restoring the white space segments if needed. But the accuracy of this mechanism relies on the liblouis table. The table should preserve NBSP characters (unless it has a good reason to delete them), and in addition it needs to support the segmentation (i.e. the input/output position mapping should be accurate).
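As a rough illustration of the mechanism described above (not the actual implementation), a minimal Python sketch might look like this. It assumes the input is already split into segments, that a segment consisting only of white space is "significant", and that text segments contain no NBSP themselves. The two `translate_*` functions are hypothetical stand-ins for a liblouis translation, and the sketch splits the output on NBSP instead of using the real input/output position mapping.

```python
NBSP = "\u00a0"


def join_segments(segments):
    """Replace every white space segment with a single NBSP placeholder."""
    return "".join(NBSP if seg.isspace() else seg for seg in segments)


def restore_white_space(segments, translated):
    """Put the original white space segments back into the translated text.

    Returns None when the number of NBSP placeholders in the output does not
    match the number of white space segments in the input, i.e. when the
    segmentation was lost.
    """
    spaces = [seg for seg in segments if seg.isspace()]
    pieces = translated.split(NBSP)
    if len(pieces) != len(spaces) + 1:
        return None
    restored = [pieces[0]]
    for space, piece in zip(spaces, pieces[1:]):
        restored.append(space)
        restored.append(piece)
    return "".join(restored)


# Hypothetical stand-ins for a liblouis translation; neither is a real API.
def translate_keeping_nbsp(text):
    return text.upper()


def translate_dropping_nbsp(text):
    return text.replace(NBSP, "").upper()


segments = ["3.", NBSP, "die Strecke", NBSP + NBSP, "im Punkt Q."]
joined = join_segments(segments)
kept = restore_white_space(segments, translate_keeping_nbsp(joined))
lost = restore_white_space(segments, translate_dropping_nbsp(joined))
```

With a translation that keeps the placeholders, `kept` contains the original white space segments again. With one that deletes them, `lost` is `None`, which corresponds to the "segmentation was lost" case.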

I have a test where I want to translate the string "3. die Mittelsenkrechte auf der Strecke  ⠷⠘⠉⠙⠾  im Punkt Q.". The space after the "3." and the spaces before and after the Unicode braille string need to be preserved (they are NBSP, but you can't see that in this GitHub issue). As the following warning message shows, all of the NBSP segments were lost:

WARN Text segmentation was lost in the output. Falling back to fuzzy mode.
=> input segments: ["3.", " ", "die Mittelsenkrechte auf der Strecke", "  ", "⠷⠘⠉⠙⠾", "  ", "im Punkt Q."]
=> output segments: ["⠼⠉⠄", "⠙⠊⠑ ⠍⠊⠞⠞⠑⠇⠎⠑⠝⠅⠗⠑⠉⠓⠞⠑ ⠁⠥⠋ ⠙⠑⠗ ⠎⠞⠗⠑⠉⠅⠑", "⠷⠘⠉⠙⠾", "⠊⠍ ⠏⠥⠝⠅⠞ ⠟⠄"]
WARN White space was lost in the output.
bertfrees commented 9 years ago

This pass2 rule in sbs-special.mod might have something to do with it:

# Kürzungsverbot entfernen
pass2 @a ?
bertfrees commented 9 years ago

@egli Could you find out why this rule is there, and if possible replace it with something else?

egli commented 9 years ago

Ah, this is probably because we add this "Kürzungsverbot" in the XSLT in some places to inhibit contraction.

bertfrees commented 9 years ago

Oh I see. And in the XSLT that special sign is U+250A, right? In sbs-special.cti there is a rule that says

letter \x250A a

The problem is that in compileTranslationTable.c there is a rule that says

space \x00A0 a

and this is also what I rely on. Virtual dot "a" should really be reserved for NBSP, otherwise things break.

So it seems the solution is simply to find another virtual dot pattern for the Kürzungsverbot.
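To make the clash concrete, here is a small Python sketch that groups rules by their dot pattern and reports patterns claimed by more than one character. It is only an illustration: `find_dot_collisions` is a hypothetical helper, it assumes each rule has exactly the three-field form quoted above, and it compares dot patterns as literal strings.

```python
from collections import defaultdict


def find_dot_collisions(rules):
    """Return the dot patterns that are assigned to more than one character."""
    by_dots = defaultdict(list)
    for rule in rules:
        # Each rule is assumed to be "<opcode> <character> <dots>".
        opcode, char, dots = rule.split()
        by_dots[dots].append((opcode, char))
    return {dots: users for dots, users in by_dots.items() if len(users) > 1}


rules = [
    r"letter \x250A a",  # Kürzungsverbot rule quoted from sbs-special.cti
    r"space \x00A0 a",   # NBSP rule quoted from compileTranslationTable.c
]
collisions = find_dot_collisions(rules)
```

Here `collisions` has a single entry for the pattern `a`, listing both U+250A and U+00A0.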

egli commented 9 years ago

Yes, that probably makes sense. I seem to remember that Christian Waldvogel complained that there weren't enough virtual dots. I'll have to look at it with him or Mischa.

bertfrees commented 9 years ago

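There are plenty of virtual dot patterns. There are 6 virtual dots (9, a, b, c, d and e), and a pattern counts as virtual if it contains at least one of them, combined with any subset of the 8 regular dots. That gives (2^6 - 1) * 2^8 = 16128 virtual dot patterns.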
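The count can be double-checked with a few lines of Python. This sketch takes the figures from the comment as given (8 regular dots, 6 virtual dots) and treats a pattern as "virtual" when it contains at least one virtual dot. The bit layout is an assumption made for the illustration only.

```python
REGULAR_DOTS = 8          # dots 1-8
VIRTUAL_DOTS = "9abcde"   # the six virtual dots named in the comment

# Closed form: at least one virtual dot, any subset of the regular dots.
closed_form = (2 ** len(VIRTUAL_DOTS) - 1) * 2 ** REGULAR_DOTS

# Brute force: treat each pattern as a bit mask with the regular dots in the
# low 8 bits and the virtual dots above them, then count the masks that have
# at least one virtual bit set.
total_bits = REGULAR_DOTS + len(VIRTUAL_DOTS)
brute_force = sum(1 for mask in range(1 << total_bits) if mask >> REGULAR_DOTS)
```

Both `closed_form` and `brute_force` evaluate to 16128.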

egli commented 9 years ago

This should be fixed in the pipeline2 branch.

bertfrees commented 9 years ago

I couldn't build it and had to add a fixup (81e955f); I hope I got it right. The white space problem seems to be solved, though. Thanks! Will you release a new version soon?