Open AndersEkl opened 1 year ago
Martin, Oscar, and Anders discussed this. Low priority. The two simplest ways to normalise would be to 1. always preserve the hyphenation, or 2. always remove end-of-line hyphenation and merge the two word parts. More sophisticated approaches would require deeper language skills.
We definitely prefer 2: always remove end-of-line hyphenation and merge the two word parts.
A third alternative could be to encode end-of-line hyphens as soft hyphens.
Very good suggestion @josteinaj. In many cases, this will effectively be the same as 2, right? With the addition that the hyphens remain retrievable for processing/checking, since they are disambiguated from other/hard hyphens.
Yes, they would be disambiguated from other/hard hyphens. It would improve line breaking in a normal e-reader, but it would also help other formats: a TTS engine should ignore them, and a braille layout engine could use them for hyphenation across lines. It wouldn't be a 100% accurate representation of the original though. Sometimes it really is a hard hyphen, even though it's at the end of the line. But that's also a problem for option 2 when deleting hyphens. No perfect solution here :shrug:.
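To make the three options concrete, here is a minimal Python sketch on plain text. The function name `normalise_eol_hyphens`, the mode names, and the regex are illustrative assumptions, not part of the guidelines. It also assumes that all three options join the two word parts onto one line and differ only in what replaces the end-of-line hyphen, and it cannot tell a real hard hyphen at the end of a line from a typographic one.

```python
import re

SOFT_HYPHEN = "\u00AD"  # U+00AD SOFT HYPHEN

# Assumption: an end-of-line hyphen is a "-" directly after a word character,
# followed by a line break and then another word character.
_EOL_HYPHEN = re.compile(r"(?<=\w)-[ \t]*\r?\n[ \t]*(?=\w)")

# Hypothetical mode names for the three options discussed in this thread.
_REPLACEMENTS = {
    "preserve": "-",        # option 1: keep a hard hyphen
    "remove": "",           # option 2: drop the hyphen and merge the parts
    "soft": SOFT_HYPHEN,    # option 3: encode as a soft hyphen
}


def normalise_eol_hyphens(text: str, mode: str = "soft") -> str:
    """Join words split across a line break according to the chosen option."""
    if mode not in _REPLACEMENTS:
        raise ValueError(f"unknown mode: {mode!r}")
    return _EOL_HYPHEN.sub(_REPLACEMENTS[mode], text)


sample = "a well-known conven-\ntion"
print(repr(normalise_eol_hyphens(sample, "remove")))
print(repr(normalise_eol_hyphens(sample, "soft")))
```

Hyphens inside a line, such as the one in "well-known", are left alone in every mode.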
So, let's add @josteinaj's suggestion as a third option. In my view, this is the best option. Thanks Jostein!
Note to selves: We should check how the spell checkers we use treat soft hyphens.
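If a spell checker turns out to mishandle soft hyphens, one possible workaround is to strip them before checking, which yields the same text as option 2. A minimal sketch, where `strip_soft_hyphens` is a hypothetical helper and no particular spell checker's behaviour is assumed:

```python
SOFT_HYPHEN = "\u00AD"  # U+00AD SOFT HYPHEN


def strip_soft_hyphens(text: str) -> str:
    """Return the text with every soft hyphen removed."""
    return text.replace(SOFT_HYPHEN, "")


encoded = "conven\u00adtion"
plain = strip_soft_hyphens(encoded)
print(plain)  # the merged word, ready to pass to a spell checker
```

The soft-hyphen version stays in the source file, so the original break points are not lost.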
Originally written by @martinpub
Words split in two across a line break are a common typographic convention. The proper normalisation of these should probably be added to an update of the guidelines.