toptensoftware / RichTextKit

Rich text rendering for SkiaSharp

Hyphenation #21

Closed sebllll closed 4 years ago

sebllll commented 4 years ago

hi all,

i'm currently playing a bit with this api - it's really nice so far!

one thing i want to explore is hyphenation. so far, the line break algorithm works nicely and also breaks lines correctly at syllable boundaries when the strings contain soft hyphens (U+00AD). this is already great. so my idea is to preprocess the strings and insert soft hyphens based on a hyphenation dictionary.
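As a rough illustration of the preprocessing idea (this is not RichTextKit code), the sketch below inserts soft hyphens into known words before the text is handed to a layout engine. The `HYPHENATION` table and the `insert_soft_hyphens` helper are hypothetical names invented for this example; a real setup would load a proper hyphenation dictionary or pattern set instead.

```python
import re

SOFT_HYPHEN = "\u00ad"

# Hypothetical toy dictionary: lowercase word -> its syllables.
HYPHENATION = {
    "hyphenation": ["hy", "phen", "ation"],
    "algorithm": ["al", "go", "rithm"],
}

def insert_soft_hyphens(text, dictionary=HYPHENATION):
    """Insert U+00AD between the syllables of every word found in the dictionary."""
    def replace(match):
        word = match.group(0)
        parts = dictionary.get(word.lower())
        # Leave the word alone if it is unknown or the entry does not spell it.
        if parts is None or "".join(parts) != word.lower():
            return word
        # Slice the original word so its capitalisation is preserved.
        pieces, pos = [], 0
        for part in parts:
            pieces.append(word[pos:pos + len(part)])
            pos += len(part)
        return SOFT_HYPHEN.join(pieces)

    # Match runs of letters only (no digits, underscores, or punctuation).
    return re.sub(r"[^\W\d_]+", replace, text)
```

The resulting string can then be passed to the text layout unchanged, since soft hyphens are invisible unless a break happens at them.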

a great addition to richtextkit would be a way to control the hyphen symbol, e.g.: if a line breaks at a soft hyphen, append the given hyphen character(s) to the end of that line.
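To make the requested behaviour concrete, here is a minimal sketch of a greedy line wrapper that appends a configurable hyphen string when it breaks at a soft hyphen. It is only an illustration under simplifying assumptions: width is measured in characters rather than glyph extents, break opportunities are limited to spaces and soft hyphens, and the `wrap` function and its `hyphen` parameter are hypothetical, not part of RichTextKit.

```python
SOFT_HYPHEN = "\u00ad"

def wrap(text, width, hyphen="-"):
    """Greedy wrap; a break at a soft hyphen ends the line with `hyphen`."""
    lines, current = [], ""
    for word in text.split(" "):
        fragments = word.split(SOFT_HYPHEN)
        plain = "".join(fragments)          # the word with soft hyphens removed
        sep = " " if current else ""
        if len(current) + len(sep) + len(plain) <= width:
            current += sep + plain          # the whole word fits on this line
            continue
        # The word does not fit: keep as many leading syllables as possible,
        # leaving room for the visible hyphen.
        head, split_at = "", 0
        for i in range(1, len(fragments)):
            candidate = "".join(fragments[:i])
            if len(current) + len(sep) + len(candidate) + len(hyphen) <= width:
                head, split_at = candidate, i
            else:
                break
        if split_at:
            lines.append(current + sep + head + hyphen)
            # Simplification: the remainder is not split a second time.
            current = "".join(fragments[split_at:])
        else:
            if current:
                lines.append(current)
            current = plain
    if current:
        lines.append(current)
    return lines
```

For example, `wrap("a hy\u00adphen\u00adation test", 10)` yields `["a hyphen-", "ation test"]`, while unused soft hyphens simply disappear from the output.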

maybe hyphenation has already crossed your mind, and maybe there are more opinions about this?

best, sebl

toptensoftware commented 4 years ago

Hi @sebllll,

Thanks for reporting this. Unfortunately this is probably beyond the scope of this project given my current time constraints. I'd be open to considering pull requests to add the feature, but right now it's not something I'm planning to develop myself.

The most likely thing I might be able to add is a hook into the line break algorithm that lets someone plug in the ability to do hyphenation breaks.

Brad

sebllll commented 4 years ago

thanks for your answer!

i understand that this is not something that can be added quickly, and i'm not asking for it.

if you can help me understand your line breaker better, i might be able to add hyphenation functionality to it. if that works out, i'll open a PR.

a concrete question: does the line breaker distinguish between hyphens and soft hyphens? at least i couldn't find a soft hyphen here: https://github.com/toptensoftware/RichTextKit/blob/master/Topten.RichTextKit/Unicode/LineBreakClass.cs

toptensoftware commented 4 years ago

The line break algorithm is a port of this: https://github.com/foliojs/linebreak and uses the LineBreakClasse.trie (here), which is generated by the script here, to classify characters.

Basically it's an implementation of the Unicode Line Breaking Algorithm (UAX #14) - see that spec for exactly what it does.
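For context on the soft-hyphen question above: UAX #14 does not give the soft hyphen a class of its own. In the Unicode line break data, U+00AD SOFT HYPHEN is assigned to class BA (Break After), while the ordinary U+002D HYPHEN-MINUS is class HY, which would explain why no dedicated soft-hyphen entry shows up in a list of line break classes. The small table below is a hand-copied excerpt for illustration only; the `line_break_class` helper is a hypothetical name and does not reflect how RichTextKit or the trie lookup is implemented.

```python
# Hand-copied excerpt of UAX #14 line break classes for hyphen-like characters.
LINE_BREAK_CLASS = {
    "\u002d": "HY",  # HYPHEN-MINUS
    "\u00ad": "BA",  # SOFT HYPHEN: break opportunity after
    "\u2010": "BA",  # HYPHEN
    "\u2011": "GL",  # NON-BREAKING HYPHEN: glue, no break
}

def line_break_class(ch):
    """Return the class from the excerpt, or None if the character is not listed."""
    return LINE_BREAK_CLASS.get(ch)

# The line breaker therefore sees a soft hyphen as a generic "break after"
# character; drawing a visible hyphen at the break is a separate rendering step.
for ch in ("\u002d", "\u00ad"):
    print(f"U+{ord(ch):04X} -> {line_break_class(ch)}")
```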