mbutterick / pollen-users

please use https://forums.matthewbutterick.com/c/typesetting/ instead

[quad] word segmentation and zero-width space #35

Open sorawee opened 4 years ago

sorawee commented 4 years ago

TL;DR: Is there a zero-width space in quad?

In some languages, such as Thai, word boundaries are not marked in writing. In particular, whitespace separates sentences rather than words. As a result, quad breaks lines only between sentences, which is not optimal. The problem is described with more technical details here.

There are existing tools that help with this problem, notably Swath. So my current solution is to traverse the document tree and replace each string with the segmented output from Swath. However, I need a zero-width space to glue the segmented words back together. Is there a way to input it?
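For illustration, the traversal described above could look roughly like the Python sketch below. It is not quad or Pollen code. The tree shape (`[tag, child, ...]` with no attributes), the `segment_tree` helper, and the `toy_segmenter` stand-in are all assumptions made for this example. A real setup would call an actual segmenter such as Swath in place of the toy one.

```python
from typing import Callable, List, Union

ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE

# Assumed tree shape: a node is a text string or a list [tag, child, ...].
Node = Union[str, list]


def toy_segmenter(text: str) -> List[str]:
    """Hypothetical stand-in for a real word segmenter such as Swath.

    It splits on a vertical bar so the example runs without external tools.
    """
    return [word for word in text.split("|") if word]


def segment_tree(node: Node, segmenter: Callable[[str], List[str]]) -> Node:
    """Return a copy of the tree in which every text string has been
    segmented and its words re-joined with zero-width spaces."""
    if isinstance(node, str):
        return ZWSP.join(segmenter(node))
    # The first element is the tag name and is left untouched.
    tag, *children = node
    return [tag] + [segment_tree(child, segmenter) for child in children]


doc = ["root", ["p", "สวัสดี|ครับ", ["em", "ขอบคุณ|มาก"]]]
segmented = segment_tree(doc, toy_segmenter)
```

Passing the segmenter in as a function keeps the tree walk independent of whichever segmentation tool is used.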

More generally, I want to ask if this is a good approach. I understand that quad is meant to be low-level, so the word segmentation problem might not be suitable to resolve at this level. Yet IMO, it also doesn't make sense to leave the problem, which is quite low-level, to users.

mbutterick commented 4 years ago

IIUC the general & correct answer is to implement the Unicode linebreaking algorithm, which respects the zero-width space. (I started a version of it over here, but have not made progress. If someone wanted a manageable, self-contained contribution to Quad, that would be a good one.)

In the interim, you can add the zero-width space to the list of softies in the line wrapper and see if it does what you want. If so, you can make a PR.
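To make the idea concrete, here is a minimal Python sketch of a greedy line wrapper that treats the zero-width space as a soft break. It is not quad's line wrapper. The names `SOFT_BREAKS` and `wrap` are hypothetical, and measuring width with `len` is a simplifying assumption that ignores real glyph widths and combining marks.

```python
from typing import List

ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE

# Hypothetical "softies" list: characters after which a line may end.
SOFT_BREAKS = {" ", ZWSP}


def visible_len(s: str) -> int:
    """Width that counts against the measure: ZWSPs and trailing
    spaces take no room. Uses len() as a crude width (assumption)."""
    return len(s.replace(ZWSP, "").rstrip(" "))


def wrap(text: str, width: int) -> List[str]:
    """Greedy wrap that breaks only after characters in SOFT_BREAKS."""
    # Split the text into chunks, each ending at a break opportunity.
    chunks, current = [], ""
    for ch in text:
        current += ch
        if ch in SOFT_BREAKS:
            chunks.append(current)
            current = ""
    if current:
        chunks.append(current)

    # Fill each line with as many chunks as fit within the width.
    lines, line = [], ""
    for chunk in chunks:
        if line and visible_len(line + chunk) > width:
            lines.append(line)
            line = chunk
        else:
            line += chunk
    if line:
        lines.append(line)

    # Drop the invisible and trailing break characters from each line.
    return [ln.replace(ZWSP, "").rstrip(" ") for ln in lines]


with_zwsp = wrap("aaa\u200bbbb\u200bccc", 6)
without_zwsp = wrap("aaabbbccc", 6)
```

With the zero-width spaces in place the text breaks into `aaabbb` and `ccc`. Without them the wrapper finds no break opportunity and the line overflows, which matches the behaviour reported at the top of this thread.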