sile-typesetter / sile

The SILE Typesetter — Simon’s Improved Layout Engine
https://sile-typesetter.org
MIT License

Native Unicode U+00AD (soft hyphen) support #1930

Closed Omikhleia closed 9 months ago

Omikhleia commented 9 months ago

As a follow-up to #1918, I'd like soft hyphens (U+00AD) to be properly supported.

The most direct solution is described in https://github.com/sile-typesetter/sile/discussions/1716#discussioncomment-7740054 (other solutions are discussed just before that comment).

I'd like to suggest it for inclusion in SILE, with perhaps two additional settings:

Rationale: Text copied from other sources (Office documents, HTML pages) may contain soft hyphens. I ran into this when working on my previous book, which had inputs from several origins.

The main problem with the current behavior (= no special handling, just passed to the shaper) is twofold:

Moreover, the shaper removes them in ligatures (so they are lost in those cases)... There's some asymmetry here!

These points alone would be enough to argue for catching soft hyphens and replacing them with an appropriate discretionary node.

Still I have other concerns:

So the ability to strip soft hyphens from the input entirely makes sense, because we have a better solution for hyphenation and exceptions.

And the ability to warn about them regardless also makes sense, because they are hard to notice (e.g. in VSCode I can obviously configure the editor to show such characters, but spotting them still requires a lot of scrutiny).
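To make the three behaviors concrete (replace with a discretionary, strip entirely, warn regardless), here is a rough sketch in Python. It is not SILE code and does not use SILE's API: `Text`, `Discretionary`, `process_soft_hyphens`, and the `mode` and `warn` parameters are all hypothetical names chosen for illustration only.

```python
import warnings
from dataclasses import dataclass

SOFT_HYPHEN = "\u00ad"  # U+00AD SOFT HYPHEN


@dataclass
class Text:
    """Hypothetical stand-in for a run of text handed to the shaper."""
    value: str


@dataclass
class Discretionary:
    """Hypothetical stand-in for a discretionary break node."""
    prebreak: str = "-"


def process_soft_hyphens(text, mode="discretionary", warn=False):
    """Split text on soft hyphens before it reaches the shaper.

    mode="discretionary": each soft hyphen becomes a Discretionary node.
    mode="strip": soft hyphens are dropped from the input.
    warn=True: emit a warning whenever a soft hyphen is present,
    whichever mode is selected.
    """
    count = text.count(SOFT_HYPHEN)
    if warn and count:
        warnings.warn(f"input contains {count} soft hyphen(s) (U+00AD)")

    if mode == "strip":
        stripped = text.replace(SOFT_HYPHEN, "")
        return [Text(stripped)] if stripped else []
    if mode != "discretionary":
        raise ValueError(f"unknown mode: {mode!r}")

    nodes = []
    for index, chunk in enumerate(text.split(SOFT_HYPHEN)):
        if index > 0:
            # A soft hyphen sat between the previous chunk and this one.
            nodes.append(Discretionary())
        if chunk:
            nodes.append(Text(chunk))
    return nodes
```

Because the split happens before shaping, each chunk would be shaped on its own, so a soft hyphen inside a would-be ligature (as in `of\u00adfice`) is kept as a break point instead of being lost.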

Lastly, if we agree on this proposal, we'd need to document these settings somewhere. Should we have an extra chapter in the manual, such as "Unicode support & special cases" (e.g. just before the chapter concerning language support)?

What do you think?