sile-typesetter / sile

The SILE Typesetter — Simon’s Improved Layout Engine
https://sile-typesetter.org
MIT License

Complex Latin Script #876

Closed cmahte closed 4 years ago

cmahte commented 4 years ago

Inside SILE, what triggers the composer to start applying initial, medial, and final glyph forms?

If I wanted to set up a Latin font for cursive that uses positional alternate glyphs like Arabic (initial, medial, final, isolated), what would it take for SILE to recognize that the font has them and apply the rules?

I'm working under the assumption (backed by a very little bit of testing) that my trouble getting a Latin cursive font to display initial, medial, and final forms is somehow related to the fact that I was calling it 'Latn' script. (I'm cheating by calling the font I was working on 'cursive'. It's really a font of footnote caller glyphs, intended to be compatible with the limitations of LO and InDesign. But it needs to be treated like a complex script in the Latn glyph space, like Latin cursive, and solving this issue for ringed footnote callers would also improve Latin cursive.)

If there were a font with the script ID "latc" (complex Latin, connected Latin, cursive Latin, I don't care what it's called), would it help SILE decide to use the contextual alternates feature? If a "Latn" font has a 'calt' feature, would it get processed or ignored?

Background:

There was a request on a typesetting mailing list for a font that would enable ringed numbers to be used as footnote markers. Apparently some publishing suites sold by subscription don't have this built in.

I started down the path of designing a font to do this, but ran into the problem that most of the font features I would need are labelled for complex Middle Eastern scripts. That is, to build a font that draws a ring around 1 or 10 or 100, I planned to use the same isolated, initial, medial, and final glyph method you see in Arabic. But it would be a lie to call such a font "Arabic" script, and if I label it "Latn", I confuse composers when I try to set up the font with alternate glyphs based on position within a word.

So I asked the maintainer of the BCP 47 'scripts' registry what would be required to define a new 'script' for cursive fonts, so that rendering engines wouldn't get confused. The first response was "that's crazy, you need thousands of glyphs to implement this" (because the maintainer of the scripts table is the Unicode Consortium, and apparently my inquiry got routed to the glyphs guy and not the scripts guy).

But the second response was "nobody has asked, you'll need a convincing argument."

So, before I even begin to try to convince anybody, I'm trying to convince myself that something like this is not crazy, but would be useful and desired once it works.

simoncozens commented 4 years ago

Hi @cmahte! The short answer is that this behaviour is defined by the shaper, HarfBuzz. And yes, HarfBuzz only turns on init/medi/fina forms for text whose characters have a joining type defined in the Unicode Character Database, which you're only going to get with a new Unicode script. Realistically that isn't going to be possible.

I wonder if using init/medi/fina forms is the best way to do this. An approach that's often used for ringed numbers in Arabic (such as Quranic end-of-aya marks) is to form ligatures with end-of-aya/number/number/number, end-of-aya/number/number, and end-of-aya/number. (You could even use open-parens/number/close-parens for Latin.) Then either kern or use ligature positioning to put the numbers in the right place. Here's that approach used in the Amiri font: https://github.com/alif-type/amiri/blob/master/sources/enclosing.fea
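For a small, fixed range of callers, the ligature idea can be sketched in its simplest form as below. This is not the Amiri implementation: it assumes the font contains one precomposed ringed glyph per number, and the names one.circled, two.circled, and onezero.circled are hypothetical. The rules are placed in calt only so that the same feature switch applies.

```fea
# Hypothetical precomposed glyphs: one.circled, two.circled, onezero.circled.
feature calt {
    # "(1)" and "(2)" each collapse into a single ringed glyph.
    sub parenleft one parenright by one.circled;
    sub parenleft two parenright by two.circled;
    # "(10)" needs its own ligature, and so on for every number supported.
    sub parenleft one zero parenright by onezero.circled;
} calt;
```

The trade-off is one drawn glyph and one rule per number, so this only suits a short run of footnote callers. In exchange, no kerning or ligature positioning is needed, because each ring and its digits are a single outline.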

If you can get the behaviour you want working inside a calt feature, then you can define your footnote marker in SILE to use \font[features=+calt]{...} to turn the feature on.

rjmunro commented 4 years ago

Harfbuzz ... only turns on init/medi/fina forms for text which has joining type defined in the Unicode character database

Is that the correct behaviour for HarfBuzz, or an oversight because they didn't think of cases like this?

simoncozens commented 4 years ago

It's correct: https://docs.microsoft.com/en-gb/typography/opentype/spec/features_fj#tag-init

cmahte commented 4 years ago

I opened this issue because of how my first inquiry on this subject (to the Unicode Consortium) went:

  1. The first reply suggested that pretty much all scripts have a cursive form, and that all cursives would benefit from init/medial/final forms, but that the resulting explosion of assignments would be more than they felt was possible for the code space. So they were actively discouraging new Unicode definitions for cursives where isolated glyphs did the job. (Note that isn't a no, just more of a "prove yourself", but it probably would still be a no.)

  2. After I explained a second time that I was only asking for a script name (like "Latc") in the ISO 15924 registry, with no Unicode glyph assignments, and that I was only seeking this because it would be a trigger for rendering engines to treat the same code points as cursive, he suggested I wasn't asking the right person and pointed me back to the same form that had routed me to him. In that reply, though, he suggested it was possible to assign a script ID (with no Unicode code points) so the application would function more easily, if I had a strong argument.

Simon here seems to say it's not a strong argument. I need to do more homework on calt. I've taken that as a to-do, but I have no time to throw at it.

cmahte commented 4 years ago

Here are the responses from the Unicode Consortium.

simoncozens commented 4 years ago

By the way, the OpenType Cookbook has some techniques for "faking" init/medi/fina using boundary detection in OTL rules. The general case of cursive handwriting can be handled this way, without relying on the shaper for joining features - in fact, this is what a lot of script-style fonts use for selecting the appropriate starting and finishing characters: http://opentypecookbook.com/common-techniques/
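As a rough illustration of that boundary-detection idea (a minimal sketch, not the Cookbook's own code), the rules below pick positional forms purely from what sits on either side of a letter. The two-letter alphabet and all glyph names (a.init, n.medi, and so on) are hypothetical. Anything outside @letter, such as a space or punctuation, acts as a word boundary.

```fea
# Hypothetical glyph names; each class must list its glyphs in the same order.
@base = [a n];
@init = [a.init n.init];
@medi = [a.medi n.medi];
@fina = [a.fina n.fina];
@letter = [@base @init @medi @fina];

feature calt {
    # Lookups run in order over the whole run, so medial forms are chosen first.
    lookup fakeMedi {
        # A letter with a letter on both sides.
        sub @letter @base' @letter by @medi;
    } fakeMedi;
    lookup fakeFina {
        # A still-unchanged letter preceded by a letter: nothing joins after it.
        sub @letter @base' by @fina;
    } fakeFina;
    lookup fakeInit {
        # A still-unchanged letter followed by a letter: nothing joins before it.
        sub @base' @letter by @init;
    } fakeInit;
} calt;
```

With these rules the sequence n a n would come out as n.init a.medi n.fina, while a letter standing alone matches nothing and keeps its default (isolated) form.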

So the general case of cursive can be handled that way. But I think the specific case of what you are doing with ringed numbers is much easier. Create an empty glyph, emptyglyph; a glyph circle.three, which is a zero-width glyph containing a circle wide enough to enclose three digits; a glyph circle.two, which is a zero-width glyph containing a circle wide enough to enclose two digits; and a glyph circle.one likewise. Then use this feature code:

# Digits that may appear inside a ring.
@number = [one two three four five six seven eight nine zero];

# Standalone lookups, called from the contextual rules below.
lookup deleteParen { sub parenright by emptyglyph; } deleteParen;
lookup ringedThree { sub parenleft by circle.three; } ringedThree;
lookup ringedTwo { sub parenleft by circle.two; } ringedTwo;
lookup ringedOne { sub parenleft by circle.one; } ringedOne;

feature calt {
   # "(123)", "(12)", "(1)": swap "(" for the matching ring and blank out ")".
   sub parenleft' lookup ringedThree @number' @number' @number' parenright' lookup deleteParen;
   sub parenleft' lookup ringedTwo @number' @number' parenright' lookup deleteParen;
   sub parenleft' lookup ringedOne @number' parenright' lookup deleteParen;
} calt;

You may also want to contextually replace your numbers with tabular forms to make the spacing easier.
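One way the tabular-figure swap might be folded into the same kind of rule is sketched here for the two-digit case only. It is self-contained and uses its own lookup names; the tabular glyph names (one.tf and so on), circle.two, and emptyglyph are assumptions about what the font contains.

```fea
# Hypothetical glyph names: *.tf tabular figures, circle.two, emptyglyph.
@number  = [one two three four five six seven eight nine zero];
@tabular = [one.tf two.tf three.tf four.tf five.tf six.tf seven.tf eight.tf nine.tf zero.tf];

lookup toTabular { sub @number by @tabular; } toTabular;
lookup ringTwo   { sub parenleft by circle.two; } ringTwo;
lookup dropParen { sub parenright by emptyglyph; } dropParen;

feature calt {
    # "(12)": swap the parentheses and switch both digits to tabular forms.
    sub parenleft' lookup ringTwo
        @number' lookup toTabular
        @number' lookup toTabular
        parenright' lookup dropParen;
} calt;
```

Because every tabular figure has the same advance width, a single ring offset should then fit any two-digit number.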

But we are a long way from SILE now, so I'm closing this issue...