Uniscribe & DirectWrite fail to shape GSUB `rlig` due to table type 6

MattMatic commented 7 months ago

Harfbuzz will correctly shape these words, but Uniscribe & DirectWrite fail:

لیڈیوکس
لیرا

(others too!)

After much digging (with Simon Cozens' excellent Crowbar, otf2fea, fontTools ttx, and other tools), it appears that under Uniscribe the GSUB rlig feature aborts when it contains a chaining contextual lookup (lookup type 6). Microsoft only specifies lookup type 4 for rlig, reserving type 6 for rclt in Arabic scripts.
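One quick way to test that hypothesis against a font is to list the lookup types each feature references and flag any that fall outside the type the Microsoft spec mentions (type 4 for rlig). This is a minimal sketch. It assumes the lookup indices and types have already been extracted from the font (for example, from ttx output) and that extension lookups have been unwrapped. The rlig contents and the `unexpected_lookups` helper are hypothetical.

```python
# GSUB lookup types as numbered in the OpenType specification.
GSUB_LOOKUP_TYPES = {
    1: "Single",
    2: "Multiple",
    3: "Alternate",
    4: "Ligature",
    5: "Context",
    6: "Chained Context",
    7: "Extension",
    8: "Reverse Chaining Contextual Single",
}


def unexpected_lookups(feature_lookups, allowed_types):
    """Return (lookup_index, type_name) pairs whose type is not in allowed_types.

    feature_lookups maps a lookup index to its (already unwrapped) lookup type.
    """
    return [
        (index, GSUB_LOOKUP_TYPES[ltype])
        for index, ltype in sorted(feature_lookups.items())
        if ltype not in allowed_types
    ]


# Hypothetical rlig feature: these indices and types are made up for illustration.
rlig = {10: 4, 11: 6, 12: 4, 13: 8}
print(unexpected_lookups(rlig, allowed_types={4}))
```

Under the "rlig should only hold type 4" reading, lookups 11 and 13 would be the suspects in this made-up feature.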

On the surface, the error looks like a kerning issue, but Crowbar shows that it is the GSUB rlig feature that is not applied correctly. This cascades through the rest of the shaping process: GPOS fails because the glyph IDs it receives are wrong.

Interestingly, Word 365 online renders correctly, but desktop Word on Windows uses Uniscribe. I found BabelPad to be a lightweight way to test Gulzar under Uniscribe. I also tested DirectWrite with Microsoft's DWriteShapePy, which produced the same results as Uniscribe.

The following results are with Gulzar 1.002, and Gulzar 1.000 has similar issues (though the build process changed in 1.001).


Harfbuzz Shaping Test

لیرا

U+0644 U+06cc U+0631 U+0627 => Glyphs 49,568,852,691,473:

[{"g":49,"cl":3,"dx":0,"dy":0,"ax":318,"ay":0,"fl":1}
,{"g":568,"cl":2,"dx":0,"dy":0,"ax":2345,"ay":0,"fl":1}
,{"g":852,"cl":1,"dx":-1133,"dy":-608,"ax":0,"ay":0,"fl":1}
,{"g":691,"cl":1,"dx":-1640,"dy":1650,"ax":171,"ay":0,"fl":1}
,{"g":473,"cl":0,"dx":-217,"dy":1719,"ax":1073,"ay":0}
]

Uniscribe & DirectWrite Shaping Test

لیرا

U+0644 U+06cc U+0631 U+0627 => Glyphs 49,568,852,684,461:

ITEM=0 script='arab' chars=4 glyphs=5 rtl=1 lrtl=1 bidi=1
 gn=0   cn=0   gid=  461 ax=-25   00S-- yofs=34   
 gn=1   cn=1   gid=  684 ax=-18   12S-- yofs=26   
 gn=2   cn=1   gid=  852 ax=0     00-DZ xofs=1     yofs=-5   
 gn=3   cn=2   gid=  568 ax=-34   13S--
 gn=4   cn=3   gid=   49 ax=-15   00S--
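Because the two tools print their results in different formats and orders, a small script makes the mismatch explicit. This is a minimal sketch that works on the two dumps above pasted in as text. The helper names are hypothetical. The Uniscribe dump starts from the first character (gn=0, cn=0), while hb-shape starts from the last (cl=3), so the Uniscribe list is reversed before comparing, which reproduces the order in its "=> Glyphs" header line.

```python
import json
import re

# hb-shape JSON output for لیرا, copied from the Harfbuzz test above.
HB_JSON = """[{"g":49,"cl":3,"dx":0,"dy":0,"ax":318,"ay":0,"fl":1}
,{"g":568,"cl":2,"dx":0,"dy":0,"ax":2345,"ay":0,"fl":1}
,{"g":852,"cl":1,"dx":-1133,"dy":-608,"ax":0,"ay":0,"fl":1}
,{"g":691,"cl":1,"dx":-1640,"dy":1650,"ax":171,"ay":0,"fl":1}
,{"g":473,"cl":0,"dx":-217,"dy":1719,"ax":1073,"ay":0}
]"""

# Uniscribe dump lines, copied from the Uniscribe & DirectWrite test above.
USP_DUMP = """\
 gn=0   cn=0   gid=  461 ax=-25   00S-- yofs=34
 gn=1   cn=1   gid=  684 ax=-18   12S-- yofs=26
 gn=2   cn=1   gid=  852 ax=0     00-DZ xofs=1     yofs=-5
 gn=3   cn=2   gid=  568 ax=-34   13S--
 gn=4   cn=3   gid=   49 ax=-15   00S--
"""


def hb_gids(text):
    """Glyph IDs from hb-shape JSON output, in the order hb-shape prints them."""
    return [glyph["g"] for glyph in json.loads(text)]


def uniscribe_gids(text):
    """Glyph IDs from the Uniscribe dump, reversed to match hb-shape's order."""
    return [int(gid) for gid in re.findall(r"gid=\s*(\d+)", text)][::-1]


def diff(a, b):
    """(position, a_gid, b_gid) for every position where the sequences disagree."""
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


hb = hb_gids(HB_JSON)
usp = uniscribe_gids(USP_DUMP)
print(diff(hb, usp))  # [(3, 691, 684), (4, 473, 461)]
```

The first three glyphs agree. The last two (the second glyph of cluster 1 and the glyph for cluster 0) already differ before any positioning takes place.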

simoncozens commented 7 months ago

Is this a bug in Gulzar or in Uniscribe?

MattMatic commented 7 months ago

As far as I can tell, it's largely Uniscribe. The MS specs are confusing, but I would say that Gulzar probably should be using rlig for type 4 lookups and rclt for type 6 lookups. However, I don't have a quick way to verify this at the moment (it's been days of puzzling so far!), and there doesn't appear to be any font to compare against, even for a 'simple' case.

Harfbuzz happily runs both type 4 and type 6 lookups under rlig.

Side note: in Harfbuzz-based apps (Edge, etc.), Gulzar is a wonderful feat of engineering. We have been working through a corpus of 73k words (comparing against Noto Nastaliq Urdu), and Gulzar is superior, but we see about 2.5% issues with collisions etc. that we aim to help address.

MattMatic commented 7 months ago

[Screenshots: Uniscribe rendering vs. Harfbuzz rendering]

simoncozens commented 7 months ago

"Gulzar probably should be using rlig for type 4 lookups and rclt for type 6 lookups"

Yeah, that sounds like a Uniscribe bug, because (despite the insinuations in the OT spec) there should be no expectation that certain features contain certain lookup types. The problem is that in Harfbuzz rlig runs first and rclt second, so simply moving the lookups into a different feature could break the order.

MattMatic commented 7 months ago

Appreciate the insights ;-) I was just digging into Harfbuzz to check the ordering.

It's such a pain that Word fails so badly because of Uniscribe. And in line with this (https://answers.microsoft.com/en-us/msoffice/forum/all/indic-typography-serious-uniscribe-deficiency/180d28d9-1f22-4cb9-9ac7-cb73d080aa73), I don't think I can expect a fix from MS quickly, if ever.

MattMatic commented 7 months ago

The USE spec says "It is up to the font developer to specify the order of lookups for this set of features". Is this something that has been removed over time? (I saw John Hudson's reply to you about the history on a similar subject.)

https://learn.microsoft.com/en-us/typography/script-development/use#standard-typographic-presentation-gsub

simoncozens commented 7 months ago

Within a feature you can set the order of lookups, but you can't set the order in which the features are processed. For example, if you have:

rclt lookup 1, lookup 2, lookup 3
rlig lookup 4, lookup 5, lookup 6

they will still be processed 4, 5, 6, 1, 2, 3.
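The ordering rule can be captured in a toy model. Features are grouped into stages that run in sequence. Within a stage, the lookups of all its features are merged and applied in lookup-list index order. This is a simplified sketch, not Harfbuzz's code. The two-stage layout (rlig alone, then rclt and calt together) is an assumption loosely modelled on Harfbuzz's Arabic shaper, which has more stages than shown here. The `application_order` helper is hypothetical.

```python
def application_order(stages, feature_lookups):
    """Return lookup indices in the order this simplified model applies them.

    stages: list of stages, each a list of feature tags applied together.
    feature_lookups: maps a feature tag to the lookup indices it references.
    """
    order = []
    for stage in stages:
        merged = set()
        for tag in stage:
            merged.update(feature_lookups.get(tag, []))
        # Within one stage, lookups run in lookup-list index order.
        order.extend(sorted(merged))
    return order


# Assumed stage layout: rlig gets its own stage; rclt and calt share a later one.
STAGES = [["rlig"], ["rclt", "calt"]]
features = {"rclt": [1, 2, 3], "rlig": [4, 5, 6]}
print(application_order(STAGES, features))  # [4, 5, 6, 1, 2, 3]
```

In this model, moving lookups from rlig into rclt would not just delay them. It would also interleave them with any calt lookups by index, which is why retagging alone could change the result.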

MattMatic commented 7 months ago

Completely understood. Thank you very much for the confirmation.

MattMatic commented 7 months ago

Update: Just trying another angle...

I used TTX to change the feature in Gulzar 1.002 from rlig to rclt and tried again in my Uniscribe shaping tester. Surprisingly, the result is the same.

So it seems that the GSUB type 6 lookups themselves have some kind of issue, rather than it being a question of rlig versus rclt. I'll have to look closer and dig deeper into Uniscribe to understand where the fault is.
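The rlig-to-rclt experiment can be reproduced on a ttx dump using only the standard library. This is a minimal sketch that assumes the usual ttx layout for a GSUB FeatureList, as in the trimmed sample below. The `retag_feature` helper and the file names in the usage note are hypothetical.

```python
import xml.etree.ElementTree as ET


def retag_feature(root, old_tag, new_tag):
    """Rename every GSUB FeatureRecord tagged old_tag to new_tag, in place.

    root is either a <ttFont> element or a bare <GSUB> element from a ttx dump.
    Returns the number of FeatureTag elements changed.
    """
    gsub = root if root.tag == "GSUB" else root.find("GSUB")
    if gsub is None:
        raise ValueError("no GSUB table in this ttx document")
    changed = 0
    for tag_elem in gsub.iter("FeatureTag"):
        if tag_elem.get("value") == old_tag:
            tag_elem.set("value", new_tag)
            changed += 1
    return changed


# Trimmed, illustrative ttx fragment (not taken from Gulzar).
SAMPLE_TTX = """<ttFont>
  <GSUB>
    <FeatureList>
      <FeatureRecord index="0">
        <FeatureTag value="rlig"/>
        <Feature>
          <LookupListIndex index="0" value="28"/>
        </Feature>
      </FeatureRecord>
    </FeatureList>
  </GSUB>
</ttFont>"""

root = ET.fromstring(SAMPLE_TTX)
print(retag_feature(root, "rlig", "rclt"))  # 1
```

For a real dump, `tree = ET.parse("Gulzar.ttx")` followed by `retag_feature(tree.getroot(), "rlig", "rclt")` and `tree.write("Gulzar-rclt.ttx", encoding="utf-8", xml_declaration=True)` would produce a file to recompile with ttx. ElementTree drops the informational comments that ttx writes. The OpenType spec also expects FeatureRecords to be sorted by tag, so a production edit may need re-sorting as well.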

(Update) Comparing the otf2fea and ttx output for GSUB, it seems that Gulzar 1.000 only sometimes uses GSUB type 7 (extension) lookups, whereas in 1.002 all lookups appear to be wrapped in type 7 extensions. Given that under Uniscribe 1.000 works better (though not perfectly) than 1.002, I'm beginning to think that Uniscribe may handle extension lookups differently from Harfbuzz, or perhaps mishandles type 6 lookups inside an extension.
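Because a type 7 lookup only wraps another subtable, comparing the two builds means reading through the wrapper to the type that is actually applied. This is a minimal sketch. It assumes ttx writes extension lookups as an `ExtensionSubst` element with an `ExtensionLookupType` child, as in the illustrative sample below. The `effective_lookup_types` helper is hypothetical.

```python
import xml.etree.ElementTree as ET


def effective_lookup_types(gsub):
    """Map lookup index -> (declared type, effective type) for a ttx <GSUB> element."""
    result = {}
    for lookup in gsub.find("LookupList").findall("Lookup"):
        index = int(lookup.get("index"))
        declared = int(lookup.find("LookupType").get("value"))
        effective = declared
        if declared == 7:
            # All subtables of one extension lookup must share the same wrapped
            # type, so reading the first ExtensionLookupType is enough.
            ext_type = lookup.find("ExtensionSubst/ExtensionLookupType")
            effective = int(ext_type.get("value"))
        result[index] = (declared, effective)
    return result


# Illustrative fragment: lookup 0 is a plain ligature lookup, lookup 1 wraps type 6.
SAMPLE_GSUB = """<GSUB>
  <LookupList>
    <Lookup index="0">
      <LookupType value="4"/>
      <LookupFlag value="0"/>
    </Lookup>
    <Lookup index="1">
      <LookupType value="7"/>
      <LookupFlag value="0"/>
      <ExtensionSubst index="0" Format="1">
        <ExtensionLookupType value="6"/>
      </ExtensionSubst>
    </Lookup>
  </LookupList>
</GSUB>"""

print(effective_lookup_types(ET.fromstring(SAMPLE_GSUB)))  # {0: (4, 4), 1: (7, 6)}
```

With a full ttx dump, pass `ET.parse(path).getroot().find("GSUB")` instead. Running this on both 1.000 and 1.002 would show which lookups changed only in their wrapping and which changed in their effective type.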

Checking the two words with Crowbar, rlig rules 28 and 56 are both involved; these have type 6 and type 8 lookups.

Note: I was wrongly drawing conclusions about the table type from the naming used by otf2fea. The fea output is easier to navigate than ttx's XML, and it would have been nice to have comments in the fea output showing the table types and other info from the binary.

MattMatic commented 7 months ago

Update: Gulzar 1.000 and 1.002 are handled completely differently in DirectWrite.

I built a makeshift Python tool that invokes uharfbuzz and DWriteShapePy on our word list and compares the resulting gid sequences (i.e. GSUB only). I also used the callback message to keep track of which lookup tables were invoked, to narrow down the field of investigation.

But comparing with Gulzar 1.000 gave massively different results.

Of the words in our list that rlig makes a difference to, here is the count of words that produce the same gid sequence in both shapers versus the count that produce different sequences:

Version   Same gid sequence   Different gid sequence
1.002     208                 62628
1.000     59181               3654
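The counting step behind the table can be written independently of either shaper. This is a minimal sketch in which each shaper is any callable that maps a word to a glyph-ID sequence. The uharfbuzz wrapper uses uharfbuzz's `Blob`/`Face`/`Font`/`Buffer`/`shape` calls. The DirectWrite side is left as a placeholder, because DWriteShapePy's API is not shown in this thread. `compare_shapers` and `make_hb_shaper` are hypothetical names.

```python
from collections import Counter


def make_hb_shaper(font_path):
    """Return a word -> glyph-ID list function backed by uharfbuzz."""
    import uharfbuzz as hb  # imported lazily so the comparison logic has no hard dependency

    font = hb.Font(hb.Face(hb.Blob.from_file_path(font_path)))

    def shape(word):
        buf = hb.Buffer()
        buf.add_str(word)
        buf.guess_segment_properties()
        hb.shape(font, buf, {})
        # After shaping, each GlyphInfo.codepoint holds a glyph ID.
        return [info.codepoint for info in buf.glyph_infos]

    return shape


def compare_shapers(words, shape_a, shape_b):
    """Count words whose glyph-ID sequences match or differ between two shapers.

    Returns (Counter with 'same'/'diff' keys, list of differing words).
    """
    counts = Counter()
    mismatches = []
    for word in words:
        if list(shape_a(word)) == list(shape_b(word)):
            counts["same"] += 1
        else:
            counts["diff"] += 1
            mismatches.append(word)
    return counts, mismatches
```

Keeping the list of mismatching words, not just the counts, is what allows the "same" and "different" groups from each version to be compared later.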

I'm hoping to use the same and different word results from each version to narrow down and understand what Uniscribe and DirectWrite have an issue with. Hope to have more info in about a week.

MattMatic commented 7 months ago

Update: I have been doing more comparisons and found that Noto Nastaliq Urdu (tested v3.007) also has issues with Uniscribe. The rules are easier to follow with Noto, and the nuktas often disappear (e.g. with چلنے).

Following the rules through with Crowbar and uharfbuzz suggests that Uniscribe is not skipping rules but is somehow missing the matches, and therefore not shaping. There doesn't appear to be much consistency either. I have examined the recursive depth of the rules, the location of the tables in the TTF (in case there is a range bug), and the type of GSUB tables (usually 6.2, 6.3 or 8), but cannot prove anything conclusively yet.
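Contextual lookups (types 5, 6 and 8's relatives) can invoke other lookups through their nested lookup records, so the "recursive depth of the rules" can be computed as a walk over that call graph. This is a minimal sketch. It assumes the call graph has already been extracted from the font (for example, from ttx output). The lookup indices in the example and the `max_nesting_depth` helper are hypothetical.

```python
def max_nesting_depth(nested, start):
    """Length of the deepest chain of lookup calls starting at lookup `start`.

    nested maps a lookup index to the lookup indices its contextual rules invoke.
    A lookup that invokes nothing has depth 1.
    """
    def depth(index, seen):
        if index in seen:
            raise ValueError(f"lookup {index} recurses into itself")
        children = nested.get(index, [])
        if not children:
            return 1
        return 1 + max(depth(child, seen | {index}) for child in children)

    return depth(start, frozenset())


# Hypothetical call graph: lookup 28 calls 30 and 31, and 30 in turn calls 40.
graph = {28: [30, 31], 30: [40], 31: [], 40: []}
print(max_nesting_depth(graph, 28))  # 3
```

Comparing these depths between words that shape correctly and words that fail under Uniscribe is one way to test whether nesting correlates with the failures.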

So far, I haven't found a single cause in Uniscribe (though DirectWrite doesn't behave well with ZWNJ and ZWJ).

I will keep digging, but without deeper GSUB debugging tools, and without access to the Uniscribe source (!), it's proving extremely challenging. In the meantime, I'm trying to shift everything over to Harfbuzz, on the expectation that Uniscribe and DirectWrite won't be fixable.

MattMatic commented 5 months ago

I'm closing this issue.

Uniscribe and DirectWrite have shaping errors with Gulzar and Noto Nastaliq Urdu, and it's practically impossible to work out what MS is doing under the hood.

googlefonts / Gulzar