yeslogic / allsorts

Font parser, shaping engine, and subsetter implemented in Rust
https://yeslogic.com/blog/allsorts-rust-font-shaping-engine/
Apache License 2.0

Incorrect glyph order with Bengali #84

Open floppyhammer opened 1 year ago

floppyhammer commented 1 year ago

Test text: ওহে বিশ্ব!

Glyph indexes (allsorts): 1497 1530 1539 3 1521 1533 1527 1543 1521 4

Glyph indexes (rustybuzz): 1497 1539 1530 3 1533 1521 1527 1543 1521 4

The allsorts output is slightly off: two pairs of adjacent glyphs are swapped relative to rustybuzz.
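
To make the difference concrete, here is a small sketch that compares the two glyph runs listed above and reports where they diverge. The helper name `differing_positions` is made up for illustration and is not part of allsorts or rustybuzz.

```rust
/// Return the indexes at which two glyph runs differ.
/// Hypothetical helper for illustration; compares up to the shorter length.
fn differing_positions(a: &[u16], b: &[u16]) -> Vec<usize> {
    a.iter()
        .zip(b.iter())
        .enumerate()
        .filter(|(_, (x, y))| x != y)
        .map(|(i, _)| i)
        .collect()
}

fn main() {
    // Glyph indexes copied from the report above.
    let allsorts: [u16; 10] = [1497, 1530, 1539, 3, 1521, 1533, 1527, 1543, 1521, 4];
    let rustybuzz: [u16; 10] = [1497, 1539, 1530, 3, 1533, 1521, 1527, 1543, 1521, 4];

    // Positions 1-2 and 4-5 hold the same glyphs in swapped order.
    println!("{:?}", differing_positions(&allsorts, &rustybuzz));
}
```

The two runs contain the same glyphs; only the order within each of the two pairs differs.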

Config used:

let script = tag::BENG;
let dir = TextDirection::LeftToRight;
let lang = Some(tag::BENG);

mikeday commented 1 year ago

Which font?

floppyhammer commented 1 year ago

Arial Unicode MS Font.ttf

I have also tested with NotoSansBengali-Regular.ttf, and the result is correct. It seems the issue is font related.

wezm commented 1 year ago

It looks like the font doesn't have GSUB rules for Bengali: there is no Bengali script entry in the listing below. rustybuzz must be picking, or falling back to, a different script.

allsorts layout-features Arial\ Unicode\ MS\ Font.ttf    
Table: GSUB
  Script: arab
    Language: default
      Feature: isol
        Lookups: 0
      Feature: init
        Lookups: 1
      Feature: medi
        Lookups: 2
      Feature: fina
        Lookups: 3
      Feature: liga
        Lookups: 4,5,6
    Language: FAR 
      Feature: isol
        Lookups: 0
      Feature: init
        Lookups: 1
      Feature: medi
        Lookups: 2
      Feature: fina
        Lookups: 3
      Feature: liga
        Lookups: 4,5,6
      Feature: isol
        Lookups: 7
      Feature: fina
        Lookups: 8
      Feature: locl
        Lookups: 9
    Language: URD 
      Feature: isol
        Lookups: 0
      Feature: init
        Lookups: 1
      Feature: medi
        Lookups: 2
      Feature: fina
        Lookups: 3
      Feature: liga
        Lookups: 4,5,6
      Feature: isol
        Lookups: 10
      Feature: init
        Lookups: 11
      Feature: medi
        Lookups: 12
      Feature: fina
        Lookups: 13
      Feature: locl
        Lookups: 14
  Script: deva
    Language: default
      Feature: nukt
        Lookups: 15
      Feature: akhn
        Lookups: 16
      Feature: rphf
        Lookups: 17
      Feature: blwf
        Lookups: 18
      Feature: half
        Lookups: 19
      Feature: vatu
        Lookups: 20,21
      Feature: pres
        Lookups: 22,23,24,26
      Feature: abvs
        Lookups: 29,30,31,32
      Feature: blws
        Lookups: 39,40,42
      Feature: psts
        Lookups: 44
      Feature: haln
        Lookups: 46
  Script: gujr
    Language: default
      Feature: nukt
        Lookups: 59
      Feature: akhn
        Lookups: 60
      Feature: rphf
        Lookups: 61
      Feature: blwf
        Lookups: 62
      Feature: half
        Lookups: 63
      Feature: vatu
        Lookups: 64,65
      Feature: pres
        Lookups: 66,67,68,70,72
      Feature: abvs
        Lookups: 74,75,76,81
      Feature: blws
        Lookups: 84
      Feature: psts
        Lookups: 85
      Feature: haln
        Lookups: 86
  Script: guru
    Language: default
      Feature: nukt
        Lookups: 47
      Feature: blwf
        Lookups: 48
      Feature: half
        Lookups: 49
      Feature: pstf
        Lookups: 50
      Feature: blws
        Lookups: 51,55
      Feature: abvs
        Lookups: 56,57
  Script: hani
    Language: default
      Feature: salt
        Lookups: 108
      Feature: trad
        Lookups: 109
      Feature: smpl
        Lookups: 110
      Feature: vert
        Lookups: 111
    Language: JAN 
      Feature: vert
        Lookups: 111
    Language: KOR 
      Feature: locl
        Lookups: 108
      Feature: vert
        Lookups: 111
    Language: ZHS 
      Feature: locl
        Lookups: 110
      Feature: vert
        Lookups: 111
    Language: ZHT 
      Feature: locl
        Lookups: 109
      Feature: vert
        Lookups: 111
  Script: kana
    Language: default
      Feature: vert
        Lookups: 111
    Language: JAN 
      Feature: vert
        Lookups: 111
  Script: knda
    Language: default
      Feature: akhn
        Lookups: 94
      Feature: rphf
        Lookups: 95
      Feature: blwf
        Lookups: 96
      Feature: half
        Lookups: 97
      Feature: blws
        Lookups: 98
      Feature: abvs
        Lookups: 99
      Feature: psts
        Lookups: 102,104,105,106
      Feature: haln
        Lookups: 97
  Script: taml
    Language: default
      Feature: akhn
        Lookups: 87
      Feature: half
        Lookups: 88
      Feature: abvs
        Lookups: 89,90
      Feature: psts
        Lookups: 91,92
      Feature: haln
        Lookups: 88
Table: GPOS
  Script: arab
    Language: default
      Feature: mark
        Lookups: 21,22,23,24
  Script: deva
    Language: default
      Feature: abvm
        Lookups: 0
      Feature: blwm
        Lookups: 1
      Feature: dist
        Lookups: 2
  Script: gujr
    Language: default
      Feature: abvm
        Lookups: 8
      Feature: blwm
        Lookups: 9
      Feature: dist
        Lookups: 10
  Script: guru
    Language: default
      Feature: abvm
        Lookups: 6
      Feature: blwm
        Lookups: 7
  Script: knda
    Language: default
      Feature: dist
        Lookups: 15,17,19
  Script: taml
    Language: default

adrianwong commented 1 year ago

It appears that the differences come down to the left matras (glyphs 1539 and 1533 in the output above) being ordered incorrectly in the allsorts output.

I've been away from shaping code for a while now, but my suspicion is that HarfBuzz and/or rustybuzz perform some of the initial reordering required for shaping Indic text, whereas Allsorts bails early if the font's GSUB table doesn't contain the expected script:

https://github.com/yeslogic/allsorts/blob/1d05ffa243d857770b45d2b0244bb2d747520bd4/src/scripts/indic.rs#L1106-L1112
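
As a rough illustration of the kind of initial reordering suspected above, here is a minimal sketch that moves Bengali left matras in front of the preceding character. It is not the allsorts, HarfBuzz, or rustybuzz implementation: `is_left_matra` and `reorder_left_matras` are hypothetical names, the code works on characters instead of glyphs, and it ignores conjuncts, reph, and split matras, which a real shaper handles by moving the matra to the start of the whole syllable.

```rust
/// Bengali dependent vowel signs that are drawn to the left of the base:
/// U+09BF (sign I), U+09C7 (sign E), U+09C8 (sign AI).
fn is_left_matra(c: char) -> bool {
    matches!(c, '\u{09BF}' | '\u{09C7}' | '\u{09C8}')
}

/// Hypothetical helper: swap each left matra with the character before it.
/// Simplification (assumption): the base is always the single preceding
/// character, which is not true for consonant conjuncts.
fn reorder_left_matras(text: &str) -> Vec<char> {
    let mut out: Vec<char> = text.chars().collect();
    for i in 1..out.len() {
        if is_left_matra(out[i]) {
            out.swap(i - 1, i);
        }
    }
    out
}

fn main() {
    // The test text from the report: ওহে বিশ্ব!
    let input = "\u{0993}\u{09B9}\u{09C7} \u{09AC}\u{09BF}\u{09B6}\u{09CD}\u{09AC}!";
    let reordered = reorder_left_matras(input);

    // Print code points so the new order is visible without a Bengali font.
    for c in &reordered {
        print!("U+{:04X} ", *c as u32);
    }
    println!();
}
```

For the test text, this moves U+09C7 ahead of U+09B9 and U+09BF ahead of U+09AC, the same two swaps that separate the rustybuzz glyph order from the allsorts one.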