michal-h21 / luatex-harfbuzz-shaper

Experimental text shaping in LuaTeX using Harfbuzz library

The shaper does not handle x_offset and y_offset values from Harfbuzz #2

Closed deepakjois closed 8 years ago

deepakjois commented 8 years ago

Harfbuzz shaper output includes two fields for every glyph in its output buffer: x_offset and y_offset. These are currently not taken into account in the LuaTeX node processing callbacks. This probably does not matter for Latin fonts (and some non-Latin fonts like Amiri), where both values are 0, but it can produce incorrect output in many other cases.

I tested with typesetting Arabic with a Nastaliq font (Noto Nastaliq Urdu), and the output wasn’t even visible on the page.

Here is a simple hello-world Harfbuzz example that will clarify the concept of x_offset and y_offset: https://github.com/behdad/harfbuzz-tutorial/blob/master/hello-harfbuzz.c

michal-h21 commented 8 years ago

I've created a new fontloader based on luaotfload; see the `luaotfload` branch. The usage is shown in [newsample.tex](https://github.com/michal-h21/luatex-harfbuzz-shaper/blob/luaotfload/examples/newsample.tex). This branch should be used with the current version of `justenoughharfbuzz.so`.

Some older features don't work yet, namely feature setting in the document body: at the moment, a new font must be declared for every feature combination. On the other hand, hyphenation works now (except for ligatures), and fonts can be loaded from both the system and the TeX tree.

Regarding x_offset, it doesn't seem to be included in the table returned by justenoughharfbuzz's shaper (at least not for the Noto font), so we can't work with it, unfortunately :( There are only glyphAdvance and width, which sometimes differ, but not with the Arabic fonts I tested.

There is a strange thing about Noto Nastaliq Urdu: I can't get it to work correctly. It is shown in the output, but it looks similarly wrong with Harfbuzz on and off; only some glyphs seem to be decomposed. The other fonts I tested (Amiri and Scheherazade) seem to work correctly. I have no idea what's wrong here. It probably needs some specific combination of script, language and features, and Harfbuzz seems to process it wrongly otherwise.

deepakjois commented 8 years ago

I will check out your changes soon.

Are you sure about x_offset and y_offset not being returned?

Here it is being returned in an older version of justenoughharfbuzz.c (the one on my fork of your code) https://github.com/deepakjois/luatex-harfbuzz-shaper/blob/master/justenoughharfbuzz.c#L329

And here it is on the latest HEAD at SILE: https://github.com/simoncozens/sile/blob/master/src/justenoughharfbuzz.c#L186

In fact, processing those is crucial for Noto Nastaliq.

Moreover, Noto Nastaliq is shaped differently, because the Nastaliq script has a slanted baseline in practise. So the dots and the shapes are moved around based on the context.

You can understand it in detail if you use the hb-shape tool that comes with Harfbuzz.

Here is an example of shaping a simple Urdu word یہ with Harfbuzz, using both Amiri and Noto Nastaliq:

Amiri

hb-shape --output-format=json  ~/Library/Fonts/amiri-regular.ttf "یہ" | json_pp
[
   {
      "cl" : 1,
      "dx" : 0,
      "ay" : 0,
      "dy" : 0,
      "g" : "uni06C1.fina",
      "ax" : 836
   },
   {
      "ax" : 390,
      "g" : "uni06CC.init",
      "cl" : 0,
      "dy" : 0,
      "ay" : 0,
      "dx" : 0
   }
]

Note that there are two glyphs corresponding to the two Arabic characters, with their displacements (computed with the glyphAdvance value that is returned from the actual Harfbuzz API function). The values of dx and dy (which I think correspond to x_offset and y_offset) are 0, so it does not pose a problem in most cases.

Noto Nastaliq

hb-shape --output-format=json  NotoNastaliqUrdu-Regular.ttf "یہ" | json_pp
[
   {
      "dy" : 0,
      "g" : "HehFin",
      "ax" : 472,
      "dx" : 0,
      "ay" : 0,
      "cl" : 1
   },
   {
      "dy" : -383,
      "g" : "TwoDotsBelowNS",
      "ax" : 0,
      "dx" : 310,
      "ay" : 0,
      "cl" : 0
   },
   {
      "cl" : 0,
      "ay" : 0,
      "dx" : 0,
      "ax" : 0,
      "g" : "sp2",
      "dy" : 0
   },
   {
      "dy" : -68,
      "g" : "BehxIni.outS1",
      "ax" : 731,
      "dx" : 0,
      "ay" : 0,
      "cl" : 0
   }
]

Note how the characters are broken out into 4 glyphs. The initial form of ی is three glyphs, as shown by the cl (cluster) value for the 3 glyphs – BehxIni.outS1, sp2 and TwoDotsBelowNS. Some of them also have non-zero values for dx and dy. The ہ is formed by one glyph, HehFin, and has a cl value of 1, signifying the next cluster.
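To make the role of dx and dy concrete, here is a minimal C sketch of how the advances and offsets combine into drawing positions. It is only an illustration under my own assumptions: the `shaped_glyph` and `point` types and the `place_glyphs` function are hypothetical names, not part of Harfbuzz or justenoughharfbuzz, and the field names simply mirror the hb-shape JSON keys.

```c
#include <assert.h>
#include <stddef.h>

/* One shaped glyph; field names mirror the hb-shape JSON keys.
 * This struct is hypothetical, not a Harfbuzz type. */
typedef struct {
    int ax, ay;   /* advance: how far the pen moves after this glyph */
    int dx, dy;   /* offset: where the glyph is drawn relative to the pen */
} shaped_glyph;

typedef struct { int x, y; } point;

/* Fill out[i] with the drawing position of glyphs[i] and return the final
 * pen position. Offsets shift a glyph but never move the pen. */
point place_glyphs(const shaped_glyph *glyphs, size_t n, point *out)
{
    point pen = {0, 0};
    assert(n == 0 || (glyphs != NULL && out != NULL));
    for (size_t i = 0; i < n; i++) {
        out[i].x = pen.x + glyphs[i].dx;
        out[i].y = pen.y + glyphs[i].dy;
        pen.x += glyphs[i].ax;
        pen.y += glyphs[i].ay;
    }
    return pen;
}
```

Fed the four Noto Nastaliq glyphs in buffer order, this puts TwoDotsBelowNS at (782, -383) and BehxIni.outS1 at (472, -68), in font units. A shaper that drops dx and dy would draw both of them on the baseline at x = 472.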

I will post more once I have looked at your newer code.

deepakjois commented 8 years ago

I rebased on top of your code: https://github.com/deepakjois/luatex-harfbuzz-shaper/

I had to change your file to use luatextextdir and luatexpardir, but otherwise everything else was the same. However, my output seems to be different from yours.

[screenshot 2015-11-29 19 34 55]

davidcarlisle commented 8 years ago

I had to change your file to use luatextextdir and luatexpardir

you have an old version of latex? the luatex primitives are not prefixed in the 2015/10/01 release.

deepakjois commented 8 years ago

$> lualatex
This is LuaTeX, Version beta-0.80.1 (TeX Live 2015) (rev 5253)
 restricted \write18 enabled.
**

I did mess around with my luatex binary and recompiled it to include some symbols. Other than that it is just straight up TeXLive 2015.

davidcarlisle commented 8 years ago

It's a change in latex, not the binary. I guess you have the January release (2015/01/01) rather than the October one (2015/10/01).

David

deepakjois commented 8 years ago

Oops…Looks like I have an older version of LaTeX after all.

lualatex examples/newsample.tex
This is LuaTeX, Version beta-0.80.1 (TeX Live 2015) (rev 5253)
 restricted \write18 enabled.
(./examples/newsample.tex
LaTeX2e <2015/01/01>

I did not realise that TeXLive could be updated using tlmgr. I will try that later.

michal-h21 commented 8 years ago

OK, there is something really wrong going on. I've created a simple script, testhb.lua, to compare the results of shaping with the current and older versions of justenoughharfbuzz:

local hb = require "justenoughharfbuzz"

local hb2 = require "justenoughharfbuzz-old"

local function shape(text, hb, face, index)
  local script = "arab"
  local language = "URD"
  local size = 10
  local dir = "RTL"
  local features = ""
  if index~= nil then
    return {hb._shape(text, face, index, script, dir, language, size, features)}
  end
  return {hb._shape(text, face,  script, dir, language, size, features)}
end

local function showresult(result)
  print "================"
  for k,v in pairs(result) do
    local t = {}
    for x,y in pairs(v) do
      t[#t+1] = string.format("%s=%s",x,y)
    end
    print(table.concat(t, "; "))
  end
  print "----------------"
end

local noto = io.open("/home/michal/.fonts/NotoNastaliqUrdu-Regular.ttf", "r")
local data = noto:read("*all")
noto:close()

local text = "پراگ"

local result = shape(text, hb, data, 2)

print "Current Harfbuzz"
showresult(result)

local options = {font = "Noto Nastaliq Urdu", weight = 200,script = "arab", direction = "RTL", language = "URD", size = 10, features = "", variant = "normal"}

local face = hb2._face(options)

local result2 = shape(text, hb2, face.face)

print "Older version"
showresult(result2)

and the results are really strange:

Current Harfbuzz

name=; width=0; height=0; glyphAdvance=0; index=0; depth=-0; codepoint=20
name=; width=11.7529296875; height=0; glyphAdvance=11.7529296875; index=0; depth=-0; codepoint=888
name=; width=4.0625; height=0; glyphAdvance=4.0625; index=2; depth=-0; codepoint=868
name=; width=3.2958984375; height=0; glyphAdvance=3.2958984375; index=4; depth=-0; codepoint=858
name=; width=11.591796875; height=0; glyphAdvance=11.591796875; index=6; depth=-0; codepoint=902

Older version

name=Gaf; width=11.591796875; height=13.90625; depth=2.6611328125; codepoint=902
name=Alef; width=3.2958984375; height=11.0302734375; depth=0.5810546875; codepoint=858
name=Reh; width=4.0625; height=6.4697265625; depth=2.4853515625; codepoint=868
name=Behx; width=11.7529296875; height=6.0302734375; depth=2.55859375; codepoint=888
name=ThreeDotsDownBelowNS; width=0; height=0.703125; depth=4.12109375; codepoint=20

As you can see, not only is x_shift missing from the current output, but some other variables also have wrong values. Both my own compiled library and the one copied from SILE behave this way.

michal-h21 commented 8 years ago

I've found an interesting thing while playing with the script from my previous post: the name, depth and height weren't set for any ttf font I tested, but for otf fonts the values were correct. Maybe it is a bug in justenoughharfbuzz?

deepakjois commented 8 years ago

I don’t understand why there is a discrepancy in the old and new versions.

I will dig into the C code and see what is going on. There may be a chance to simplify it, and just make a call to Harfbuzz’s hb_shape function and return the values directly in a Lua table. The rest of the calculations can be done in Lua itself. It may be easier to reason about the code then.

michal-h21 commented 8 years ago

It seems like a nice plan :) From my simple tests in the C code, it seems that this selects different features for otf and ttf files:

if (strncmp(font_s, "OTTO", 4) == 0) {
  hb_ft_font_set_funcs(hbFont);
} else {
  hb_ot_font_set_funcs(hbFont);
}

hb_ft_font_set_funcs is used for otf fonts, and hb_font_get_glyph_extents then returns correct dimensions. x_offset and y_offset don't seem to be set in either case.
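For reference, the "OTTO" test above is looking at the first four bytes of the font data, which say what kind of outlines the file carries. Below is a small standalone sketch of that check, not the actual justenoughharfbuzz code: the `sfnt_flavour` enum and `sniff_sfnt` function are names I made up for illustration. It uses memcmp rather than strncmp because the TrueType version tag contains zero bytes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical classification of a font file by its leading tag. */
typedef enum {
    SFNT_UNKNOWN,
    SFNT_TRUETYPE,    /* TrueType outlines: 0x00010000 or 'true' */
    SFNT_CFF,         /* CFF outlines: 'OTTO' */
    SFNT_COLLECTION   /* font collection: 'ttcf' */
} sfnt_flavour;

/* Classify font data by its first four bytes. */
sfnt_flavour sniff_sfnt(const unsigned char *data, size_t len)
{
    static const unsigned char tt_version[4] = {0x00, 0x01, 0x00, 0x00};

    if (data == NULL || len < 4)
        return SFNT_UNKNOWN;
    if (memcmp(data, "OTTO", 4) == 0)
        return SFNT_CFF;
    if (memcmp(data, "ttcf", 4) == 0)
        return SFNT_COLLECTION;
    if (memcmp(data, tt_version, 4) == 0 || memcmp(data, "true", 4) == 0)
        return SFNT_TRUETYPE;
    return SFNT_UNKNOWN;
}
```

This only tells the two cases apart; which set of font functions to install for each case is a separate decision.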

deepakjois commented 8 years ago

I really recommend trying out the harfbuzz-tutorial. It is simple to build and prints some debug output, along with a PNG which shows how Harfbuzz is rendering a text, given a font file name. It is a pretty clean and direct interface to Harfbuzz.

Here is the terminal output from the hello-harfbuzz binary, which shows the values being returned for the word یہ :

$> ./hello-harfbuzz ~/Downloads/NotoNastaliqUrdu-unhinted/NotoNastaliqUrdu-Regular.ttf   "یہ"
Raw buffer contents:
glyph='HehFin'  cluster=2       advance=(8.29688,0)     offset=(0,0)
glyph='TwoDotsBelowNS'  cluster=0       advance=(0,0)   offset=(5.4375,-6.71875)
glyph='sp2'     cluster=0       advance=(0,0)   offset=(0,0)
glyph='BehxIni.outS1'   cluster=0       advance=(12.8438,0)     offset=(0,-1.1875)
Converted to absolute positions:
glyph='HehFin'  cluster=2       position=(0,0)
glyph='TwoDotsBelowNS'  cluster=0       position=(13.7344,-6.71875)
glyph='sp2'     cluster=0       position=(8.29688,0)
glyph='BehxIni.outS1'   cluster=0       position=(8.29688,-1.1875)

$> ./hello-harfbuzz ~/Library/Fonts/amiri-regular.ttf   "یہ"
Raw buffer contents:
glyph='uni06C1.fina'    cluster=2       advance=(14.7031,0)     offset=(0,0)
glyph='uni06CC.init'    cluster=0       advance=(6.85938,0)     offset=(0,0)
Converted to absolute positions:
glyph='uni06C1.fina'    cluster=2       position=(0,0)
glyph='uni06CC.init'    cluster=0       position=(14.7031,0)

$>

deepakjois commented 8 years ago

It was getting a bit overwhelming for me, so I decided to start from scratch.

For now, I have a Lua binding to Harfbuzz that works for the basic case (accepts the text to be shaped, a font blob and font index), and returns the following values for each glyph received from Harfbuzz as-is:

Here is the code: https://github.com/deepakjois/luatex-harfbuzz-shaper (See the README for instructions to try it out)

I will slowly put back the other pieces, and support for more things like directions, script, features etc. I am planning to start with a very basic plain TeX file first, to make sure I understand the process fully from start to end. Meanwhile, if you wish to use the code, you can just copy over the Makefile and the C file. You will also need to do your own computations to convert the Harfbuzz metrics to something that LuaTeX can understand. The values being returned currently are scaled to the font’s upem value.
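As a rough idea of the conversion involved, here is a sketch in C that turns a value in font units into TeX scaled points (65536 sp = 1 pt). The function name `font_units_to_sp` and its parameters are hypothetical, and the rounding rule (nearest, halves away from zero) is my own choice, not something LuaTeX prescribes.

```c
#include <assert.h>
#include <stdint.h>

/* TeX measures lengths in scaled points: 65536 sp = 1 pt. */
#define SP_PER_PT 65536

/* Convert a value in font design units to scaled points, given the font's
 * units-per-em and the requested font size in scaled points.
 * Rounds to nearest, with halves away from zero. */
int64_t font_units_to_sp(int32_t units, uint32_t upem, int64_t size_sp)
{
    assert(upem > 0);
    int64_t num = (int64_t)units * size_sp;
    int64_t half = (int64_t)(upem / 2);
    return (num >= 0 ? num + half : num - half) / (int64_t)upem;
}
```

For example, assuming a upem of 2048 and a 36pt font size (both assumptions on my part), `font_units_to_sp(472, 2048, 36 * SP_PER_PT)` gives 543744 sp, which is 8.296875pt. That agrees with the 8.29688 advance hello-harfbuzz printed for HehFin, so the tutorial output is at least consistent with those two values.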

michal-h21 commented 8 years ago

Thanks, this seems nice. It reminds me that I totally forgot about the SwigLib project, which contains Lua bindings for Harfbuzz. I hadn't used it yet, because I didn't have a functional font loader previously.

But now that we don't use the fontconfig library, and if we want to use our own Harfbuzz bindings instead of SILE's, it may be worth looking at it again.

deepakjois commented 8 years ago

I did not know about the SwigLib project, but I did try wrapping Harfbuzz with SWIG and failed miserably after trying for a couple of hours. I don’t know SWIG very well, so it is possible I was doing something wrong. I realised that we don’t really need to wrap that many APIs, so doing it manually was easier.

The bindings on the site seem to be for Harfbuzz 0.9.4, so it’s a bit old. Current Harfbuzz is at 1.1, I believe.

davidcarlisle commented 8 years ago

On 30 November 2015 at 18:13, Deepak Jois notifications@github.com wrote:

I did not know about SwigLib project, but I did try wrapping Harfbuzz with SWIG and failed miserably after trying for a couple of hours. I don’t know SWIG very well, so it is possible I was doing something wrong. I realised that we don’t really need to wrap that many APIs, so doing it manually was easier.

The bindings on the site seem to be for Harfbuzz 0.9.4, so it’s a bit old. Current Harfbuzz is at 1.1, I believe.

It would probably be good, in the end, to use the version of harfbuzz that's already in texlive, for use with xetex, which is 1.04 currently

https://www.tug.org/svn/texlive/trunk/Build/source/libs/harfbuzz/

David

michal-h21 commented 8 years ago

SwigLib was pointed out while discussing Harfbuzz on the LuaTeX mailing list this spring. It seems to be the official way in which binary libraries might be distributed in the future, if I understand and recall correctly.

But anyway, I just tried to do some basic operations with this SWIG version and you are right, it probably isn't as easy as I thought. I ran into type errors between C and Lua immediately.

So a custom lib with a basic interface would probably be better, or at least easier to use.

michal-h21 commented 8 years ago

Here is the code: https://github.com/deepakjois/luatex-harfbuzz-shaper (See the README for instructions to try it out)

I've modified the Makefile a bit, it compiles on Linux now. I can't make a pull request, so I will post it here.

Oh, Github doesn't support this filetype, so I have to paste it:

PKGS = harfbuzz

CFLAGS = `pkg-config --cflags $(PKGS)` `pkg-config --cflags lua`
LDFLAGS = `pkg-config --libs $(PKGS)`

UNAME_S := $(shell uname -s)
ifeq ($(UNAME_S),Linux)
    LIBFLAGS = -shared 
    STD = -std=gnu99
endif
ifeq ($(UNAME_S),Darwin)
    STD = 
    LIBFLAGS = -dynamiclib -undefined dynamic_lookup
endif

all: luaharfbuzz.so

luaharfbuzz.o: luaharfbuzz.c
	$(CC) -O2 -fpic $(CFLAGS) $(STD) -c luaharfbuzz.c

luaharfbuzz.so: luaharfbuzz.o
	$(CC) -O2 -fpic $(LDFLAGS) $(LIBFLAGS) -o luaharfbuzz.so luaharfbuzz.o

test: all
	lua harfbuzz_test.lua notonastaliq.ttf "یہ"

deepakjois commented 8 years ago

@michal-h21 I will update the Makefile. I believe you can make a pull request, you just have to make sure you are selecting the right source and destination branches in the Github Pull Request UI.

@davidcarlisle Thanks for the pointer. The Harfbuzz API is actually really stable, so it shouldn’t be too problematic to integrate with a slightly older version. But I agree that it would be good to target a TeX package that is included inside TeXLive and uses the Harfbuzz API inside the source tree. It is probably going to be a while though, and by then the Harfbuzz API may be updated again.

deepakjois commented 8 years ago

I have been thinking…it will be better if we approach the problem by decoupling the Harfbuzz bindings to Lua from the modules/packages required to use these bindings in LuaTeX.

I am thinking of focusing on making a good set of bindings for Harfbuzz in Lua as an independent project. This is also something that I can be most efficient on right now, because I don’t really know much about the details of how LuaTeX does node processing. I imagine that if there are good and nicely working bindings to Harfbuzz, people with more expertise on the LuaTeX side of things can better contribute to make it work with LuaTeX.

I have taken the C code I have cobbled together so far and put it in a new project called luaharfbuzz. Its focus will be to provide a robust and functional set of Harfbuzz bindings for Lua. The plan in the end is to package it in different ways – using luarocks, as part of TeXLive etc.

With luaharfbuzz providing the Harfbuzz bindings, and luaotfload providing font handling and management, I hope it becomes easier for you (or anybody else) to work on incorporating Harfbuzz shaping into LuaTeX.

I am closing this issue. If you have any comments or questions, do open an issue inside luaharfbuzz, and we can discuss it there.