Closed deepakjois closed 8 years ago
I've created a new font loader based on luaotfload; see the `luaotfload` branch. The usage is shown in [newsample.tex](https://github.com/michal-h21/luatex-harfbuzz-shaper/blob/luaotfload/examples/newsample.tex). This branch should be used with the current version of `justenoughharfbuzz.so`.
Some older features don't work yet, namely feature setting in the document body: at the moment a new font must be declared for every feature combination. On the other hand, hyphenation works now (except for ligatures), and fonts can be loaded from both the system and the TeX tree.
Regarding `x_offset`, it doesn't seem to be included in the table returned by `justenoughharfbuzz`'s shaper (at least not for the Noto font), so unfortunately we can't work with it :( There are only `glyphAdvance` and `width`, which sometimes differ, but not with the Arabic fonts I tested.
There is a strange thing about Noto Nastaliq Urdu: I can't get it to work correctly. It is shown in the output, but it looks similarly wrong with Harfbuzz on and off; only some glyphs seem to be decomposed. Other fonts I tested (Amiri and Scheherazade) seem to work correctly. I have no idea what's wrong here. It probably needs some specific combination of script, language and features, and Harfbuzz seems to process it wrongly otherwise.
I will check out your changes soon.
Are you sure about `x_offset` and `y_offset` not being returned?
Here it is being returned in an older version of `justenoughharfbuzz.c` (the one on my fork of your code):
https://github.com/deepakjois/luatex-harfbuzz-shaper/blob/master/justenoughharfbuzz.c#L329
And here it is on the latest HEAD at SILE: https://github.com/simoncozens/sile/blob/master/src/justenoughharfbuzz.c#L186
In fact, processing those is crucial for Noto Nastaliq.
Moreover, Noto Nastaliq is shaped differently, because the Nastaliq script has a slanted baseline in practice. So the dots and the shapes are moved around based on the context.
You can understand it in detail if you use the `hb-shape` tool that comes with Harfbuzz.
Here is an example of shaping a simple Urdu word یہ with Harfbuzz, using both Amiri and Noto Nastaliq:
hb-shape --output-format=json ~/Library/Fonts/amiri-regular.ttf "یہ" | json_pp
[
{
"cl" : 1,
"dx" : 0,
"ay" : 0,
"dy" : 0,
"g" : "uni06C1.fina",
"ax" : 836
},
{
"ax" : 390,
"g" : "uni06CC.init",
"cl" : 0,
"dy" : 0,
"ay" : 0,
"dx" : 0
}
]
Note that there are two glyphs corresponding to the two Arabic characters, with their displacements (computed with the `glyphAdvance` value that is returned from the actual Harfbuzz API function). The values of `dx` and `dy` (which I think correspond to `x_offset` and `y_offset`) are 0, so it does not pose a problem in most cases.
hb-shape --output-format=json NotoNastaliqUrdu-Regular.ttf "یہ" | json_pp
[
{
"dy" : 0,
"g" : "HehFin",
"ax" : 472,
"dx" : 0,
"ay" : 0,
"cl" : 1
},
{
"dy" : -383,
"g" : "TwoDotsBelowNS",
"ax" : 0,
"dx" : 310,
"ay" : 0,
"cl" : 0
},
{
"cl" : 0,
"ay" : 0,
"dx" : 0,
"ax" : 0,
"g" : "sp2",
"dy" : 0
},
{
"dy" : -68,
"g" : "BehxIni.outS1",
"ax" : 731,
"dx" : 0,
"ay" : 0,
"cl" : 0
}
]
Note how the characters are broken out into 4 glyphs. The initial form of ی is three glyphs, as shown by the `cl` (cluster) value of 0 for the 3 glyphs `BehxIni.outS1`, `sp2` and `TwoDotsBelowNS`. Two of them also have non-zero values for `dx` or `dy`. The ہ is formed by one glyph, `HehFin`, and has a `cl` value of 1, signifying the next cluster.
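To make the cluster idea concrete, here is a minimal C sketch that counts how many output glyphs belong to a given cluster. The function name `glyphs_in_cluster` and the plain-array input are assumptions made for illustration; they are not part of Harfbuzz or of `justenoughharfbuzz`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: count the glyphs whose cluster value equals
 * `cluster`. `clusters` holds one cluster value per output glyph,
 * in the order the shaper returned them. */
size_t glyphs_in_cluster(const uint32_t *clusters, size_t n, uint32_t cluster)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (clusters[i] == cluster) {
            count++;
        }
    }
    return count;
}
```

With the `cl` values from the Noto Nastaliq output above (1, 0, 0, 0), cluster 0 contains three glyphs and cluster 1 contains one.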
I will post more once I have looked at your newer code.
I rebased on top of your code: https://github.com/deepakjois/luatex-harfbuzz-shaper/
I had to change your file to use `luatextextdir` and `luatexpardir`, but otherwise everything else was the same. However, my output seems to be different from yours.
> I had to change your file to use luatextextdir and luatexpardir

Do you have an old version of LaTeX? The LuaTeX primitives are not prefixed in the 2015/10/01 release.
$> lualatex
This is LuaTeX, Version beta-0.80.1 (TeX Live 2015) (rev 5253)
restricted \write18 enabled.
**
I did mess around with my luatex binary and recompiled it to include some symbols. Other than that it is just straight up TeXLive 2015.
It's a change in LaTeX, not the binary. I guess you have the January release (2015/01/01) rather than the October one (2015/10/01).
David
Oops…Looks like I have an older version of LaTeX after all.
lualatex examples/newsample.tex
This is LuaTeX, Version beta-0.80.1 (TeX Live 2015) (rev 5253)
restricted \write18 enabled.
(./examples/newsample.tex
LaTeX2e <2015/01/01>
I did not realise that TeXLive could be updated using `tlmgr`. I will try that later.
OK, there is something really wrong going on. I've created a simple script, `testhb.lua`, to compare the results of shaping with the current and older versions of justenoughharfbuzz:
local hb = require "justenoughharfbuzz"
local hb2 = require "justenoughharfbuzz-old"

-- Shape `text` with the given binding. When `index` is passed, it is
-- handed to _shape right after the face; otherwise it is left out.
local function shape(text, hb, face, index)
  local script = "arab"
  local language = "URD"
  local size = 10
  local dir = "RTL"
  local features = ""
  if index ~= nil then
    return {hb._shape(text, face, index, script, dir, language, size, features)}
  end
  return {hb._shape(text, face, script, dir, language, size, features)}
end

-- Print every field of every returned glyph, one glyph per line.
local function showresult(result)
  print "================"
  for k, v in pairs(result) do
    local t = {}
    for x, y in pairs(v) do
      t[#t+1] = string.format("%s=%s", x, y)
    end
    print(table.concat(t, "; "))
  end
  print "----------------"
end

-- Current version: pass the raw font file contents and an index.
local noto = io.open("/home/michal/.fonts/NotoNastaliqUrdu-Regular.ttf", "rb")
local data = noto:read("*all")
noto:close()
local text = "پراگ"
local result = shape(text, hb, data, 2)
print "Current Harfbuzz"
showresult(result)

-- Older version: look the face up by name through _face.
local options = {font = "Noto Nastaliq Urdu", weight = 200, script = "arab", direction = "RTL", language = "URD", size = 10, features = "", variant = "normal"}
local face = hb2._face(options)
local result2 = shape(text, hb2, face.face)
print "Older version"
showresult(result2)
and the results are really strange:
name=; width=0; height=0; glyphAdvance=0; index=0; depth=-0; codepoint=20
name=; width=11.7529296875; height=0; glyphAdvance=11.7529296875; index=0; depth=-0; codepoint=888
name=; width=4.0625; height=0; glyphAdvance=4.0625; index=2; depth=-0; codepoint=868
name=; width=3.2958984375; height=0; glyphAdvance=3.2958984375; index=4; depth=-0; codepoint=858

name=Gaf; width=11.591796875; height=13.90625; depth=2.6611328125; codepoint=902
name=Alef; width=3.2958984375; height=11.0302734375; depth=0.5810546875; codepoint=858
name=Reh; width=4.0625; height=6.4697265625; depth=2.4853515625; codepoint=868
name=Behx; width=11.7529296875; height=6.0302734375; depth=2.55859375; codepoint=888
As you can see, not only is `x_shift` missing in the current output, but some other variables also have wrong values. Both my own compiled library and the one copied from SILE behave this way.
I've found an interesting thing while playing with the script from my previous post: `name`, `depth` and `height` weren't set for any `ttf` font I tested, but for `otf` fonts the values were correct. Maybe it is a bug in `justenoughharfbuzz`?
I don’t understand why there is a discrepancy in the old and new versions.
I will dig into the C code and see what is going on. There may be a chance to simplify it, and just make a call to Harfbuzz’s `hb_shape` function and return the values directly in a Lua table. The rest of the calculations can be done in Lua itself. It may be easier to reason about the code then.
It seems like a nice plan :) From my simple tests in the C code, it seems that this selects different font functions for `otf` and `ttf` files:
if (strncmp(font_s, "OTTO", 4) == 0) {
  hb_ft_font_set_funcs(hbFont);
} else {
  hb_ot_font_set_funcs(hbFont);
}
`hb_ft_font_set_funcs` is used for `otf` fonts, and `hb_font_get_glyph_extents` then returns correct dimensions. `x_offset` and `y_offset` don't seem to be set in either case.
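For context, `OTTO` is the four-byte tag that opens an OpenType font file with CFF outlines, while TrueType-outline files start with the version number 0x00010000 instead, which is why the check above splits roughly along `otf`/`ttf` lines. Below is a minimal sketch of the same test as a standalone function. The name `is_cff_flavoured` is hypothetical, and `memcmp` with an explicit length check is swapped in for `strncmp` because font data is binary and may be shorter than four bytes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the font data begins with the
 * "OTTO" tag (OpenType with CFF outlines), 0 otherwise. TrueType
 * outline fonts begin with 0x00010000 and therefore return 0. */
int is_cff_flavoured(const char *data, size_t len)
{
    return len >= 4 && memcmp(data, "OTTO", 4) == 0;
}
```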
I really recommend trying out the harfbuzz-tutorial. It is simple to build and prints some debug output, along with a PNG which shows how Harfbuzz is rendering a text, given a font file name. It is a pretty clean and direct interface to Harfbuzz.
Here is the terminal output from the `hello-harfbuzz` binary, which shows the values being returned for the word یہ :
$> ./hello-harfbuzz ~/Downloads/NotoNastaliqUrdu-unhinted/NotoNastaliqUrdu-Regular.ttf "یہ"
Raw buffer contents:
glyph='HehFin' cluster=2 advance=(8.29688,0) offset=(0,0)
glyph='TwoDotsBelowNS' cluster=0 advance=(0,0) offset=(5.4375,-6.71875)
glyph='sp2' cluster=0 advance=(0,0) offset=(0,0)
glyph='BehxIni.outS1' cluster=0 advance=(12.8438,0) offset=(0,-1.1875)
Converted to absolute positions:
glyph='HehFin' cluster=2 position=(0,0)
glyph='TwoDotsBelowNS' cluster=0 position=(13.7344,-6.71875)
glyph='sp2' cluster=0 position=(8.29688,0)
glyph='BehxIni.outS1' cluster=0 position=(8.29688,-1.1875)
$> ./hello-harfbuzz ~/Library/Fonts/amiri-regular.ttf "یہ"
Raw buffer contents:
glyph='uni06C1.fina' cluster=2 advance=(14.7031,0) offset=(0,0)
glyph='uni06CC.init' cluster=0 advance=(6.85938,0) offset=(0,0)
Converted to absolute positions:
glyph='uni06C1.fina' cluster=2 position=(0,0)
glyph='uni06CC.init' cluster=0 position=(14.7031,0)
$>
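The "Converted to absolute positions" step above can be sketched in a few lines of C: each glyph is drawn at the current pen position plus its offset, and the pen then moves on by the glyph's advance. The struct and function names here are made up for illustration, and the sketch works on integer font units like those in the `hb-shape` JSON output rather than on the scaled values printed by `hello-harfbuzz`.

```c
#include <stddef.h>

/* Hypothetical per-glyph record: advance (ax, ay) and offset (dx, dy),
 * named after the keys in the hb-shape JSON output. */
typedef struct { int ax, ay, dx, dy; } glyph_metrics;
typedef struct { int x, y; } glyph_position;

/* Place each glyph at pen + offset, then advance the pen.
 * The offset moves only the glyph; it never moves the pen. */
void absolute_positions(const glyph_metrics *glyphs, size_t n,
                        glyph_position *out)
{
    int pen_x = 0, pen_y = 0;
    for (size_t i = 0; i < n; i++) {
        out[i].x = pen_x + glyphs[i].dx;
        out[i].y = pen_y + glyphs[i].dy;
        pen_x += glyphs[i].ax;
        pen_y += glyphs[i].ay;
    }
}
```

Fed the four Noto Nastaliq glyphs from the `hb-shape` output earlier in this thread, this places `TwoDotsBelowNS` at (782, -383) and `BehxIni.outS1` at (472, -68) in font units. Ignoring the offsets would put both at (472, 0).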
It was getting a bit overwhelming for me, so I decided to start from scratch.
For now, I have a Lua binding to Harfbuzz that works for the basic case (accepts the text to be shaped, a font blob and font index), and returns the following values for each glyph, as received from Harfbuzz:
- glyph ID (`gid`)
- advance (`ax` and `ay`)
- offset (`dx` and `dy`)
- extents (`w`, `h`, `xb`, `yb`)

Here is the code: https://github.com/deepakjois/luatex-harfbuzz-shaper (See the README for instructions to try it out)
I will slowly put back the other pieces, and support for more things like directions, script, features etc. I am planning to start with a very basic plain TeX file first, to make sure I understand the process fully from start to end. Meanwhile, if you wish to use the code, you can just copy over the Makefile and the C file. You will also need to do your own computations to convert the Harfbuzz metrics to something that LuaTeX can understand. The values being returned currently are scaled to the font’s upem value.
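As a rough illustration of that conversion, the sketch below turns a value in font units into TeX scaled points (1pt = 65536sp) for a given font size. The function name, the rounding rule, and the idea of passing the size in scaled points are all assumptions for the example; the bindings themselves do not provide this.

```c
#include <stdint.h>

#define SP_PER_PT 65536 /* TeX: 1pt = 65536sp */

/* Hypothetical helper: convert `value` (in font units, i.e. relative
 * to `upem` units per em) to scaled points at font size `size_sp`.
 * Rounds to the nearest scaled point, ties away from zero. */
int64_t font_units_to_sp(int32_t value, int32_t upem, int64_t size_sp)
{
    int64_t scaled = (int64_t)value * size_sp;
    int64_t half = upem / 2;
    if (scaled >= 0) {
        return (scaled + half) / upem;
    }
    return -((-scaled + half) / upem);
}
```

For example, assuming a upem of 2048 and a 10pt font, `font_units_to_sp(472, 2048, 10 * SP_PER_PT)` gives 151040sp, and an offset of -383 units gives -122560sp.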
Thanks, this seems nice. It reminds me that I totally forgot about the SWIG project, which contains Lua bindings for Harfbuzz. I hadn't used it yet, because I didn't have a functional font loader previously.
But now that we don't use the fontconfig library, and if we want to use our own Harfbuzz bindings instead of SILE's bindings, it may be worth looking at it again.
I did not know about the SwigLib project, but I did try wrapping Harfbuzz with SWIG and failed miserably after trying for a couple of hours. I don’t know SWIG very well, so it is possible I was doing something wrong. I realised that we don’t really need to wrap that many APIs, so doing it manually was easier.
The bindings on the site seem to be for Harfbuzz 0.9.4, so it’s a bit old. Current Harfbuzz is at 1.1, I believe.
On 30 November 2015 at 18:13, Deepak Jois notifications@github.com wrote:
> I did not know about SwigLib project, but I did try wrapping Harfbuzz with SWIG and failed miserably after trying for a couple of hours. I don’t know SWIG very well, so it is possible I was doing something wrong. I realised that we don’t really need to wrap that many APIs, so doing it manually was easier.
> The bindings on the site seem to be for Harfbuzz 0.9.4, so it’s a bit old. Current Harfbuzz is at 1.1, I believe.
It would probably be good, in the end, to use the version of harfbuzz that's already in texlive, for use with xetex, which is 1.04 currently
https://www.tug.org/svn/texlive/trunk/Build/source/libs/harfbuzz/
David
SWIGlib was pointed out while discussing Harfbuzz on the LuaTeX mailing list this spring. If I understand and recall correctly, it seems to be the official way in which binary libraries might be distributed in the future.
But anyway, I just tried to do some basic operations with this SWIG version and you are right, it probably isn't as easy as I thought: I ran into type errors between C and Lua immediately.
So a custom library with a basic interface is probably better, or at least easier to use.
> Here is the code: https://github.com/deepakjois/luatex-harfbuzz-shaper (See the README for instructions to try it out)
I've modified the Makefile a bit, it compiles on Linux now. I can't make a pull request, so I will post it here.
Oh, Github doesn't support this filetype, so I have to paste it:
PKGS = harfbuzz
CFLAGS = `pkg-config --cflags $(PKGS)` `pkg-config --cflags lua`
LDFLAGS = `pkg-config --libs $(PKGS)`

UNAME_S := $(shell uname -s)
ifeq ($(UNAME_S),Linux)
LIBFLAGS = -shared
STD = -std=gnu99
endif
ifeq ($(UNAME_S),Darwin)
STD =
LIBFLAGS = -dynamiclib -undefined dynamic_lookup
endif

all: luaharfbuzz.so

luaharfbuzz.o: luaharfbuzz.c
	$(CC) -O2 -fpic $(CFLAGS) $(STD) -c luaharfbuzz.c

luaharfbuzz.so: luaharfbuzz.o
	$(CC) -O2 -fpic $(LDFLAGS) $(LIBFLAGS) -o luaharfbuzz.so luaharfbuzz.o

test: all
	lua harfbuzz_test.lua notonastaliq.ttf "یہ"
@michal-h21 I will update the Makefile. I believe you can make a pull request, you just have to make sure you are selecting the right source and destination branches in the Github Pull Request UI.
@davidcarlisle Thanks for the pointer. The Harfbuzz API is actually really stable, so it shouldn’t be too problematic to integrate with a slightly older version. But I agree that it would be good to target a TeX package that is included inside TeXLive and uses the Harfbuzz API inside the source tree. It is probably going to be a while though, and by then the Harfbuzz API may be updated again.
I have been thinking…it will be better if we approach the problem by decoupling the Harfbuzz bindings to Lua from the modules/packages required to use these bindings in LuaTeX.
I am thinking of focusing on making a good set of bindings for Harfbuzz in Lua as an independent project. This is also something that I can be most efficient on right now, because I don’t really know much about the details of how LuaTeX does node processing. I imagine that if there are good and nicely working bindings to Harfbuzz, people with more expertise on the LuaTeX side of things can better contribute to make it work with LuaTeX.
I have taken the C code I have cobbled together so far and put it in a new project called luaharfbuzz. Its focus will be to provide a robust and functional set of Harfbuzz bindings for Lua. The plan in the end is to package it in different ways, such as with luarocks or as part of TeXLive.
With luaharfbuzz providing the Harfbuzz bindings, and luaotfload providing font handling and management, I hope it becomes easier for you (or anybody else) to work on incorporating Harfbuzz shaping into LuaTeX.
I am closing this issue. If you have any comments or questions, do open an issue inside luaharfbuzz, and we can discuss it there.
The Harfbuzz shaper output has a couple of fields for every glyph in its output buffer: `x_offset` and `y_offset`. These are currently not being taken into account in the LuaTeX node processing callbacks. This probably does not matter for Latin fonts (and some non-Latin fonts like Amiri), where the offsets are 0. But it can produce incorrect output in many cases. I tested typesetting Arabic with a Nastaliq font (Noto Nastaliq Urdu), and the output wasn’t even visible on the page.
Here is a simple hello-world Harfbuzz example that will clarify the concept of `x_offset` and `y_offset`: https://github.com/behdad/harfbuzz-tutorial/blob/master/hello-harfbuzz.c