harfbuzz / uharfbuzz

A HarfBuzz Python binding
Apache License 2.0

make use of info.cluster #197

Open replabrobin opened 4 months ago

replabrobin commented 4 months ago

I wanted to set the buffer info cluster values before shaping so that I could use the returned cluster numbers as a guide to the input colours etc. I had to add a setter to make this possible:

diff --git a/src/uharfbuzz/_harfbuzz.pyx b/src/uharfbuzz/_harfbuzz.pyx
index 5adf637..ead947e 100644
--- a/src/uharfbuzz/_harfbuzz.pyx
+++ b/src/uharfbuzz/_harfbuzz.pyx
@@ -69,6 +69,10 @@ cdef class GlyphInfo:
     def cluster(self) -> int:
         return self._hb_glyph_info.cluster

+    @cluster.setter
+    def cluster(self,v) -> None:
+        self._hb_glyph_info.cluster = v
+
     @property
     def flags(self) -> GlyphFlags:
         return GlyphFlags(self._hb_glyph_info.mask & HB_GLYPH_FLAG_DEFINED)

However, although I can set the cluster values prior to shaping, the returned clusters are all zero.

This code

#!/bin/env python
import uharfbuzz as hb

if False:
    import sys
    fontfile = sys.argv[1]
    text = sys.argv[2]
else:
    fontfile = '/home/robin/devel/reportlab/REPOS/reportlab/tmp/NotoSansKhmer/NotoSansKhmer-Regular.ttf'
    #1786 Khmer Letter Cha
    #17D2 Khmer Sign Coeng
    #1793 Khmer Letter No
    #17B6 Khmer Vowel Sign Aa
    #17C6 Khmer Sign Nikahit
    text = '\u1786\u17D2\u1793\u17B6\u17C6'

blob = hb.Blob.from_file_path(fontfile)
face = hb.Face(blob)
font = hb.Font(face)

buf = hb.Buffer()
buf.add_str(text)
infos = buf.glyph_infos
print(f'initial {len(infos)=}')
for i,info in enumerate(infos):
    info.cluster=i
buf.guess_segment_properties()
infos = buf.glyph_infos
print(f'guessed {len(infos)=} {[info.cluster for info in infos]}')

features = {"kern": True, "liga": True}
hb.shape(font, buf, features)

infos = buf.glyph_infos
positions = buf.glyph_positions

for info, pos in zip(infos, positions):
    gid = info.codepoint
    glyph_name = font.glyph_to_string(gid)
    cluster = info.cluster
    x_advance = pos.x_advance
    x_offset = pos.x_offset
    y_offset = pos.y_offset
    print(f"{glyph_name} gid{gid}={cluster}@{x_advance},{y_offset}+{x_advance}")

produces this output

$ tmp/tuharfbuzz 
initial len(infos)=5
guessed len(infos)=5 [0, 1, 2, 3, 4]
uni178617B6 gid248=0@923,0+923
uni17D21793 gid209=0@0,-26+0
uni17C6 gid137=0@0,-29+0

All the returned clusters seem to be zero.

If I set buf.cluster_level = 1 right after creating the buffer, the clusters do differ, i.e. gid137 gets a cluster value of 4:

initial len(infos)=5
guessed len(infos)=5 [0, 1, 2, 3, 4]
uni178617B6 gid248=0@923,0+923
uni17D21793 gid209=0@0,-26+0
uni17C6 gid137=4@0,-29+0

justvanrossum commented 4 months ago

I don't think you are ever supposed to set the cluster manually. HarfBuzz does that for you, but there are three "levels" of operation, giving different results:

https://harfbuzz.github.io/working-with-harfbuzz-clusters.html

In the context of your example, you would set the level like this:

buf.cluster_level = hb.BufferClusterLevel.CHARACTERS
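
Once the shaper returns cluster values like those in the level 1 output above (0, 0, 4), mapping glyphs back to input characters for colouring could look roughly like the sketch below. It is only an illustration: cluster_spans is a hypothetical helper, not part of uharfbuzz, and it assumes each cluster value is an index into the input string and that a cluster extends up to the next larger cluster value.

```python
def cluster_spans(clusters, text_len):
    """Map each distinct cluster value to the half-open range of input
    indices it covers. Assumption: a cluster runs from its own value up
    to the next larger cluster value (or the end of the text)."""
    starts = sorted(set(clusters))
    spans = {}
    for i, start in enumerate(starts):
        end = starts[i + 1] if i + 1 < len(starts) else text_len
        spans[start] = (start, end)
    return spans


text = "\u1786\u17D2\u1793\u17B6\u17C6"
# Glyph names and cluster values copied from the level 1 output above.
glyphs = ["uni178617B6", "uni17D21793", "uni17C6"]
clusters = [0, 0, 4]

spans = cluster_spans(clusters, len(text))
# spans == {0: (0, 4), 4: (4, 5)}

for name, cluster in zip(glyphs, clusters):
    start, end = spans[cluster]
    source = " ".join(f"U+{ord(ch):04X}" for ch in text[start:end])
    print(f"{name}: input[{start}:{end}] = {source}")
```

Glyphs that share a cluster cannot be separated further, so a colour change inside such a cluster would have to be applied to the whole cluster or handled some other way.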

replabrobin commented 4 months ago

Thanks for that info. I don't need cluster.setter then. I really don't want to get into the horrid details of HarfBuzz. The layout problems that result from using a shaper are enough. I suppose reportlab will need a new kind of font to allow input shaping, and after line breaking the line drawing will need additional positioning. I doubt that we will end up with just one way to do it :(

behdad commented 4 months ago

Setting clusters on the buffer is sometimes useful. For example, in hb-view we reset them to be Unicode character indices instead of UTF-8 byte indices.
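
The kind of renumbering described here can be illustrated in plain Python, without touching a buffer. This is a sketch and not hb-view's actual code: utf8_offset_to_char_index is a hypothetical helper, and the byte-offset clusters are made-up values chosen to match the Khmer example in this thread (each of its characters takes 3 bytes in UTF-8).

```python
def utf8_offset_to_char_index(text):
    """Return {UTF-8 byte offset: character index} for each character."""
    mapping = {}
    offset = 0
    for index, char in enumerate(text):
        mapping[offset] = index
        offset += len(char.encode("utf-8"))
    return mapping


text = "\u1786\u17D2\u1793\u17B6\u17C6"
mapping = utf8_offset_to_char_index(text)
# mapping == {0: 0, 3: 1, 6: 2, 9: 3, 12: 4}

# Hypothetical clusters expressed as UTF-8 byte offsets.
byte_clusters = [0, 0, 12]
char_clusters = [mapping[c] for c in byte_clusters]
print(char_clusters)
```

Whether a given binding reports byte offsets or character indices depends on how the text was added to the buffer, so a conversion like this is only needed in the former case.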