photopea / Typr.js

Typr.js - process fonts in Javascript
MIT License
914 stars 73 forks source link

Add harfbuzz shaping support #28

Closed ebraminio closed 5 years ago

ebraminio commented 5 years ago

Hopefully you will be unblocked with this ;)

photopea commented 5 years ago

Hey, I am not familiar with harfbuzz so much to be able to accept this PR just like that. It also makes Typr.js 15x larger :D

I somehow fixed the Urdu support in my local version of Typr.js . Now I am trying to solve typing. The user has to be able to click into a rendered text, and the cursor will appear, and the user can type.

My current problem is, that String characters and glyphs are not 1-to-1. The number of glyphs can be smaller or even larger, than the number of characters, depending on substitutions. I need some kind of map between characters and glyphs, so that I can find the output glyph for each character, and the original character for each glyph. Does harfbuzz provide such map?

Is there any public manual for harfbuzz (either C or WASM version), so that I could see what it is capable of?

ebraminio commented 5 years ago

I need some kind of map between characters and glyphs

Cluster numbers are for that :)

photopea commented 5 years ago

BTW. what about the Bidi algorithm? Is it included in Harfbuzz? If I give it a string with both right-to-left and left-to-right runs, will it position glyphs correctly?

Also, I need to separate long paragraphs into multiple runs. I use my own code for it right now. I just want to know, how much of my code will I be able to replace :)

ebraminio commented 5 years ago

BTW. what about the Bidi algorithm? Is it included in Harfbuzz? If I give it a string with both right-to-left and left-to-right runs, will it position glyphs correctly?

It is not included as harfbuzz supposed to work on text with same direction and script so you have to separate your text by direction and script before feeding it into harfbuzz.

Cluster numbers are for that :)

They are provided with the json object the added function have, the "cl" field.

ebraminio commented 5 years ago

It also makes Typr.js 15x larger :D

Well, it is what you wanted! :) ~200kb if zipped

Is there any public manual for harfbuzz (either C or WASM version), so that I could see what it is capable of?

https://harfbuzz.github.io/

photopea commented 5 years ago

So I just found another bug in Khmer rendered by Typr.js : https://imgur.com/OcUt5sS (top is Google Chrome, bottom is Typr.js)

I will try to invest more time into learning about HarfBuzz, as I think it may save me a lot of time in the future.

ebraminio commented 5 years ago

I will try to invest more time into learning about HarfBuzz, as I think it may save me a lot of time in the future.

You can never replicate all the considerations and sometime hacks harfbuzz have to support all the scripts :)

photopea commented 5 years ago

Hey, so I tried it and it seems to be working quite well :) Do I understand these parts correctly?

I have some questions:

ebraminio commented 5 years ago

Correct.

Yes, always is ordered.

Both possible, isn't by number what I've put here already? I think I changed it to that afterward

On Fri, Mar 29, 2019, 5:02 PM Photopea <notifications@github.com wrote:

Hey, so I tried it and it seems to be working quite well :) Do I understand these parts correctly?

  • HarfBuzz can detect the writing direction from Unicode values - _hb_buffer_guess_segment_properties
  • in the output, glyphs are ordered left-to-right, no matter what the logical order was
  • the "cl" property denotes the cluster beginning. It is not the index of the initial character, but an index into the input byte array (so if "cl"s go like 0, 2, 4 ... there can be 2 characters per cluster, or 1 character per cluster, which uses two bytes in UTF8).

I have some questions:

  • in the output array, "cl" values are always in increasing, or in decreasing order (so that I could get a cluster size)?
  • is it possible to make HarfBuzz identify glyphs by the number - index (in "glyf","CFF","SVG") instead of a string (the "g" property in the output) ?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/photopea/Typr.js/pull/28#issuecomment-477981115, or mute the thread https://github.com/notifications/unsubscribe-auth/AAy3wde8vKbQKjHQmU_evTzQkQjORxl-ks5vbgfAgaJpZM4cRRnY .

ebraminio commented 5 years ago

Oh, just change last argument of _serialize from 0 to 4 /HB_BUFFER_SERIALIZE_FLAG_NO_GLYPH_NAMES/

On Fri, Mar 29, 2019, 5:54 PM Ebrahim Byagowi <ebraminio@gmail.com wrote:

Correct.

Yes, always is ordered.

Both possible, isn't by number what I've put here already? I think I changed it to that afterward

On Fri, Mar 29, 2019, 5:02 PM Photopea <notifications@github.com wrote:

Hey, so I tried it and it seems to be working quite well :) Do I understand these parts correctly?

  • HarfBuzz can detect the writing direction from Unicode values - _hb_buffer_guess_segment_properties
  • in the output, glyphs are ordered left-to-right, no matter what the logical order was
  • the "cl" property denotes the cluster beginning. It is not the index of the initial character, but an index into the input byte array (so if "cl"s go like 0, 2, 4 ... there can be 2 characters per cluster, or 1 character per cluster, which uses two bytes in UTF8).

I have some questions:

  • in the output array, "cl" values are always in increasing, or in decreasing order (so that I could get a cluster size)?
  • is it possible to make HarfBuzz identify glyphs by the number - index (in "glyf","CFF","SVG") instead of a string (the "g" property in the output) ?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/photopea/Typr.js/pull/28#issuecomment-477981115, or mute the thread https://github.com/notifications/unsubscribe-auth/AAy3wde8vKbQKjHQmU_evTzQkQjORxl-ks5vbgfAgaJpZM4cRRnY .

photopea commented 5 years ago

But when I am calling your function shapeStringToGlyphs for the second time with the same arguments, I am getting an error "OOM", which happens during var buffer = module._malloc(_font.byteLength);

photopea commented 5 years ago

I see you are allocating two "buffer"s , but freeing just one of them.

ebraminio commented 5 years ago

Oh fixed. Also take a look at https://github.com/harfbuzz/harfbuzz/blob/master/src/hb-ot-color.h you may find things there interesting also. Except glyf/CFF parsing, you can drop all of the font parsing you have locally using HarfBuzz.

ebraminio commented 5 years ago

It is important however hb_face_t instances be cached somehow as preparing it inside iharfbuzz is a bit costly.

ebraminio commented 5 years ago

hb_font_set_scale should be called for setting font size also sometime but it should be scaled by 16 to get subpixel information also

Yes, always is ordered.

Hmm. Maybe not in some corner cases, but most cases I am aware at least.

photopea commented 5 years ago

Also, note, that malloc() returns a pointer (integer), and you can not do

text_ptr[text.byteLength] = 0;

Instead, you should use

module.HEAPU8[text_ptr+text.byteLength] = 0;

I am still wondering, how zero-ended text is popular in low-level programming :D To me, it always made more sense to represent it with a pointer + length (and pass as two integers instead of one) :) You save one Byte and can have zeros inside a string.

photopea commented 5 years ago

BTW. I think it would be better to keep Typr.js and WASM HarfBuzz separated and independent.

Typr.js is a tiny library, that can handle 95% of use cases. Making it 10x bigger, just to cover the remaining 5% of use cases is not worth it.

I would like to change the interface of Typr.js to make it similar to HarfBuzz (add Typr.U.shape() function, that would return an array in the same format as HarfBuzz shape()). Programmers will be able to switch from Typr.js to HarfBuzz easily, as the interface will be identical, in case they need Urdu etc.

photopea commented 5 years ago

**** I do plan to use HarfBuzz heavily in Photopea, and I will probably stop using Typr.js in Photopea completely (or use just parts of it).

ebraminio commented 5 years ago

Oooh, fixed :)

The problem is there we can't commit to some JavaScript interface, but well, let's see :)

ebraminio commented 5 years ago

And the idea was to you have something to play :) Let's close this so

ebraminio commented 5 years ago

stop using Typr.js in Photopea completely

Sadly we don't support some of things Typr.js support already, a glyf/CFF glyph points interface, but that is in the roadmap.

photopea commented 5 years ago

What do you mean? All I need is the shape() function of HarfBuzz. Please, don't add any extra font processing, it would only make HarfBuzz bigger.

khaledhosny commented 5 years ago

Also, note, that malloc() returns a pointer (integer), and you can not do

text_ptr[text.byteLength] = 0;

Instead, you should use

module.HEAPU8[text_ptr+text.byteLength] = 0;

I am still wondering, how zero-ended text is popular in low-level programming :D To me, it always made more sense to represent it with a pointer + length (and pass as two integers instead of one) :) You save one Byte and can have zeros inside a string.

NULL-terminated strings is not a requirement here, the length of the string can be passed (and should be, in this case since the string is not already NULL-terminated) instead of -1 to hb_buffer_add_utf8().

ebraminio commented 5 years ago

What do you mean? All I need is the shape() function of HarfBuzz. Please, don't add any extra font processing, it would only make HarfBuzz bigger.

That is just in the plan for next year or so and may never happen and it is possible to reduce the size by subsetting the library by needed APIs.

instead of -1 to hb_buffer_add_utf8().

Correct, I'll apply that on harfbuzz PR.

photopea commented 5 years ago

@ebraminio BTW. how do I set the text direction in your WASM module? My Characters are typed in a following order:

... arabic1 ... 2018 .- 2019 .... arabic2 ....

And they need to be printed as

... arabic2 ... 2019 -. 2018 .... arabic1 ....

So the four-character sequence " .- " (between 2018 and 2019, which are in LTR) needs to be shaped in RTL order. I tried

module._hb_buffer_set_direction(hbuf, 2); // or 1

But it returns glyphs with ax=0.

ebraminio commented 5 years ago

You need to segment your input before feeding it into harfbuzz, setting direction won't be useful in that IIUC.

photopea commented 5 years ago

I did "segment" it. I need to call shape() only on a string " .- " and need the output in RTL.

Currently, module._hb_buffer_guess_segment_properties(); guesses the direction to be LTR.

ebraminio commented 5 years ago

Aha, yes that call should be removed but you need to detect and set the script also, hmm, I should look if is possible to have the guess but set the direction also without script, most of clients do both at the same time so may is not supported, I am not sure.

photopea commented 5 years ago

I solved it. Guess Properties only guesses the unset properties (if you set direction in advance, it will not change it).

module._hb_buffer_set_direction(hbuf, ltr?4:5);
module._hb_buffer_guess_segment_properties(hbuf);

But I had to find the constants 4 and 5 by trying :D

ebraminio commented 5 years ago

But I had to find the constants 4 and 5 by trying :D

Nice. We should have some way for that also, you can read the headers or use this https://36-24718300-gh.circle-artifacts.com/0/root/project/ABI.json "name" : "HB_DIRECTION_LTR", "value" : "4"; "name" : "HB_DIRECTION_RTL", "value" : "5" for now.

photopea commented 5 years ago

@ebraminio I have a strange problem with this WASM version of HarfBuzz.

I have two fonts: A - 700 kB and B - 10 MB. When I malloc() a space for A at the start and use it, everything works well. When I malloc() a space for B at the start, everything works well too.

But when I allocate A, use it, then free() it and try to malloc() space B, I get OOM error (out of memory probably). My single shaping looks like this:

    var buffer = module["_malloc"](fdata.byteLength);  // <=  throws OOM
    module["HEAPU8"].set(fdata, buffer);

    var blob = module["_hb_blob_create"](buffer, fdata.byteLength, 2, 0, 0);
    var face = module["_hb_face_create"](blob, 0);
    var font = module["_hb_font_create"](face);
    var hbuf = module["_hb_buffer_create"]();

    var text = utf8Encoder["encode"](str), tl = text.byteLength;
    var text_ptr = module["_malloc"](tl + 1);
    module["HEAPU8"].set(text, text_ptr); 
    module["HEAPU8"][text_ptr+tl]=0;
    module["_hb_buffer_add_utf8"](hbuf, text_ptr, -1, 0, -1);

    module["_hb_buffer_set_direction"](hbuf, ltr?4:5);
    module["_hb_buffer_guess_segment_properties"](hbuf);

    module["_hb_shape"](font, hbuf, 0, 0);

    var olen = str.length*100;
    var out = module["_malloc"](olen);
    module["_hb_buffer_serialize_glyphs"](hbuf, 0,
    module["_hb_buffer_get_length"](hbuf), out, olen, 0, font, 1246973774, 4);  //JSON 
    var r = module["HEAPU8"].slice(out, out+olen);
    var json = JSON.parse("[" + utf8Decoder["decode"](r.slice(0, r.indexOf(0))) + "]");

    module["_hb_blob_destroy"](blob);
    module["_hb_font_destroy"](font);
    module["_hb_face_destroy"](face);
    module["_hb_buffer_destroy"](hbuf);
    module["_free"](out);
    module["_free"](buffer);
    module["_free"](text_ptr);
ebraminio commented 5 years ago

Maybe something is not freed, I don't see it in your code but created with a wasm version with a larger heap anyway, harfbuzzjs.zip

photopea commented 5 years ago

Could you please take a closer look at my code? Maybe I am freeing something that should not be freed. Or can you test it yourself while using several large font files multiple times?

ebraminio commented 5 years ago

Hmm. I don't see the issue here using your code, I guess having more of your code will be more helpful.