harfbuzz / harfbuzzjs

Providing HarfBuzz shaping library for client/server side JavaScript projects
https://harfbuzz.github.io/harfbuzzjs/

A WebAssembly version of HarfBuzz #10

Closed photopea closed 5 years ago

photopea commented 5 years ago

Hi guys, I am developing a free web-based photo editor, www.Photopea.com, which is used by around 100 000 people a day. It lets people do image editing, including inserting text into a picture.

As there is no adequate OpenType parser and layout engine in JavaScript, I made my own, called Typr.js. It is quite advanced and can handle e.g. Arabic text. I also use this JS implementation of the BIDI algorithm.

As more and more people use Photopea, I have to extend Typr.js. Currently, I am adding support for Urdu and Khmer layout. I often stare at the OpenType specification for 5 - 10 hours without writing a single line of code, only trying to understand what it means. I would be more than happy to drop Typr.js and use an alternative, if there were one.

Would you be able to provide a WebAssembly version of your library to the public, while documenting and maintaining it? I am ready to pay 5k - 10k USD for it. It is also important that the library is not too large (e.g. 150-200kB zipped), as every person has to download it when starting Photopea.

ebraminio commented 5 years ago

Exactly what I think about every day! HarfBuzz does a complicated job, but its core API is simple, and the only thing that really matters in it is hb_shape(). Here is one attempt at it, https://github.com/prezi/harfbuzz-js, and mine is here: https://github.com/harfbuzz/harfbuzz/pull/743 . Note that even https://github.com/emscripten-ports/harfbuzz is empty, because the support I added to emscripten just uses our own code. The only thing that remains is a clean-looking JS library port, which I'm very interested in doing, but the trick is to do it as cleanly as possible so that it can be merged upstream.

ebraminio commented 5 years ago

Assigning it to myself to see what happens. Maybe we can have the wasm distribution in a separate repo under github.com/harfbuzz, if not in the harfbuzz repo itself.

ebraminio commented 5 years ago

So let's define a goal here. Since I've already put the support into Typr.js (https://github.com/photopea/Typr.js/pull/28), I think what we can eventually do here is have a cleaned-up version of #1636 (just an HTML or JS demo of how to use harfbuzz in the browser or Node.js, without build results). We can then decide later whether we'd like to put harfbuzz in an npm package, provide automatically generated .d.ts TypeScript definitions as documentation, or simply refer users to Typr.js as a sample use.

photopea commented 5 years ago

I would be very happy, if we could make some progress in terms of WASM file size.

You are compiling it through a current version of Emscripten, right? The conversion is done through LLVM commands as intermediate state. Is it possible to convert C to WASM directly using other tools, that would provide smaller WASM?

I think my use case would probably be the biggest use case of the WASM version of HarfBuzz, as there will be hundreds of thousands of people downloading it as a part of the webpage every day :)

ebraminio commented 5 years ago

I have tried building the library without emscripten before. Although that may work (you would have to provide a libc for the library somehow), emscripten itself incorporates good practices, from what I can see.

We can reduce the current binary size by compiling harfbuzz without the builtin UCDN and Unicode functions: 710kb -> 599kb (zipped, 214kb -> 164kb), but that comes at the cost of shaping correctness.

Other things may bring some further reduction, such as disabling multithreading and removing unused APIs. But considering the binary size of HarfBuzz on Debian, for example https://packages.debian.org/sid/libharfbuzz-bin (800kb, which is also compressed), I don't believe we can go below 100-150kb compressed :)

ebraminio commented 5 years ago

With all of the things mentioned above applied, it comes to 479.9kb (127.0kb compressed), but I'd say 214kb is also good, considering correctness and completeness.

behdad commented 5 years ago

Leaving out UCDN is a nonstarter.

As it happens I'm going to work on minimizing HarfBuzz for other uses. So I'll be working on this. Would be great to 1. have a streamlined way to build .wasm, and 2. a major user.

My current plans are: 1. better compressor for UCDN and other tables (based on packtab.c in fribidi repo), and 2. easy way to disable periphery API / legacy features (like Arabic fallback shaping).

ebraminio commented 5 years ago

Classic case of:

[image from https://hacks.mozilla.org/2018/01/making-webassembly-even-faster-firefoxs-new-streaming-and-tiering-compiler/]

Leaving out UCDN is a nonstarter.

Yes, as said.

  1. have a streamlined way to build .wasm,

It uses cmake and emscripten (#1636); it is super easy to use and doesn't cause trouble for autotools development.

  2. a major user.

https://github.com/photopea/Typr.js/pull/28

There has also been a lot of hype around WASI recently: https://hacks.mozilla.org/2019/03/standardizing-wasi-a-webassembly-system-interface/

photopea commented 5 years ago

I would like to thank you for this amazing library, and I am informing you, that it is currently used at www.Photopea.com by thousands of users every day :)

As you open Photopea, 1.8 MB of data is downloaded (out of this, 250 kB is Harfbuzz, 90 kB are all 104 icons - 160x160px, 60 kB is a font database, 130 kB are localizations in 36 languages). Of course, everything is compressed during the transfer.

I wish you could make Harfbuzz smaller, but I don't understand it well enough to be able to give you any advice. I already started a discussion about making the Emscripten JS file smaller: https://github.com/emscripten-core/emscripten/issues/8409

behdad commented 5 years ago

Oooh, you did integrate it in Photopea already!? That's amazing!

I'm working on making a mini version of HarfBuzz in https://github.com/harfbuzz/harfbuzz/issues/1652

ebraminio commented 5 years ago

Reduced from 597222 to 558072 by removing CFF, and to 523772 by removing AAT. My changes are on #1636, which now has only this set of APIs: '_hb_version_string', '_malloc', '_hb_blob_create', '_hb_face_create', '_hb_font_create', '_hb_buffer_create', '_hb_buffer_add_utf8', '_hb_buffer_guess_segment_properties', '_hb_buffer_set_direction', '_hb_shape', '_hb_buffer_serialize_glyphs', '_hb_buffer_get_length', '_hb_buffer_destroy', '_hb_font_destroy', '_hb_face_destroy', '_free'. It is compressed using Closure and seems to work here! harfbuzzjs-closure-no-cff-aat2.zip: 170kb zipped, down from 247kb.

photopea commented 5 years ago

Yes, I did integrate it :)

I got very excited about minifying Photopea today. Maybe it could inspire you :D

There is a font database - a large JSON file, that Photopea loads every time. There are 4290 fonts. For each font, there are four strings: Family name, Subfamily name, Postscript name, Font URL. Also, a Font Category, and flags with supported scripts. This file was 451 kB and 57 kB ZIPped.

I made some hacks in my JSON representation (e.g. an empty PostScript name means, that the PostScript name is a concatenation of Family and Subfamily). I turned that JSON into 135 kB and 29 kB ZIPped - less than 7 bytes per font :D
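The PostScript-name trick described above could look roughly like the sketch below. The record layout, the function names and the exact joining rule (family and subfamily joined with a hyphen, spaces removed) are assumptions made for illustration; the comment only says that an empty PostScript name stands for a concatenation of Family and Subfamily.

```javascript
// Hypothetical join rule: "Family-Subfamily" with all spaces removed.
function derivedPostScriptName(family, subfamily) {
  return (family + "-" + subfamily).replace(/ /g, "");
}

// Store "" in place of the PostScript name whenever it can be rebuilt
// from the family and subfamily strings.
function packFont(font) {
  const derivable =
    font.postScript === derivedPostScriptName(font.family, font.subfamily);
  return [font.family, font.subfamily, derivable ? "" : font.postScript, font.url];
}

// Reverse of packFont: an empty PostScript slot is filled in again.
function unpackFont(row) {
  const [family, subfamily, postScript, url] = row;
  return {
    family,
    subfamily,
    postScript: postScript || derivedPostScriptName(family, subfamily),
    url,
  };
}
```

With this layout, a font whose PostScript name follows the rule costs no extra bytes for that field, and fonts with irregular names keep theirs spelled out.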

behdad commented 5 years ago

Nice!

Want to show us your HarfBuzz integration glue? I'm afraid you also need a Unicode Bidirectional Algorithm implementation for full correctness.

photopea commented 5 years ago

I am using the Javascript implementation of BIDI algorithm, that I mentioned at the beginning. I added bidirectional support about two years ago. What glue do you mean?

behdad commented 5 years ago

Oh right. Sounds good.

I meant the code calling into HarfBuzz. Okay, so you are probably just missing script-run itemization.

behdad commented 5 years ago

I.e. mixed-script text will currently be broken.

photopea commented 5 years ago

@ebraminio Could you give me an example of how to use your latest code? It seems like there is no _hb_blob_destroy.

@behdad What is a mixed-script? I call HarfBuzz separately on intervals of text, which share the same direction and font (in Photopea, each character can have a different font). I would like to encourage you to go to www.Photopea.com and try it out.

brawer commented 5 years ago

To render text, every browser already contains a shaping engine; if it was accessible from JavaScript, “download size” would be zero. At some point, there was talk about adding a text shaping API to the JavaScript core libraries, similar to the ICU wrapper in ECMA-402. Does anyone know what happened to that plan? Obviously it’d take a while to bring it through, but browsers have eventually adopted the Intl API. (As far as I can see, the main missing piece would be to find someone who can write a good API proposal for ECMA. That person would need to understand text rendering, have good JavaScript fu, and be patient enough to survive the standardization process.)

brawer commented 5 years ago

Re. mixed-script, there’s a proposal for adding an Intl.Segmenter to JavaScript. But currently, the proposal is only about breaking graphemes, words and sentences (exposing ICU break iterators), not script runs.

brawer commented 5 years ago

@photopea For correct rendering, you’ll need to split the input text into script runs before calling HarfBuzz, but it’s more complicated. Perhaps you could follow the logic of Raqm; the script itemization code is in raqm_itemize.

ebraminio commented 5 years ago

it seems like there is no _hb_blob_destroy.

Ah, I missed adding that call. Here is the new version: harfbuzzjs-closure-no-cff-aat2.zip
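For reference, a call sequence over the exports listed earlier (plus _hb_blob_destroy) might look like the sketch below. It is not taken from the build itself: it assumes an Emscripten-style module object that exposes those underscore-prefixed functions and a HEAPU8 view of the wasm memory (whether the Closure-compiled build exposes HEAPU8 under that name is an assumption), and shapeGlyphCount is a made-up helper name.

```javascript
// Sketch only: `module` is assumed to be an Emscripten-style object with the
// exported functions from the list above and a HEAPU8 Uint8Array heap view.
function shapeGlyphCount(module, fontBytes, text) {
  // Copy the font file into wasm memory and wrap it in a blob.
  const fontPtr = module._malloc(fontBytes.length);
  module.HEAPU8.set(fontBytes, fontPtr);
  const blob = module._hb_blob_create(
    fontPtr, fontBytes.length, 2 /* HB_MEMORY_MODE_WRITABLE */, 0, 0);
  const face = module._hb_face_create(blob, 0); // 0 = first face in the file
  const font = module._hb_font_create(face);

  // Copy the text into wasm memory as UTF-8 and add it to a buffer.
  const utf8 = new TextEncoder().encode(text);
  const textPtr = module._malloc(utf8.length);
  module.HEAPU8.set(utf8, textPtr);
  const buffer = module._hb_buffer_create();
  module._hb_buffer_add_utf8(buffer, textPtr, utf8.length, 0, utf8.length);
  module._hb_buffer_guess_segment_properties(buffer);

  // Shape with no explicit features, then read the number of output glyphs.
  module._hb_shape(font, buffer, 0, 0);
  const glyphCount = module._hb_buffer_get_length(buffer);

  // Release everything: HarfBuzz objects first, then the raw allocations.
  module._hb_buffer_destroy(buffer);
  module._hb_font_destroy(font);
  module._hb_face_destroy(face);
  module._hb_blob_destroy(blob);
  module._free(textPtr);
  module._free(fontPtr);
  return glyphCount;
}
```

Because no destroy callback is passed to hb_blob_create, the font memory is not released by HarfBuzz, which is why the sketch frees it by hand after the blob is destroyed.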

photopea commented 5 years ago

@ebraminio great, thank you! I just updated and it is much smaller indeed :) BTW, is that "harfbuzzjs.js" a direct output from Emscripten, or did you minify it somehow? Do you think there is room for minifying that JS even further? Could you write a comment on https://github.com/emscripten-core/emscripten/issues/8409 ?

ebraminio commented 5 years ago

Yes, I actually used Closure, via the flag mentioned in that file. You can use Google Closure in the rest of your project as well and surprise yourself!

photopea commented 5 years ago

@ebraminio I did use Closure Compiler several years ago, but it turned out to be very slow, so I made my own, which does the same thing, but it is about 50x faster.

But they can make modifications on Emscripten side, that would make the Closure Compiler result even smaller.

ebraminio commented 5 years ago

But they can make modifications on Emscripten side

Interesting, I never thought of that!

behdad commented 5 years ago

@behdad What is a mixed-script?

Say, you have Hindi and English mixed in the same string.

I would like to encourage you to go to www.Photopea.com and try it out.

I did already. :)

photopea commented 5 years ago

@behdad I understand, that to use GPOS and GSUB tables, you need to know a script, which will lead you to a set of features and lookups, that should be applied to the text. In my library Typr.js, I used to loop through all features and apply all referenced lookups :D (each lookup at most once).

I use PSD format for storing files, which does not store any information about the script. Also, users are not used to enter a script when entering a text. So my goal is to display it like the web browser would display it (you also don't enter the script in HTML, you can only enter the base direction of the paragraph).

behdad commented 5 years ago

I use PSD format for storing files, which does not store any information about the script. Also, users are not used to enter a script when entering a text. So my goal is to display it like the web browser would display it (you also don't enter the script in HTML, you can only enter the base direction of the paragraph).

Exactly. Web browsers as well as any other complete text rendering system internally break the text down into "script runs" automatically and shape each one separately.

ebraminio commented 5 years ago

Some bug-reporting magic will be useful here, I guess:

Steps to reproduce:

  1. Create a text holder in Photopea
  2. Set FreeSerif font for it
  3. Put "ދިسسی" on it

Actual: [image]

Expected: The thing you see in the browser

[image]

What happened? HarfBuzz determined an incorrect script, so the rest went wrong.

Solution: Segmenting the text by scripts before passing it to HarfBuzz

photopea commented 5 years ago

I think that currently in practice, people usually use "one script" per layer in Photopea. So this will happen rarely.

I see, that there is Unicode Standard Annex #29. Do you know any implementation, that is not too big? I think the library which performs BIDI algorithm could perform this segmentation too, as they both probably need a database of details about Unicode characters (I think that is what you call UCDN).

behdad commented 5 years ago

No, it's not UAX #29 that you need. Script itemization is NOT specified by Unicode. However, here's the easy way to start: just break on any script change, using hb_unicode_script. Then just special-case two values: HB_SCRIPT_INHERITED merges with the previous character, and HB_SCRIPT_COMMON also merges with its neighbors. That's a very good start.
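The starting recipe above can be sketched in JavaScript as follows. This is only a sketch under stated assumptions: instead of hb_unicode_script it uses the Unicode property escapes built into JavaScript regular expressions, it knows only a handful of scripts (the SCRIPTS list is an illustrative subset), and the names scriptOf and itemizeByScript are made up for the example.

```javascript
// Illustrative subset only; a real implementation would cover every script.
const SCRIPTS = ["Latin", "Arabic", "Thaana", "Devanagari", "Common", "Inherited"];
const SCRIPT_TESTS = SCRIPTS.map(
  (name) => [name, new RegExp(`\\p{Script=${name}}`, "u")]);

// Stand-in for hb_unicode_script: returns the script name of one character.
function scriptOf(ch) {
  for (const [name, re] of SCRIPT_TESTS) {
    if (re.test(ch)) return name;
  }
  return "Unknown";
}

// Splits text into runs of { script, text }, breaking on every script change.
// Common and Inherited characters never start a new run: they join the run
// before them, or the run after them when they appear at the very start.
function itemizeByScript(text) {
  const runs = [];
  let current = null;
  for (const ch of text) { // for...of iterates by code point
    const script = scriptOf(ch);
    const neutral = script === "Common" || script === "Inherited";
    if (current === null) {
      current = { script, text: ch };
    } else if (neutral || script === current.script) {
      current.text += ch;
    } else if (current.script === "Common" || current.script === "Inherited") {
      // The run so far held only neutral characters: adopt the real script.
      current.script = script;
      current.text += ch;
    } else {
      runs.push(current);
      current = { script, text: ch };
    }
  }
  if (current !== null) runs.push(current);
  return runs;
}
```

Each resulting run would then be shaped with its own HarfBuzz call. Combined with the existing splitting by direction and font, this keeps a string that mixes, say, Thaana and Arabic letters from being shaped as a single script.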

behdad commented 5 years ago

To render text, every browser already contains a shaping engine; if it was accessible from JavaScript, “download size” would be zero. At some point, there was talk about adding a text shaping API to the JavaScript core libraries, similar to the ICU wrapper in ECMA-402. Does anyone know what happened to that plan? Obviously it’d take a while to bring it through, but browsers have eventually adopted the Intl API. (As far as I can see, the main missing piece would be to find someone who can write a good API proposal for ECMA. That person would need to understand text rendering, have good JavaScript fu, and be patient enough to survive the standardization process.)

Here's one such attempt of mine, from 2012: http://goo.gl/jAeFRZ

photopea commented 5 years ago

I am not a big fan of making new web standards for things, that can be done with our own custom code. One may want a whole physics engine to be available through a browser API, or the whole content of the Boost C++ library.

I am very glad that a modern web browser can be 30 MB big. I don't want it to get e.g. to the size of the .NET environment (4 GB). Not only because billions of people would have to download petabytes of extra data, but also because making a brand new browser would become extremely hard, as makers would have trouble keeping up with the new standards.

I do understand, that all necessary code is already inside a browser. But making it accessible to JS and making it stupid-proof, catching all unexpected inputs, etc. would make the necessary code 2x larger. And it would only be useful to one in 10 000 websites.

behdad commented 5 years ago

Imagine if your operating system didn't provide any basic libraries either. Guess you don't need to imagine, that's what Linux is. So yeah, if there was a way / repository for different websites to share the same library downloads that could be cached by browsers, then we wouldn't need the browser to provide such services.

But then again, being able to access system fonts is useful. Unless we also forgo all of that and "system" fonts becomes simply cached online fonts ala Google Fonts and ChromeOS.

photopea commented 5 years ago

Sharing / caching the same program by multiple websites is not that important in my opinion. E.g. HarfBuzz in WASM is just 150 kB (zipped). Today, people watch 4K videos on Youtube, so loading 150 kB every time you open a website is not a big deal.

Accessing system fonts from a browser will probably never happen. Mainly because people are too scared of fingerprinting (a list of fonts in your OS lets the website know that it is you, no matter if you use Incognito mode, VPN etc.).

In Photopea, I have 4300 fonts (OTF and TTF files) on the server. These are the only fonts available in Photopea. When a font is needed, it is downloaded over HTTP as a binary file. A font file is parsed and processed by Javascript, so there is no need for any system libraries or system fonts. People can load their own TTF / OTF files manually into Photopea as well.

I am able to provide the same experience on every OS and device, I am not dependent on system fonts, Google Fonts or ChromeOS.

ebraminio commented 5 years ago

Here is what appears to be a polyfill for the proposed Intl.Segmenter, https://gist.github.com/inexorabletash/8c4d869a584bcaa18514729332300356 , though I'm not sure it is sufficient for this need.

ebraminio commented 5 years ago

Finally, the repository is up at https://github.com/harfbuzz/harfbuzzjs/ and here is an example of it: https://github.com/harfbuzz/harfbuzzjs/blob/master/examples/hbjs.example.js Let's continue the discussion until an npm release!

photopea commented 5 years ago

Hey, thanks for creating a separate repository! :)

Note that I am a person who does not use native programs during programming, such as C compilers, bash / command line, or npm. I only edit text files with Notepad. I would be glad if there was a way I could use Harfbuzz from that repository, but I don't see any JS or WASM file that is ready to use.

ebraminio commented 5 years ago

They are available here: harfbuzzjs.zip. They will also soon be available as part of a harfbuzzjs release on npm, as it is not that cool to put binaries in the root of the project. I've also just created a readme file for the project, https://github.com/harfbuzz/harfbuzzjs/ , so let me know if it is clear enough or if anything needs to be added.

photopea commented 5 years ago

Hi, I would like to donate 1000 USD to the HarfBuzz project. It works excellently at Photopea.com and has saved me the struggle of implementing everything myself (even though I already invested hundreds of hours into Typr.js).

Is there a Donation page? I would like Ebrahim to get a part of it, too, as he helped me a lot. Do you work together in a group?

Also, I am still hoping you manage to make HarfBuzz smaller, either by throwing out unneeded parts, or improving the representation of structures.

twardoch commented 5 years ago

I'm not sure if there is a donation page, but I think the maintainers will suggest something. HarfBuzz has a long history — started as the FT_Layout submodule of FreeType with simple functionality, then a lot of work was done within Qt, and at some point, years ago, @behdad took over the development, first within RedHat, I think, and then within Google.

But it never was a “RedHat project” or a “Google project”. Behdad did a massive job and then was joined by others. Firefox, then Chrome, started using it to do the OpenType shaping, and the developers put an incredible amount of work in to make HB result-compatible with Uniscribe, Microsoft’s implementation of OpenType Layout (without having access to sources, so there was a lot of trial-and-error).

The project has now many man-years of developer work, and the developers have always shown the willingness to implement features (I mean wishes, new functionalities — not the OT features), of course as long as they remained within the scope of the lib. (I once asked for the hb-view tool and proposed its CLI spec, and Behdad did it in a week, which finally made it possible to produce simple text samples in all of the world scripts as PNG, SVG & PDF via Cairo).

There is a lot of implicit knowledge of Unicode & OpenType encoded in HarfBuzz, or — more broadly speaking — a lot of knowledge about the world typography. Thanks to HarfBuzz, both large and small languages have a chance for accurate and orthographically correct digital text exchange. Together with FreeType and the Google Noto project, HarfBuzz is an immense contribution to the centuries of the human written culture.

ebraminio commented 5 years ago

Just released the work on npm, https://www.npmjs.com/package/harfbuzzjs , the full version on https://wapm.io/package/ebraminio/harfbuzz (apparently a brand-new package manager for wasm files), and the whole thing, including the .wasm files, the .js interfaces and the lean wrapper, hbjs.js, on https://github.com/harfbuzz/harfbuzzjs/releases

Is there a Donation page?

I'm not aware of any donation page. Feel free to send the amount you like to Behdad; otherwise he should set one up.

photopea commented 5 years ago

How do I send money to Behdad? Does he have a bank account in the US or EU?

Thanks for creating all these npm / wasm package manager distribution channels. I do not use them, but I think many will use them. Personally, I would prefer if you invested the effort into Harfbuzz itself, to make it smaller / faster / more robust. Even though it is already quite small and fast for my needs :)

ebraminio commented 5 years ago

Personally, I would prefer if you invested the effort into Harfbuzz itself

It has already been squeezed from a 1.9Mb .wasm file (540kb zipped) down to 536kb (159kb zipped, your original goal I think) using the different techniques we've incorporated, but there is of course room for more.

behdad commented 5 years ago

How do I send money to Behdad? Does he have a bank account in the US or EU?

I have US accounts, yes. You can email me@behdad.org. Thanks for your generous offer!

Thanks for creating all these npm / wasm package manager distribution channels. I do not use them, but I think many will use them. Personally, I would prefer if you invested the effort into Harfbuzz itself, to make it smaller / faster / more robust. Even though it is already quite small and fast for my needs :)

I'm still working on that in https://github.com/harfbuzz/harfbuzz/issues/1652

For example, I'm shrinking UCDN from over 100kb to about 30kb. My changes will make it to master soon.

photopea commented 5 years ago

May I ask one more question? Does Harfbuzz support TTC files (Font Collections)? A Font Collection is basically several TTF files concatenated, with a list of offsets to each file at the beginning. They can also share some tables with each other (by sharing offsets to those tables).

When I load a whole TTC file to HarfBuzz, where do I specify, which font should be used for shaping?

ebraminio commented 5 years ago

Oh, it does. You have to pass the index you want instead of 0 in module._hb_face_create(blob, 0);. There is also hb_face_count, but it is not available in your build. You can get by without it, but let me know if you want it.

behdad commented 5 years ago

That's the integer index passed to hb_face_create().
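Since hb_face_count is not exported in that build, the number of valid indices can also be read straight from the collection header in JavaScript. The sketch below assumes the standard OpenType TTC header layout (the tag 'ttcf', two 16-bit version fields, a 32-bit font count, then one 32-bit offset per font, all big-endian); readTtcHeader is a made-up helper name, not part of harfbuzzjs.

```javascript
// Reads the TTC header from an ArrayBuffer holding the whole font file.
// Returns the number of fonts and the offset of each font's table directory.
function readTtcHeader(arrayBuffer) {
  const view = new DataView(arrayBuffer);
  const tag = String.fromCharCode(
    view.getUint8(0), view.getUint8(1), view.getUint8(2), view.getUint8(3));
  if (tag !== "ttcf") {
    // Not a collection: treat it as a file containing a single font.
    return { isCollection: false, numFonts: 1, offsets: [0] };
  }
  const numFonts = view.getUint32(8); // DataView reads big-endian by default
  const offsets = [];
  for (let i = 0; i < numFonts; i++) {
    offsets.push(view.getUint32(12 + 4 * i));
  }
  return { isCollection: true, numFonts, offsets };
}
```

Any index from 0 to numFonts - 1 could then be passed as the second argument of hb_face_create, as described in the answers above.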

photopea commented 5 years ago

Wow, great, it works perfectly, thanks! :)

ebraminio commented 5 years ago

New build using Behdad's HB_TINY, only 440kb of .wasm: harfbuzzjs.zip. I have not tested it personally and am only reporting it, but feel free to use it if it works for you.