opentypejs / opentype.js

Read and write OpenType fonts using JavaScript.
https://opentype.js.org/
MIT License

Adopt a better parsing/encoding strategy in OpenType.js #152

Open felipesanches opened 8 years ago

felipesanches commented 8 years ago

Based on the discussion seen below we currently have 2 choices:

Manually parsing the complex data structures of the OpenType file format is very time-consuming and error-prone. It also results in verbose code with lots of duplication, which is typically hard to read, understand and debug.

The approach presented by @Pomax in his "A binary parser generator" project, available at https://github.com/Pomax/A-binary-parser-generator, seems very good because it relies on storing the overall file format structure in a "spec" file, which is then used to automatically validate and parse OTF files (the project allows any file format to be specified, but OTF is what we're interested in here, right :-D).

Several years ago I did something similar for handling the DWG file format in the GNU LibreDWG project (https://www.gnu.org/software/libredwg/). And based on our spec file we were able to generate both the format parser and the format encoder. I am not sure Pomax's implementation provides encoding as well, which would be strictly necessary for us here.
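To make the "one spec, both parser and encoder" idea concrete, here is a minimal sketch. It is not the syntax of Pomax's spec files nor of restructure; the `TYPES` table, the `offsetTableSpec` format and the `decode`/`encode` helpers are all made up for illustration. Only the field names and sizes of the 12-byte sfnt header come from the OpenType spec.

```javascript
// Hypothetical primitive types: how to read and write each one (big-endian,
// which is the DataView default and also what OpenType uses).
const TYPES = {
  uint16: { size: 2, read: (v, o) => v.getUint16(o), write: (v, o, x) => v.setUint16(o, x) },
  uint32: { size: 4, read: (v, o) => v.getUint32(o), write: (v, o, x) => v.setUint32(o, x) },
};

// Hypothetical declarative spec for the header at the start of an sfnt file.
const offsetTableSpec = [
  ['sfntVersion', 'uint32'],
  ['numTables', 'uint16'],
  ['searchRange', 'uint16'],
  ['entrySelector', 'uint16'],
  ['rangeShift', 'uint16'],
];

// Parser derived from the spec: walk the fields in order and read each one.
function decode(spec, view, offset = 0) {
  const result = {};
  for (const [name, type] of spec) {
    result[name] = TYPES[type].read(view, offset);
    offset += TYPES[type].size;
  }
  return result;
}

// Encoder derived from the same spec: walk the fields in order and write each one.
function encode(spec, record) {
  const size = spec.reduce((n, [, type]) => n + TYPES[type].size, 0);
  const view = new DataView(new ArrayBuffer(size));
  let offset = 0;
  for (const [name, type] of spec) {
    TYPES[type].write(view, offset, record[name]);
    offset += TYPES[type].size;
  }
  return view;
}

const header = { sfntVersion: 0x00010000, numTables: 10, searchRange: 128, entrySelector: 3, rangeShift: 32 };
const roundTripped = decode(offsetTableSpec, encode(offsetTableSpec, header));
```

Because both directions are driven by the same field list, adding a field to the spec updates the parser and the encoder at once, which is the property we would want from whatever solution gets adopted.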

PS: As I see that Pomax has already contributed code to OpenType.js, I wonder if the proposal of incorporating the binary-parser-generator was already presented here in the past.

devongovett commented 8 years ago

I've been using my restructure project for this. It supports both decoding and encoding all sorts of data types, and as a result of being developed for fontkit, it works pretty well for the OpenType format. 😜

felipesanches commented 8 years ago

I like the fact you seem to provide much more documentation than we get on Pomax's project.

I am also glad to see the description of the LazyArray, which is something I was expecting to see implemented in whatever solution we choose for performance reasons.
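For readers unfamiliar with the idea, this is roughly what I mean by a lazy array. It is a hand-rolled sketch, not restructure's LazyArray; the class name `LazyUint16Array` and its `decodeCount` counter are assumptions added purely for illustration.

```javascript
// Hypothetical lazy array: elements are decoded from the buffer only when
// they are first read, and then cached.
class LazyUint16Array {
  constructor(view, offset, length) {
    this.view = view;
    this.offset = offset;
    this.length = length;
    this.cache = new Map();
    this.decodeCount = 0; // illustration only: counts actual buffer reads
  }

  get(index) {
    if (index < 0 || index >= this.length) return undefined;
    if (!this.cache.has(index)) {
      this.decodeCount++;
      this.cache.set(index, this.view.getUint16(this.offset + index * 2));
    }
    return this.cache.get(index);
  }
}

const rawBytes = new Uint8Array([0x00, 0x01, 0x00, 0x02, 0x01, 0x00]);
const lazy = new LazyUint16Array(new DataView(rawBytes.buffer), 0, 3);
const third = lazy.get(2); // decodes only the third element
```

Nothing is decoded at construction time, so a table with thousands of entries costs almost nothing until individual entries are requested.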

Also, it is nice to see you're working with require.js in a way that makes your solution more likely to be easily reusable as a dependency of other projects.

felipesanches commented 8 years ago

This looks good: https://github.com/devongovett/fontkit/blob/master/src/opentype/GPOSProcessor.coffee

But it seems not all opentype tables are supported yet? Also, the code seems slightly harder to read than Pomax's spec file syntax. I'll have to get used to that if we choose to go on with adopting your project here. I'm not totally convinced yet :-)

devongovett commented 8 years ago

The tables are here

felipesanches commented 8 years ago

Pomax's project provides a test page with a binary-file element inspector which I liked a lot. Looks very good for testing & debugging.

[Image: pomax_otf_inspector]

miguelsousa commented 8 years ago

> The tables are here

I don't see a CFF.coffee file so I'm guessing that OpenType-CFF fonts are not supported.

devongovett commented 8 years ago

They are, just in a different folder: https://github.com/devongovett/fontkit/tree/master/src/cff

miguelsousa commented 8 years ago

Nice!

Pomax commented 8 years ago

To be fair: my project was a PoC that I never had enough time to properly redo (there's a lot of ES6/ES2015 that can be sprinkled in to drastically increase the efficacy of a spec-based approach), so the lack of docs is really a lack of "this is done in any way, shape, or form" =)

(iirc there wasn't even all that much in the way of the Common Layout tables like GSUB/GPOS)

The downside of my project was also that the parser it generated would load everything as a structured object, which is absolutely dreadful when you need efficient font traversal, where you just want a memory map in which you follow offset pointers, accumulating only exactly as much data as is necessary to shape the requested string. You can cache some things on the way, but ideally, that cache is cleared after the shaping run to give the system as much free memory as possible.
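A rough sketch of the "follow offset pointers" style, as opposed to building a full object tree up front. The table layout here is synthetic and invented for the example (it is not a real OpenType table), and `buildExample` and `readRecord` are hypothetical names.

```javascript
// Synthetic layout (an assumption for illustration only):
//   uint16 count, then `count` uint16 offsets measured from the table start,
//   each pointing at a record that holds a single uint16 value.
function buildExample() {
  const view = new DataView(new ArrayBuffer(10));
  view.setUint16(0, 2);   // count
  view.setUint16(2, 6);   // offset of record 0
  view.setUint16(4, 8);   // offset of record 1
  view.setUint16(6, 111); // record 0
  view.setUint16(8, 222); // record 1
  return view;
}

// Reads one record by following its offset; no other record is touched
// and nothing is retained after the call returns.
function readRecord(view, tableStart, index) {
  const count = view.getUint16(tableStart);
  if (index < 0 || index >= count) {
    throw new RangeError(`record ${index} out of range`);
  }
  const recordOffset = view.getUint16(tableStart + 2 + index * 2);
  return view.getUint16(tableStart + recordOffset);
}

const exampleView = buildExample();
const secondRecord = readRecord(exampleView, 0, 1);
```

The only long-lived state is the buffer itself; whether to cache anything on top of that becomes a decision the shaping run can make and undo.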

felipesanches commented 8 years ago

@Pomax have you already looked at @devongovett's work on Restructure? Would you vouch for it being used in OpenType.js instead of your binary-parser-generator?

davelab6 commented 8 years ago

I also wonder what @fdb and @brawer think :)

Pomax commented 8 years ago

I hadn't seen restructure yet, it's now on my "to have a look at" list although I probably won't get to that until later in the week

behdad commented 8 years ago

I have been thinking about this for about a year now. Really like Restructure and was going to talk to Devon about it when I get back to California.

In my vision, there should be one spec of font formats that is maintained centrally, and used to generate data / code for each language (C, Python, JS). But in the end this is so much work for no added benefit, so I won't personally be working on it. But really like that others are interested in it and like to be involved in the design.

behdad commented 8 years ago

Note that compiling GSUB/GPOS needs table sharing logic. Devon, do you implement that?

devongovett commented 8 years ago

I haven't tried compiling the GSUB/GPOS tables yet, but they should work since they are defined using types from restructure. As for table sharing logic, restructure doesn't currently have a way for two different pointers to point to the same place.

felipesanches commented 8 years ago

Basically we would need to calculate a checksum for each subtable and then optimize the serialization when checksums are equal. Or maybe some other procedure equivalent to that.
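Something along these lines, as a sketch only. `packShared` is a hypothetical helper, not anything that exists in restructure or fontkit, and instead of a checksum it uses the exact byte content as the lookup key, which avoids having to handle checksum collisions separately.

```javascript
// Lay subtables out back to back, writing each distinct byte sequence once.
// Returns the packed bytes plus, for every input subtable, the offset that a
// pointer to it should hold.
function packShared(subtables) {
  const seen = new Map(); // content key -> offset of the first copy
  const offsets = [];
  const chunks = [];
  let position = 0;

  for (const bytes of subtables) {
    const key = Array.from(bytes).join(','); // exact-content key, not a checksum
    if (!seen.has(key)) {
      seen.set(key, position);
      chunks.push(bytes);
      position += bytes.length;
    }
    offsets.push(seen.get(key));
  }

  const packed = new Uint8Array(position);
  let cursor = 0;
  for (const chunk of chunks) {
    packed.set(chunk, cursor);
    cursor += chunk.length;
  }
  return { packed, offsets };
}

const coverageA = Uint8Array.of(0, 1, 0, 2);
const coverageB = Uint8Array.of(0, 1, 0, 3);
const shared = packShared([coverageA, coverageB, Uint8Array.of(0, 1, 0, 2)]);
// shared.offsets is [0, 4, 0] and shared.packed.length is 8 rather than 12
```

The first and third subtables are byte-identical, so both pointers end up at offset 0 and the duplicate is never written.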

felipesanches commented 8 years ago

I guess that encoding the tables without this optimization would still produce valid OTF files. It's just that their size would be larger, which is undesirable. But I don't see programs being unable to handle reading the unoptimized tables. So it seems to be an issue of size optimization, instead of a matter of compliance to the file format spec.

felipesanches commented 8 years ago

OK. I just registered an issue about that:

https://github.com/devongovett/restructure/issues/20

felipesanches commented 8 years ago

Another good thing about restructure is that it's got lots of mocha tests (a total of 234).

felipesanches commented 8 years ago

So... after spending the afternoon inspecting the source code of these projects I came to the conclusion that it is not really restructure that we would use here, but instead we would actually be using fontkit, which has restructure as a dependency.

But then, fontkit's description is much like the opentype.js project description... Are these two projects actually trying to solve the same thing? How does fontkit differ from opentype.js?

Pomax commented 8 years ago

that is certainly an important question to answer.

fdb commented 8 years ago

Initially, when developing OpenType.js, I also had table specs (check out the first commit for example). Even though I'm a fan of the declarative approach, I soon discovered that a purely declarative approach is problematic. For some tables, e.g. head, the declarative approach works really well. However, things like CFF are too difficult to encode declaratively and do actually need logic for parsing. That's why I think it's difficult, if not impossible, to have a "universal spec" as @behdad suggested. I see that Fontkit solved some issues quite nicely, e.g. using a VersionedStruct to encode both long and short loca offsets, and I must say I like the approach.
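To illustrate the loca case: the table has two encodings, selected by `indexToLocFormat` in the head table (0 for short offsets stored as uint16 values of offset / 2, 1 for long offsets stored as uint32). The sketch below is a plain function showing that version-dependent branch; it is not Fontkit's VersionedStruct API, and `parseLoca` is a hypothetical name.

```javascript
// indexToLocFormat comes from the head table: 0 = short offsets, 1 = long.
function parseLoca(view, start, numGlyphs, indexToLocFormat) {
  const offsets = [];
  // loca holds numGlyphs + 1 entries so the last glyph's length can be computed.
  for (let i = 0; i <= numGlyphs; i++) {
    offsets.push(indexToLocFormat === 0
      ? view.getUint16(start + i * 2) * 2 // short: stored value is offset / 2
      : view.getUint32(start + i * 4));   // long: stored value is the offset
  }
  return offsets;
}

// The same three glyph offsets (0, 20, 50) in both encodings.
const shortLoca = new DataView(new ArrayBuffer(6));
[0, 10, 25].forEach((half, i) => shortLoca.setUint16(i * 2, half));

const longLoca = new DataView(new ArrayBuffer(12));
[0, 20, 50].forEach((offset, i) => longLoca.setUint32(i * 4, offset));
```

A declarative system needs some way to express "the element type depends on a field in another table", which is exactly where a versioned construct earns its keep.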

OpenType.js and Fontkit were developed independently. I developed OpenType.js to "scratch my own itch": for NodeBox Live we needed access to the glyph shapes of a font, and nothing was available. Later on, through the efforts of @Pomax (and @louisremi), we also added font exporting.

The two projects do have a lot in common and I'm playing with the idea of merging efforts. It seems Fontkit has more support for things like WOFF, which we currently don't support.

I'm a bit torn at the moment. I like the things that Fontkit is doing, but I also think it would be silly to give up the effort that went into creating OpenType.js.

moyogo commented 8 years ago

@davelab6 had listed a few projects trying to solve the same thing in byte-foundry/prototypo#115. @bramstein's opentype is still actively being developed. I’d also add fonteditor-core by @kekee000, used in fonteditor.

Pomax commented 8 years ago

From a personal perspective, it's also nice to have options as a user. Right now, I can use harfbuzz or freetype2 or roll something quick and dirty using a TTX export of a font, and having that same freedom in JS land is great: I can use OpenType.js or Fontkit, and they have their respective strengths and weaknesses (format and specific shaping support, memory footprints, etc). So I'm tempted to pitch my 2 cents as "even if they do the same, the ways they do things are different enough to be valuable as checks on each other". One library to rule them all always sounds tempting, but multiple libraries to divide and conquer based on the tool you need, to me, has stronger appeal.

devongovett commented 8 years ago

Some history about fontkit for those who don't know: it was originally written for PDFKit, my PDF generation library, over 4 years ago, and existed as a part of that project for a while. Then, I kept getting a ton of bug reports about font issues, many of which were related to complex script support. So, a couple years ago I started writing fontkit as it exists today, and I extracted it as a separate project so it could be useful outside of just PDFKit. It is finally getting to a state where I can ship it as the default font engine in PDFKit, and I plan to do so shortly.

Fontkit and opentype.js do have slightly different goals as far as I can tell. Fontkit was originally developed to solve the layout problem, along with PDF subsetting, and although it has grown to support other things like glyph decoding, layout remains its primary goal in my opinion.

On the other hand, from what I can tell, opentype.js is mostly focused on encoding and decoding glyph paths, rather than glyph layout. Fontkit currently does not support glyph encoding from paths like opentype.js does, just decoding. I may need to do something in this area in the future though, in order to support variation fonts in PDFs...