harfbuzz / rustybuzz

A complete port of harfbuzz's shaping algorithm to Rust
MIT License

Port normalization. #11

Closed laurmaedje closed 4 years ago

laurmaedje commented 4 years ago

Okay, I ported the normalization. This one was already a bit trickier because it was more intertwined, especially with the compose/decompose function pointers through FFI.
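To illustrate what "compose/decompose function pointers through FFI" can look like, here is a minimal sketch. The callback signatures, the `toy_*` functions and the single é mapping are all assumptions made for illustration; they are not rustybuzz's or harfbuzz's actual interface.

```rust
// Hypothetical C-ABI callback signatures (an assumption for this sketch).
type DecomposeFn = extern "C" fn(ab: u32, a: *mut u32, b: *mut u32) -> bool;
type ComposeFn = extern "C" fn(a: u32, b: u32, ab: *mut u32) -> bool;

// Toy data: only U+00E9 (é) <-> U+0065 (e) + U+0301 (combining acute).
extern "C" fn toy_decompose(ab: u32, a: *mut u32, b: *mut u32) -> bool {
    if ab != 0x00E9 {
        return false;
    }
    // SAFETY: the caller passes pointers to valid, writable u32 slots.
    unsafe {
        *a = 0x0065;
        *b = 0x0301;
    }
    true
}

extern "C" fn toy_compose(a: u32, b: u32, ab: *mut u32) -> bool {
    if a != 0x0065 || b != 0x0301 {
        return false;
    }
    // SAFETY: the caller passes a pointer to a valid, writable u32 slot.
    unsafe {
        *ab = 0x00E9;
    }
    true
}

/// Decompose every code point once through the callback.
fn decompose_all(input: &[u32], decompose: DecomposeFn) -> Vec<u32> {
    let mut out = Vec::with_capacity(input.len());
    for &cp in input {
        let (mut a, mut b) = (0u32, 0u32);
        if decompose(cp, &mut a, &mut b) {
            out.push(a);
            out.push(b);
        } else {
            out.push(cp);
        }
    }
    out
}

/// Recompose adjacent pairs through the callback.
fn compose_all(input: &[u32], compose: ComposeFn) -> Vec<u32> {
    let mut out: Vec<u32> = Vec::with_capacity(input.len());
    for &cp in input {
        if let Some(prev) = out.last().copied() {
            let mut ab = 0u32;
            if compose(prev, cp, &mut ab) {
                // Replace the previous code point with the composed one.
                *out.last_mut().unwrap() = ab;
                continue;
            }
        }
        out.push(cp);
    }
    out
}
```

Real normalization also has to handle recursive decomposition and combining-class reordering, which this sketch leaves out.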

Some observations:

I think the main shaping logic could maybe be ported next, but I have to look more closely at whether there are still obstacles that need to be taken care of first.

RazrFalcon commented 4 years ago

Everything is fine. I will try to find time to push kerx support soon, so you would have complete access to all implemented features. And it would work as a template for other TrueType-table-related code.

Also, update Readme and Changelog.

RazrFalcon commented 4 years ago

@laurmaedje I've pushed the kerx branch. There is a lot of code and it even compiles. But it's not finished, tested or even "connected" to harfbuzz. So I guess this is a good next step for you, since it's still way easier than GSUB/GPOS.

As for the code/architecture: kern and kerx are implemented on the rb side (ignore the ttf-parser kern implementation), both parsing and processing. Eventually, we might move them to ttf-parser, but this is not a priority for now. As for testing, you have to test it on macOS and find (somewhere) old Apple fonts. HB tests Apple fonts based on their hash, which is absurd, but there is no other legal way. That's why I've said in the readme that Apple shaping is an ultra-complex task. You basically have to write macOS tests manually for RB based on macos.tests and then provide the necessary files. No CI testing, obviously. Afaik, there is no way we can generate kerx, mort, morx tables ourselves, therefore we have to rely on existing Apple fonts. And obviously, we cannot simply ignore the Apple shaping, because it would break macOS support. Moreover, there are some kerx subtables that are not used by any font, therefore there is no way we can test them.
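A rough sketch of what "testing fonts based on their hash" could look like in a Rust test harness: the check only runs when a locally supplied font matches a known digest, and is skipped otherwise. Everything here is hypothetical (`fnv1a64`, `Outcome`, `run_gated`), and FNV-1a is only a dependency-free stand-in for whatever digest a real harness would use.

```rust
/// FNV-1a (64-bit). A stand-in digest chosen to keep the sketch dependency-free.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    hash
}

#[derive(Debug, PartialEq)]
enum Outcome {
    /// Font missing, or not the exact file the expectations were recorded for.
    Skipped,
    Passed,
    Failed,
}

/// Run `check` only if the font is present and its hash matches.
fn run_gated<F>(font: Option<&[u8]>, expected_hash: u64, check: F) -> Outcome
where
    F: FnOnce(&[u8]) -> bool,
{
    match font {
        Some(data) if fnv1a64(data) == expected_hash => {
            if check(data) {
                Outcome::Passed
            } else {
                Outcome::Failed
            }
        }
        _ => Outcome::Skipped,
    }
}
```

The gate exists because shaping output depends on the exact font version, so expectations recorded against one file say nothing about another file with the same name.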

And then we have to port mort/morx, which are very similar to kerx. This would finish Apple shaping. Then GSUB/GPOS and we're done. Just a mere 2-3 months of work... Yay!

Feel free to ask any questions.

RazrFalcon commented 4 years ago

@laurmaedje To get a bit deeper into the Apple shaping testing, this is what we (someone) should do to make a proper implementation:

  1. Figure out how Apple's state machine generation is done. Because this is how kern, kerx, mort, morx are implemented. Afaik, there is no opensource implementation or even documentation. This task alone would take 2-3 months, or even more.
  2. Implement those tables in fonttools. This will also take 1-2 months.
  3. Write custom fonts from scratch. This would take a few weeks.
  4. Replace harfbuzz's tests with the new ones.
  5. Sync rustybuzz with harfbuzz.
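For readers unfamiliar with the state-machine idea mentioned in step 1, here is a deliberately simplified sketch of a table-driven machine: each glyph is mapped to a class, and the pair (state, class) selects an entry that gives the next state plus some action flags. The types, the field names and the toy table are assumptions for illustration; the real kern/kerx/mort/morx tables have their own binary layouts, predefined classes and per-table actions that are not modelled here.

```rust
/// One transition: the state to move to plus opaque action flags.
#[derive(Clone, Copy)]
struct Entry {
    new_state: usize,
    flags: u16,
}

struct StateTable {
    class_of: fn(u16) -> usize, // glyph id -> class
    n_classes: usize,
    states: Vec<usize>, // row-major [state][class] -> index into `entries`
    entries: Vec<Entry>,
}

impl StateTable {
    /// Feed glyphs through the machine, reporting every entry with non-zero
    /// flags as (glyph index, flags). Returns the final state.
    fn run(&self, glyphs: &[u16], mut on_entry: impl FnMut(usize, u16)) -> usize {
        let mut state = 0; // state 0 is the start state in this sketch
        for (i, &glyph) in glyphs.iter().enumerate() {
            let class = (self.class_of)(glyph);
            let entry = self.entries[self.states[state * self.n_classes + class]];
            if entry.flags != 0 {
                on_entry(i, entry.flags);
            }
            state = entry.new_state;
        }
        state
    }
}

/// Toy classes: glyph 1 is "first", glyph 2 is "second", anything else "other".
fn toy_class(glyph: u16) -> usize {
    match glyph {
        1 => 1,
        2 => 2,
        _ => 0,
    }
}

/// Toy machine that flags glyph 2 when it directly follows glyph 1.
fn toy_table() -> StateTable {
    StateTable {
        class_of: toy_class,
        n_classes: 3,
        states: vec![
            0, 1, 0, // state 0 (start): other, first, second
            0, 1, 2, // state 1 (saw "first"): other, first, second
        ],
        entries: vec![
            Entry { new_state: 0, flags: 0 }, // reset
            Entry { new_state: 1, flags: 0 }, // remember "first"
            Entry { new_state: 0, flags: 1 }, // pair matched: act
        ],
    }
}
```

For example, running `toy_table()` over the glyphs `[1, 2, 3, 2, 1, 1, 2]` reports the glyphs at indices 1 and 6, the two places where glyph 2 directly follows glyph 1.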

As you can see, it could easily take half a year or even more. Which is an absurd amount of work. And basically the main reason why rustybuzz development became stale. I've already spent half a year on ttf-parser + rustybuzz and I simply could not work on it anymore. Turns out that testing is a many times more complicated task than writing a TrueType parser and porting C++ to Rust.

Yes, we can simply port the Apple tables as is, without much testing, mainly because it's kinda an edge case and you have to try really hard to trigger it. Basically, you have to use specific Arabic fonts on macOS. It would not affect any other OSes, really. But what's the point of RB then?

khaledhosny commented 4 years ago

1. Afaik, there is no opensource implementation or even documentation

They are documented, but the documentation sometimes does not match reality (no different from the MS tables). FontTools supports reading and writing them as well, and I’m pretty sure HarfBuzz has some custom test fonts, but covering every bit and oddity in custom-made fonts is tedious, which is why macOS fonts are also tested.

RazrFalcon commented 4 years ago

FontTools support reading and writing them

I haven't found kerx/morx code in fonttools. Am I missing something?

that is why macOS fonts are also tested

The current way of testing macOS fonts in harfbuzz drives me crazy. I understand that there is no easier way, but I don't want such monstrosity in rustybuzz.

khaledhosny commented 4 years ago

FontTools support reading and writing them

I haven't found kerx/morx code in fonttools. Am I missing something?

$ find -iname '*m_o_r_?*'
./Tests/ttLib/tables/_m_o_r_x_test.py
./Tests/ttLib/tables/_m_o_r_t_test.py
./Lib/fontTools/ttLib/tables/_m_o_r_x.py
./Lib/fontTools/ttLib/tables/_m_o_r_t.py

It does not support kerx though. I had assumed its support arrived together with mort/morx, but that does not seem to be the case.

that is why macOS fonts are also tested

The current way of testing macOS fonts in harfbuzz drives me crazy. I understand that there is no easier way, but I don't want such monstrosity in rustybuzz.

$ find -iname '*morx*'
./test/api/fonts/aat-morx.ttf
./test/shaping/data/in-house/tests/aat-morx.tests
./test/shaping/data/in-house/fonts/MORXTwentyeight.ttf
./test/shaping/data/text-rendering-tests/tests/MORX-32.tests
./test/shaping/data/text-rendering-tests/tests/MORX-7.tests
...

RazrFalcon commented 4 years ago

  1. _m_o_r_t.py and _m_o_r_x.py are empty.
  2. I was talking about this and this.

khaledhosny commented 4 years ago

  1. _m_o_r_t.py and _m_o_r_x.py are empty.

This is normal for FontTools, the actual code is in the ot*.py files.

  2. I was talking about this and this.

I know, but these are not the only morx tests, there are many more tests using custom-made fonts.

RazrFalcon commented 4 years ago

the actual code is in the ot*.py files

Can you provide a link to it?

but these are not the only morx tests

I was talking about Apple fonts in general.

UPD: I was actually able to find some actions/state machine related code. I guess I've checked only kerx and assumed that morx was also not supported. So the task becomes a bit simpler.

laurmaedje commented 4 years ago

With the Apple testing, I have two thoughts:

In general, I'll have to dig a bit deeper into the harfbuzz sources to find out how everything works together before even thinking about attempting any of this stuff that comes next.

RazrFalcon commented 4 years ago

@laurmaedje There is no point in testing using wiki, since the problem is that specific font tables are available only in proprietary fonts.

As for the optional AAT, yes, this is one possibility. But on the other hand, you have to jump straight into GSUB/GPOS, which are no less complex. I would say it would take me at least a month to implement those tables. Maybe more.

laurmaedje commented 4 years ago

Testing those proprietary fonts on macOS with a large corpus would at least make sure that rustybuzz matches harfbuzz's behaviour on those fonts. Even though these might not hit every edge case. With regards to the GSUB/GPOS complexity, that's true, although I still think that there's more to gain here short-term than with AAT.
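The corpus idea above amounts to a differential test: shape every text with both implementations and report the first disagreement. A minimal sketch follows, with the two shapers passed in as closures. `first_mismatch` is a hypothetical helper, and nothing here calls the real harfbuzz or rustybuzz APIs.

```rust
/// Return the first corpus entry on which the two implementations disagree.
/// `reference` and `candidate` stand in for "shape with harfbuzz" and
/// "shape with rustybuzz"; their output type only needs to be comparable.
fn first_mismatch<'a, R, C, O>(
    corpus: &'a [&'a str],
    reference: R,
    candidate: C,
) -> Option<&'a str>
where
    R: Fn(&str) -> O,
    C: Fn(&str) -> O,
    O: PartialEq,
{
    corpus
        .iter()
        .copied()
        .find(|text| reference(*text) != candidate(*text))
}
```

Such a check can only show that the two agree on the inputs tried. It says nothing about table features that no font in the corpus exercises.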

Honestly, I have no idea what the right move is here. I love the idea of having all this in Rust, both for safety and ease of use in the Rust ecosystem, but it's obviously a massive effort and then there's also the fact that harfbuzz is actively developed and moving along in C++.

RazrFalcon commented 4 years ago

You're talking about fuzzing. We don't really need it for now. As long as rb passes all hb tests - I'm fine. I was talking about reproducible tests.

Yes, GSUB/GPOS are obviously more relevant and have a higher priority. Especially for cases when people bundle app fonts.

If you have free time to work on this project, I would suggest trying GSUB/GPOS. Forget about AAT. We can move it to a separate C++ crate as an optional dependency.

ttf-parser/next has some initial GSUB/GPOS support, so you can start from there. But I strongly suggest implementing it on the rb side first. So there will be no need to jump between projects and designing an API. We can transfer it back to ttf-parser later.

harfbuzz is actively developed

Not really, actually. Most of the changes are subsetting related, which we don't have. Afaik, there were no radical changes in shaping since variable fonts (which were added two years ago or so). So don't worry about that.

I would say that it's possible to finish rb in 1-2 months without AAT. And then we can think about improving test coverage and stuff.

RazrFalcon commented 4 years ago

By the way, I have a separate utility project called ttf-explorer, which can help you understand the binary structure more easily. The problem is that the TrueType tables it supports are the same ones supported by ttf-parser. So no GSUB/GPOS.

I'm actually adding a new table to ttf-explorer first and only then implementing it in ttf-parser. Maybe because this way you can actually see how the data is laid out.

laurmaedje commented 4 years ago

That's neat and looks very helpful! I'm gonna study harfbuzz's GGG-Table implementation and the OpenType spec a bit more and then see whether I have the time and ability to actually implement this stuff in Rust. I'd love to do it but it's difficult and I obviously can't say for sure that I'll get it done.

RazrFalcon commented 4 years ago

Sure. No problem.