Can we replace `unicode_norm.rs` with the unicode_norm crate?

harfbuzz / harfruzz

Port of RustyBuzz to use Fontations

MIT License


Can we replace `unicode_norm.rs` with the unicode_norm crate? #14

Open LaurenzV opened 4 weeks ago

LaurenzV commented 4 weeks ago

I've attempted to do this in rustybuzz before, and the reason why I didn't end up pursuing this idea further is that, from what I gathered, the unicode_norm crate always decomposes a character as much as possible, while in harfbuzz (and currently in rustybuzz), we have a decomposition table that always decomposes it into exactly two components.

Not sure if that makes any difference in the end, but since rustybuzz should stay as similar to harfbuzz as possible, I didn't actually try it. Maybe we can try it for harfruzz, though?

behdad commented 4 weeks ago

Yeah HarfBuzz needs the 1:2 decomposition, which some libraries don't expose. It would be easier to add it to the unicode_norm crate in my opinion.
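
To illustrate the distinction discussed above, here is a minimal sketch in Rust. The table `PAIRWISE` and the functions `decompose_pair` and `decompose_full` are hypothetical names invented for this example, not code from harfruzz, rustybuzz, or any crate. The sketch covers only two Latin characters and ignores singleton decompositions, Hangul, and canonical reordering. It shows that a 1:2 step returns exactly two components, while a full decomposition can be derived from it by recursion (the reverse is not possible without the intermediate mapping).

```rust
/// Hypothetical excerpt of a pairwise (1:2) canonical decomposition table.
/// Each entry maps a composite to exactly two code points.
const PAIRWISE: &[(char, (char, char))] = &[
    ('\u{00E2}', ('a', '\u{0302}')),        // â -> a + combining circumflex
    ('\u{1EA5}', ('\u{00E2}', '\u{0301}')), // ấ -> â + combining acute
];

/// One decomposition step: returns exactly two components, or None
/// if the character has no entry in the (tiny, illustrative) table.
fn decompose_pair(c: char) -> Option<(char, char)> {
    PAIRWISE.iter().find(|(k, _)| *k == c).map(|(_, v)| *v)
}

/// Full decomposition, built by applying the pairwise step recursively
/// until no component can be decomposed further.
fn decompose_full(c: char, out: &mut Vec<char>) {
    match decompose_pair(c) {
        Some((a, b)) => {
            decompose_full(a, out);
            decompose_full(b, out);
        }
        None => out.push(c),
    }
}
```

With this table, `decompose_pair('\u{1EA5}')` yields the pair U+00E2, U+0301, whereas `decompose_full` flattens the same character to U+0061, U+0302, U+0301.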

dfrg commented 1 day ago

My plan here is to just use icu4x, which already has the low-level composition functions (seemingly added in anticipation of supporting HarfBuzz :)

behdad commented 1 day ago

I think having an alternative to ICU would be nice, since that's a YUGE crate IIUC.

dfrg commented 1 day ago

No disagreement from me. One thing I’ve considered is adding a build script that pulls in the icu4x crates and extracts the necessary properties into a compact data structure. This would be a nice option for a standalone shaper for users who are not already consuming the icu4x crates.

behdad commented 1 day ago

> No disagreement from me. One thing I’ve considered is adding a build script that pulls in the icu4x crates and extracts the necessary properties into a compact data structure. This would be a nice option for a standalone shaper for users who are not already consuming the icu4x crates.

Or do what everyone else does and roll your own Python code to read the UCD data and spew out code. Given HB uses this:

https://github.com/harfbuzz/harfbuzz/blob/main/src/gen-ucd-table.py

and that mostly uses packTab to pack tables, and I've started adding Rust output to it:

https://github.com/harfbuzz/packtab/issues/5

looks like you might get a replacement for free.
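
For readers unfamiliar with the idea of a "compact data structure" for Unicode properties, the following is a rough sketch of a two-stage lookup table in Rust. It is a simplified illustration of the general technique only, and is not what packTab or `gen-ucd-table.py` actually emit. The type `TwoStage`, the block size, and the `build` function are assumptions made for this example. A real generator would run offline and emit static arrays rather than build the table at runtime.

```rust
/// Number of code points covered by one block (an arbitrary choice here).
const BLOCK: usize = 256;

/// Hypothetical two-stage table: `index` picks a block per 256 code points,
/// and identical blocks are stored only once in `blocks`.
struct TwoStage {
    index: Vec<u16>,
    blocks: Vec<[u8; BLOCK]>,
}

impl TwoStage {
    /// Builds the table from sparse (code point, value) entries.
    /// Code points without an entry get the default value 0.
    fn build(entries: &[(u32, u8)], max_cp: u32) -> Self {
        let n_blocks = (max_cp as usize) / BLOCK + 1;
        // Block 0 is the shared all-default block.
        let mut blocks: Vec<[u8; BLOCK]> = vec![[0u8; BLOCK]];
        let mut index = vec![0u16; n_blocks];
        for b in 0..n_blocks {
            let mut data = [0u8; BLOCK];
            for &(cp, v) in entries {
                let cp = cp as usize;
                if cp / BLOCK == b {
                    data[cp % BLOCK] = v;
                }
            }
            // Reuse an existing identical block if there is one.
            let found = blocks.iter().position(|x| *x == data);
            let pos = match found {
                Some(p) => p,
                None => {
                    blocks.push(data);
                    blocks.len() - 1
                }
            };
            index[b] = pos as u16;
        }
        TwoStage { index, blocks }
    }

    /// Looks up a value; out-of-range code points return the default 0.
    fn get(&self, cp: u32) -> u8 {
        let cp = cp as usize;
        match self.index.get(cp / BLOCK) {
            Some(&b) => self.blocks[b as usize][cp % BLOCK],
            None => 0,
        }
    }
}
```

Because most blocks of the code space are identical (all default), deduplication keeps the stored data small. Multi-level packers apply the same idea recursively and additionally pack values into fewer bits.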

LaurenzV commented 23 hours ago

We already have that, no? 😄 https://github.com/harfbuzz/harfruzz/blob/main/scripts/gen-unicode-norm-table.py

Although this one is not using packTab yet.

dfrg commented 23 hours ago

My primary concern is that I’d like to avoid pulling in a bunch of arbitrary `unicode-*` crates.

I’m 100% on board with bundling our own UCD data and I don’t have strong feelings on whether this is generated with rust or python.

However, since Chrome (and the various Linebender projects) are planning on using icu4x for other things, it would be nice to feature-gate our bundled blobs and allow external implementations to avoid duplication. I suppose we just need HB-style unicode funcs :)
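
One possible shape for such pluggable Unicode functions is sketched below in Rust. Everything here is hypothetical: the trait `UnicodeFuncs`, the `BundledData` provider, and `compose_adjacent` are names made up for this example and are not harfruzz or icu4x APIs. The sketch assumes the shaper is generic over a provider with pairwise `compose` and `decompose` callbacks, loosely modeled on HarfBuzz's Unicode callbacks. A real crate might place the bundled provider behind a Cargo feature so that an icu4x-backed implementation could be supplied instead.

```rust
/// Hypothetical trait for pluggable Unicode callbacks.
pub trait UnicodeFuncs {
    /// Composes two code points into one, if a composition exists.
    fn compose(&self, a: char, b: char) -> Option<char>;
    /// Decomposes a code point into exactly two components, if possible.
    fn decompose(&self, ab: char) -> Option<(char, char)>;
}

/// Stand-in for data bundled with the shaper. In a real crate this type
/// could be feature-gated; no feature is defined in this sketch.
pub struct BundledData;

/// Tiny illustrative table: (composite, first, second).
const PAIRS: &[(char, char, char)] = &[('\u{00E9}', 'e', '\u{0301}')];

impl UnicodeFuncs for BundledData {
    fn compose(&self, a: char, b: char) -> Option<char> {
        PAIRS
            .iter()
            .find(|p| p.1 == a && p.2 == b)
            .map(|p| p.0)
    }

    fn decompose(&self, ab: char) -> Option<(char, char)> {
        PAIRS.iter().find(|p| p.0 == ab).map(|p| (p.1, p.2))
    }
}

/// Example consumer: greedily composes each character with the previous
/// output character, using whichever provider the caller passes in.
pub fn compose_adjacent<U: UnicodeFuncs>(funcs: &U, input: &[char]) -> Vec<char> {
    let mut out: Vec<char> = Vec::with_capacity(input.len());
    for &c in input {
        let last = out.last().copied();
        if let Some(last) = last {
            if let Some(composed) = funcs.compose(last, c) {
                *out.last_mut().unwrap() = composed;
                continue;
            }
        }
        out.push(c);
    }
    out
}
```

A caller that already depends on another Unicode library would implement `UnicodeFuncs` for its own type and pass that to `compose_adjacent` instead of `BundledData`, which avoids shipping two copies of the same data.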