LaurenzV opened 4 weeks ago
Yeah, HarfBuzz needs the 1:2 decomposition, which some libraries don't expose. In my opinion, it would be easier to add it to the unicode_norm crate.
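What the "1:2" part means: HarfBuzz's normalizer asks for one level of canonical decomposition at a time (one character into at most two parts) and for pairwise composition. Below is a minimal sketch that uses a hypothetical, hand-picked three-entry table. A real table would be generated from the UCD and would also have to handle singleton decompositions and composition exclusions, which this sketch ignores.

```rust
/// Hypothetical excerpt of one-level (1:2) canonical decompositions:
/// (composite, first, second). A real table is generated from UnicodeData.txt.
const DECOMPOSITIONS: &[(char, char, char)] = &[
    ('\u{00C5}', 'A', '\u{030A}'),        // Å -> A + COMBINING RING ABOVE
    ('\u{01FA}', '\u{00C5}', '\u{0301}'), // Ǻ -> Å + COMBINING ACUTE ACCENT
    ('\u{00E9}', 'e', '\u{0301}'),        // é -> e + COMBINING ACUTE ACCENT
];

/// Split `ab` into exactly two parts, one level deep (no recursion).
fn decompose(ab: char) -> Option<(char, char)> {
    DECOMPOSITIONS
        .iter()
        .find(|&&(composite, _, _)| composite == ab)
        .map(|&(_, a, b)| (a, b))
}

/// Inverse lookup: compose a pair back into a single character.
/// (Real data must skip composition exclusions; this table has none.)
fn compose(a: char, b: char) -> Option<char> {
    DECOMPOSITIONS
        .iter()
        .find(|&&(_, first, second)| first == a && second == b)
        .map(|&(composite, _, _)| composite)
}
```

With this table, decompose('\u{01FA}') yields (Å, U+0301) rather than three characters, and compose(Å, U+0301) gives back Ǻ. Any further recursion is left to the caller.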
My plan here is to just use icu4x, which already has the low-level composition functions (seemingly added in anticipation of supporting HarfBuzz :)
I think having an alternative to ICU would be nice, since that's a YUGE crate IIUC.
No disagreement from me. One thing I’ve considered is adding a build script that pulls in the icu4x crates and extracts the necessary properties into a compact data structure. This would be a nice option for a standalone shaper for users who are not already consuming the icu4x crates.
Or do what everyone else does and roll your own Python code to read the UCD data and spew out code. HB uses this:
https://github.com/harfbuzz/harfbuzz/blob/main/src/gen-ucd-table.py
That script mostly uses packTab to pack tables, and I've started adding Rust output to packTab:
https://github.com/harfbuzz/packtab/issues/5
So it looks like you might get a replacement for free.
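Here is what packing buys you. A common scheme splits a code point into a high part, which indexes a block table, and a low part, which indexes into a deduplicated block. This is roughly the idea behind tools like packTab, although the sketch below is not packTab's actual output format. It is a minimal Rust sketch with hypothetical names, assuming a u8 property (for example a canonical combining class) and a block size of 1 << shift.

```rust
/// Two-level lookup table: the flat array is cut into fixed-size blocks,
/// identical blocks are stored once, and `index` maps block number -> block id.
pub struct TwoLevel {
    shift: u32,
    index: Vec<u16>, // block number -> id of the deduplicated block
    data: Vec<u8>,   // concatenation of the unique blocks
}

impl TwoLevel {
    pub fn build(values: &[u8], shift: u32) -> Self {
        let block = 1usize << shift;
        let mut index = Vec::new();
        let mut data: Vec<u8> = Vec::new();
        for chunk in values.chunks(block) {
            // Pad the last chunk so every block has the same length.
            let mut b = chunk.to_vec();
            b.resize(block, 0);
            // Reuse an identical block if one is already stored.
            let existing = data.chunks(block).position(|stored| stored == b.as_slice());
            let id = match existing {
                Some(i) => i,
                None => {
                    data.extend_from_slice(&b);
                    data.len() / block - 1
                }
            };
            index.push(id as u16);
        }
        TwoLevel { shift, index, data }
    }

    pub fn get(&self, cp: u32) -> u8 {
        let hi = (cp >> self.shift) as usize;
        let lo = (cp & ((1u32 << self.shift) - 1)) as usize;
        match self.index.get(hi) {
            Some(&id) => self.data[((id as usize) << self.shift) + lo],
            None => 0, // code points past the table map to the default value
        }
    }

    /// Approximate storage cost: u16 index entries plus the block data.
    pub fn size_bytes(&self) -> usize {
        self.index.len() * 2 + self.data.len()
    }
}
```

Long runs of default-valued ranges collapse into a single stored block, and that is where the savings come from. A real generator would also choose shift to minimize the total size.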
We already have that, no? 😄 https://github.com/harfbuzz/harfruzz/blob/main/scripts/gen-unicode-norm-table.py
Although this one doesn't use packTab yet.
My primary concern is that I’d like to avoid pulling in a bunch of arbitrary unicode-* crates.
I’m 100% on board with bundling our own UCD data and I don’t have strong feelings on whether this is generated with rust or python.
However, since Chrome (and the various Linebender projects) are planning to use icu4x for other things, it would be nice to feature-gate our bundled blobs and allow external implementations to avoid duplication. I suppose we just need HB-style unicode funcs :)
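One way to read "HB-style unicode funcs": HarfBuzz lets callers swap in their own compose/decompose callbacks through hb_unicode_funcs_t, so a Rust port could make the shaper generic over a provider trait. Below is a minimal sketch. UnicodeFuncs, Bundled and Shaper are hypothetical names, not existing harfruzz API, and the bundled "table" is a one-entry stand-in.

```rust
/// Hypothetical provider trait mirroring the normalization hooks of
/// HarfBuzz's hb_unicode_funcs_t.
pub trait UnicodeFuncs {
    fn compose(&self, a: char, b: char) -> Option<char>;
    fn decompose(&self, ab: char) -> Option<(char, char)>;
}

/// Stand-in for the bundled, generated tables.
pub struct Bundled;

impl UnicodeFuncs for Bundled {
    fn compose(&self, a: char, b: char) -> Option<char> {
        match (a, b) {
            ('e', '\u{0301}') => Some('\u{00E9}'), // e + acute -> é
            _ => None,
        }
    }

    fn decompose(&self, ab: char) -> Option<(char, char)> {
        match ab {
            '\u{00E9}' => Some(('e', '\u{0301}')), // é -> e + acute
            _ => None,
        }
    }
}

/// The shaper only sees the trait, so an external (e.g. icu4x-backed)
/// provider could be passed in place of `Bundled`.
pub struct Shaper<U: UnicodeFuncs> {
    unicode: U,
}

impl<U: UnicodeFuncs> Shaper<U> {
    pub fn new(unicode: U) -> Self {
        Shaper { unicode }
    }

    /// Recompose a base + mark pair if the provider knows a composite.
    pub fn try_compose(&self, base: char, mark: char) -> Option<char> {
        self.unicode.compose(base, mark)
    }
}
```

The bundled provider could then sit behind a default cargo feature, for example a hypothetical bundled-ucd feature with #[cfg(feature = "bundled-ucd")] on Bundled. Users who already depend on icu4x could then disable it and pass their own implementation without shipping the tables twice.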
I've attempted to do this in rustybuzz before. I didn't pursue the idea further because, from what I gathered, the unicode_norm crate always decomposes a character as far as possible, while harfbuzz (and currently rustybuzz) has a decomposition table that always decomposes it into exactly two components. I'm not sure whether that makes any difference in the end, but since rustybuzz should stay as similar to harfbuzz as possible, I didn't actually try it. Maybe we can try it for harfruzz, though?
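The difference described above can be sketched with a hypothetical two-entry table. Applying the pairwise (HarfBuzz-style) step recursively produces the full decomposition. The flat full result, however, no longer records where the one-level split was. Canonical reordering of marks, which real full decomposition also requires, is omitted here.

```rust
/// Hypothetical excerpt of one-level (1:2) canonical decompositions.
const DECOMPOSITIONS: &[(char, char, char)] = &[
    ('\u{00C5}', 'A', '\u{030A}'),        // Å -> A + COMBINING RING ABOVE
    ('\u{01FA}', '\u{00C5}', '\u{0301}'), // Ǻ -> Å + COMBINING ACUTE ACCENT
];

/// One step, as HarfBuzz wants it: at most two parts, which may decompose further.
fn decompose_pair(c: char) -> Option<(char, char)> {
    DECOMPOSITIONS
        .iter()
        .find(|&&(composite, _, _)| composite == c)
        .map(|&(_, a, b)| (a, b))
}

/// Full decomposition, as a normalization crate returns it: recurse until
/// no table entry applies (canonical reordering is not handled here).
fn decompose_full(c: char, out: &mut Vec<char>) {
    match decompose_pair(c) {
        Some((a, b)) => {
            decompose_full(a, out);
            decompose_full(b, out);
        }
        None => out.push(c),
    }
}
```

For Ǻ, the pairwise step gives (Å, U+0301) while the full decomposition gives A, U+030A, U+0301. Going from the pairwise table to the full form is easy. Going the other way requires a compose function.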