harfbuzz / icu-le-hb

ICU Layout Engine API on top of HarfBuzz shaping library

U+00AD should be visible hyphen #9

Closed: elmarb closed this issue 9 years ago

elmarb commented 9 years ago

ICU layout treats U+00AD "soft hyphen" as a visible hyphen; icu-le-hb does not. I'm aware that U+00AD is ambiguously defined, but if icu-le-hb aims to be a compatible replacement for ICU layout, it should reproduce the visible U+00AD behavior.

behdad commented 9 years ago

We expose ICU LayoutEngine API. We are NOT bug-compatible with it.

khaledhosny commented 9 years ago

Furthermore, I don’t think there is any ambiguity around U+00AD (at least in the context of layout engines): it has General Category Cf, so it should have no visible output (except in a show-format-characters mode).

elmarb commented 9 years ago

> Furthermore I don’t think there is any ambiguity around U+00AD

Okay, to be more explicit: the pre-Unicode soft hyphen 0xAD is ambiguous. It's understandable that the Unicode Consortium wouldn't standardize a maybe; they had to pick one or the other. But plenty of code page conversions (e.g. in ICU) translate the maybe-visible 0xAD to U+00AD.

So there is a reasonable use case for a visible U+00AD as an option, even if it ain't pretty, and it certainly shouldn't be the default in HarfBuzz.

Therefore the question isn't one of bug-compatibility, but whether icu-le-hb faithfully replicates ICU layout's idiosyncratic choice of default options. Since, as I understand it, icu-le-hb exists solely to offer a painless replacement to users of the deprecated ICU layout component, I'd still say it should.

In any case, the HarfBuzz site asks to "please report your experience" with icu-le-hb. Mine is: if I were to use it, I'd have to hack a workaround for this, to avoid regressions elsewhere.

(On a side note: Maybe the issue also occurs with other "ignorable" characters that happen to have a visible glyph in a given font. U+00AD just happens to be the one where I noticed it.)

behdad commented 9 years ago

Thanks. Can you elaborate a bit? Looks like what you might want is to replace 0xAD with U+002D during conversion...
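A minimal sketch of that conversion-time substitution, assuming the legacy text is ISO-8859-1 (Latin-1), where each byte maps directly to the code point of the same value. The function name `latin1_to_codepoints` and its `visible_soft_hyphen` parameter are hypothetical and not part of ICU, HarfBuzz, or icu-le-hb:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode Latin-1 bytes into Unicode code points.
 * Latin-1 byte b maps to code point U+00bb, so 0xAD normally becomes
 * U+00AD SOFT HYPHEN, which is a Default_Ignorable_Code_Point that
 * shapers hide by default. If visible_soft_hyphen is nonzero, 0xAD is
 * mapped to U+002D HYPHEN-MINUS instead, so it renders as a visible
 * glyph. Writes exactly n code points to out and returns n. */
size_t latin1_to_codepoints(const unsigned char *in, size_t n,
                            uint32_t *out, int visible_soft_hyphen)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t cp = in[i]; /* Latin-1 byte value == code point */
        if (visible_soft_hyphen && cp == 0x00AD)
            cp = 0x002D; /* force a visible hyphen */
        out[i] = cp;
    }
    return n;
}
```

With the flag set, the bytes `co\xADop` decode to the code points for "co-op"; with it cleared, U+00AD is passed through unchanged and the shaper decides how to display it.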

At any rate, HarfBuzz itself has a PRESERVE_DEFAULT_IGNORABLES buffer flag (HB_BUFFER_FLAG_PRESERVE_DEFAULT_IGNORABLES). If you want to hook that up to icu-le-hb, I think I can live with that. But I wouldn't want to make it the default. By default, I want clients using this shim library to get the enhanced HarfBuzz shaping. icu-le's shaping was broken in way too many ways to want to emulate.

Cheers