tc39 / proposal-intl-segmenter

Unicode text segmentation for ECMAScript
https://tc39.github.io/proposal-intl-segmenter/

Unicode Database and Related APIs #140

Open my2iu opened 3 years ago

my2iu commented 3 years ago

This is not directly related to segmentation, but it would also be useful to have an API that provides access to the Unicode Character Database. When a web app uses a custom text renderer to lay out text vertically, or implements the BIDI algorithm itself, its code needs to look up character classes in the Unicode data files. The same information is needed for text shaping. It's annoying to package up a copy of the Unicode database with a web page, especially when the browser already has all of that information.
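For context, part of this data can already be reached through RegExp Unicode property escapes, which cover `General_Category`, `Script`, `Script_Extensions` and a set of binary properties. The sketch below shows that workaround; `SCRIPT_TESTS`, `scriptOf` and `isCombiningMark` are hypothetical names made up for illustration and are not part of any proposal. Properties such as `Bidi_Class` and `Vertical_Orientation` are not exposed this way, which is the gap described above.

```javascript
// Sketch only: approximate a Unicode property lookup with RegExp property escapes.
// SCRIPT_TESTS, scriptOf and isCombiningMark are illustrative names (assumptions),
// not an existing or proposed Intl API.
const SCRIPT_TESTS = [
  ["Latin", /^\p{Script=Latin}$/u],
  ["Han", /^\p{Script=Han}$/u],
  ["Hiragana", /^\p{Script=Hiragana}$/u],
  ["Arabic", /^\p{Script=Arabic}$/u],
  ["Hebrew", /^\p{Script=Hebrew}$/u],
];

// Returns the first listed script whose pattern matches the single code point
// in `char`, or "Unknown" if none of the listed scripts match.
function scriptOf(char) {
  for (const [name, pattern] of SCRIPT_TESTS) {
    if (pattern.test(char)) return name;
  }
  return "Unknown";
}

// True if `char` is a single code point with General_Category = Mark (Mn, Mc or Me).
function isCombiningMark(char) {
  return /^\p{General_Category=Mark}$/u.test(char);
}

console.log(scriptOf("漢"), scriptOf("א"), isCombiningMark("\u0301"));
```

This only answers yes/no questions for properties the regular expression engine supports, one pattern at a time. A custom BIDI or vertical layout implementation would still have to ship its own tables for the remaining properties.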

Having an API for the BIDI algorithm might also be nice.

While I'm rambling, it would also be great if someone could hire an intern to figure out how to incrementally download CJK webfonts on demand. Right now, CJK users are at a disadvantage compared with Western users when it comes to fonts, because CJK webfonts are so large that they are slow to download. It would be nice if someone worked out whether these fonts can be chopped into smaller chunks that are fetched only when needed.

sffc commented 3 years ago

Hi @my2iu, there is an upstream issue for Unicode properties: https://github.com/tc39/ecma402/issues/90. Please upvote and comment on that issue to help get it prioritized. Thanks!