Open sffc opened 5 years ago
@litherum @FrankYFTang please see if 0ced4e2 addresses this. (I will make any further updates as a PR.)
More use cases were requested.
I, for one, have wanted the Name property of Unicode characters pretty much any time I parse a DSL and need to produce readable output.
e.g.
Expected character in range: "0" (U+0030 DIGIT ZERO) to "9" (U+0039 DIGIT NINE) but got " " (U+0009 CHARACTER TABULATION).
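A minimal sketch of how a message like that might be assembled today. The `NAMES` table and the `describe` and `rangeError` helpers are hypothetical and written only for illustration. ECMAScript has no built-in lookup for the Name property, so the names are hard-coded here, which is exactly the gap being described.

```javascript
// Hypothetical, hand-maintained table. A real tool would need to ship the
// Unicode name data itself, since the language does not expose it.
const NAMES = new Map([
  [0x0009, "CHARACTER TABULATION"],
  [0x0030, "DIGIT ZERO"],
  [0x0039, "DIGIT NINE"],
]);

// Format a single character as: "x" (U+XXXX NAME), omitting the name if unknown.
// JSON.stringify is used so control characters show up as visible escapes.
function describe(ch) {
  const cp = ch.codePointAt(0);
  const hex = cp.toString(16).toUpperCase().padStart(4, "0");
  const name = NAMES.get(cp);
  const shown = JSON.stringify(ch);
  return name ? `${shown} (U+${hex} ${name})` : `${shown} (U+${hex})`;
}

// Build the parser error message for an out-of-range character.
function rangeError(lo, hi, got) {
  return `Expected character in range: ${describe(lo)} to ${describe(hi)} but got ${describe(got)}.`;
}

console.log(rangeError("0", "9", "\t"));
```

With a built-in name lookup, the `NAMES` table would disappear and every character could be described, not just the three listed.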
There are several properties that are helpful in determining what language or script a given string is in when it's unknown, especially when combined with CLDR's script metadata (it would be nice to get an API to expose this data too, like ICU does through the uscript API). This in turn opens up locale-specific processing (including existing APIs). If a string is mixed-script you can divide it up for different processing paths.
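As a rough sketch of the mixed-script case, here is one way to split a string into script runs using only regex property escapes. The `SCRIPTS` list and the `scriptOf` and `splitByScript` helpers are my own assumptions for illustration. The list is deliberately short and fixed, and characters that match none of the listed scripts (spaces, digits, punctuation) are simply attached to the neighbouring run.

```javascript
// Assumed, fixed list of scripts to probe for. There is no way to ask
// "which script is this character?" directly, so each one is tested in turn.
const SCRIPTS = ["Latin", "Han", "Hiragana", "Katakana", "Cyrillic", "Arabic"];
const TESTERS = SCRIPTS.map((name) => [
  name,
  new RegExp(`^\\p{Script=${name}}$`, "u"),
]);

// Return the first listed script that the character belongs to, or null.
function scriptOf(ch) {
  for (const [name, re] of TESTERS) {
    if (re.test(ch)) return name;
  }
  return null;
}

// Split text into runs of { script, text }. Characters with no listed
// script join the current run; a run that starts with such characters
// adopts the script of the first listed character that follows.
function splitByScript(text) {
  const runs = [];
  for (const ch of text) {
    const script = scriptOf(ch);
    const last = runs[runs.length - 1];
    if (last && (script === null || script === last.script)) {
      last.text += ch;
    } else if (last && last.script === null) {
      last.script = script;
      last.text += ch;
    } else {
      runs.push({ script, text: ch });
    }
  }
  return runs;
}

console.log(splitByScript("Hello 世界"));
```

Each run can then be sent down its own processing path. The cost is one regex test per candidate script per character, which is part of why a direct property lookup would be nicer.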
As something a little more concrete, say you want to generate readable, Unicode-supporting URL slugs. Intl.Segmenter with granularity: 'word' is a good basis for implementing this when it comes to languages like English which use separators (spaces) between words. But for languages/scripts where text is typically continuous (e.g. Chinese), word segmentation tends not to be particularly useful (in this context), aesthetic (e.g. a boundary every one or two characters for Chinese), or accurate (this is an inherently hard problem), so you might prefer sentence segmentation or no segmentation instead. Given a list of such scripts (such as those identified by "LB letters" in the CLDR metadata), how do we determine if each character in an unknown string falls into that category or not? The current options are:
(those final two aren't necessarily mutually exclusive)
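To make the slug example concrete, here is a sketch that works with what is available today: Intl.Segmenter for word boundaries plus a hand-written regex character class standing in for the CLDR "LB letters" list. The `CONTINUOUS` pattern and the `slugify` function are assumptions of mine. The set of scripts in the pattern is a guess at that list, not data taken from CLDR.

```javascript
// Assumed stand-in for the CLDR "LB letters" scripts, written out by hand.
const CONTINUOUS =
  /[\p{Script=Han}\p{Script=Hiragana}\p{Script=Katakana}\p{Script=Thai}\p{Script=Lao}\p{Script=Khmer}\p{Script=Myanmar}]/u;

// Build a slug: words in separator-based scripts are joined with hyphens,
// while adjacent word segments in continuous scripts are glued back
// together so no boundary appears every one or two characters.
function slugify(text, locale = "en") {
  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  const parts = [];
  let pending = ""; // accumulated continuous-script text

  const flush = () => {
    if (pending) parts.push(pending);
    pending = "";
  };

  for (const { segment, isWordLike } of segmenter.segment(text)) {
    if (!isWordLike) {
      flush(); // spaces and punctuation end any continuous run
    } else if (CONTINUOUS.test(segment)) {
      pending += segment; // undo word segmentation for these scripts
    } else {
      flush();
      parts.push(segment.toLowerCase());
    }
  }
  flush();
  return parts.join("-");
}

console.log(slugify("Hello, World!"));
```

The awkward part is the `CONTINUOUS` pattern: it has to be maintained by hand and kept in sync with the script metadata, which is the problem a property or script metadata API would solve.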
I think you're probably looking for something a little less verbose and more obvious for the readme, but I thought I'd try and contribute anyway, given that this proposal doesn't seem to be gaining any traction.
In #2, you added to the README,
I think time should be spent working out more concrete examples for the use cases.
@litherum @FrankYFTang