Closed yumetodo closed 2 months ago
@azu Thank you for your review! I applied your suggestions.
FYI: new Intl.Segmenter("ja-JP", { granularity: "grapheme" }) is more precise, but it is also more complex to implement because segmentation depends on the locale.
I only just noticed this API. If we pass undefined as the locale, the runtime's default locale is used, so the lint result becomes unstable across environments. So we need to decide what should be specified and how to specify it.
However, I think it's out of this PR's scope. countBy? can be extended to something like countBy?: "codeunits" | "codepoints" | "grapheme";.
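To make the proposed extension concrete, here is a minimal sketch of how the three modes could be dispatched. countLength and its locale parameter are hypothetical names for illustration, not part of this PR; the locale is passed explicitly to avoid the undefined-locale problem mentioned above.

```javascript
/**
 * Hypothetical helper (not part of this PR): count string length in one of three ways.
 * @param {string} text
 * @param {"codeunits" | "codepoints" | "grapheme"} countBy
 * @param {string} locale explicit locale, so results do not depend on the runtime default
 * @returns {number}
 */
function countLength(text, countBy = "codeunits", locale = "ja-JP") {
  switch (countBy) {
    case "codeunits":
      // UTF-16 code units: what String.prototype.length reports.
      return text.length;
    case "codepoints":
      // The string iterator yields one element per code point.
      return Array.from(text).length;
    case "grapheme": {
      // Requires a runtime that ships Intl.Segmenter.
      const segmenter = new Intl.Segmenter(locale, { granularity: "grapheme" });
      return Array.from(segmenter.segment(text)).length;
    }
    default:
      throw new RangeError(`unknown countBy: ${countBy}`);
  }
}

// U+20BB7 (a surrogate pair in UTF-16) followed by "か" + combining dakuten U+3099.
const sample = "\u{20BB7}\u304B\u3099";
console.log(countLength(sample, "codeunits"));  // 4
console.log(countLength(sample, "codepoints")); // 3
console.log(countLength(sample, "grapheme"));   // 2
```

Keeping "codeunits" as the default in this sketch would leave existing configurations unchanged.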
However, I think it's out of this PR's scope. countBy? can be extended to something like countBy?: "codeunits" | "codepoints" | "grapheme";.
Yes, I agree.
Abstract
The Unicode FAQ describes four ways to count string length: bytes, code units, code points, and grapheme clusters. https://unicode.org/faq/char_combmark.html#7
This commit adds support for counting by code points.
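As a rough illustration of how the counts diverge, the sketch below measures one string in each way. countAll is a hypothetical name, the byte count assumes UTF-8 encoding, and the grapheme count assumes a runtime with Intl.Segmenter; none of this is taken from the commit itself.

```javascript
// Hypothetical helper for illustration only: report several length measures of a string.
function countAll(text) {
  const segmenter = new Intl.Segmenter("ja-JP", { granularity: "grapheme" });
  return {
    bytes: new TextEncoder().encode(text).length, // assumes UTF-8
    codeUnits: text.length,                       // UTF-16 code units
    codePoints: [...text].length,                 // one element per code point
    graphemes: [...segmenter.segment(text)].length,
  };
}

// U+20BB7 lies outside the BMP, so UTF-16 stores it as the surrogate pair D842 DFB7.
const word = "\u{20BB7}野家";
console.log(word.charCodeAt(0).toString(16), word.charCodeAt(1).toString(16)); // d842 dfb7
console.log(countAll(word));
// { bytes: 10, codeUnits: 4, codePoints: 3, graphemes: 3 }
```

With a limit of 3, this word would fail a check based on code units but pass one based on code points.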
Motivation
When writing text in a language such as Japanese, surrogate pairs come up routinely. In that context, limiting string length without accounting for surrogate pairs is painful.