orling / grapheme-splitter

A JavaScript library that breaks strings into their individual user-perceived characters.
MIT License

splitter.countGraphemes('👩‍🦰👩‍👩‍👦‍👦🏳️‍🌈') = 4 #31

Open stephen147 opened 3 years ago

stephen147 commented 3 years ago

Using emojis like 👩‍🦰👩‍👩‍👦‍👦🏳️‍🌈

var splitter = new GraphemeSplitter();
var graphemeCount = splitter.countGraphemes('👩‍🦰👩‍👩‍👦‍👦🏳️‍🌈');
console.log(graphemeCount);

Result:

4

Lemmingh commented 2 years ago

24

U+1F9B0 was introduced in Unicode 11.0.
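
To see why that code point matters, here is a minimal sketch that lists the code points in the string from the report. The reported count of 4 (instead of the expected 3) is consistent with the first sequence being split at U+1F9B0, presumably because the library's Unicode data predates Unicode 11.0; that explanation is an inference, not something confirmed in this thread. The variable names are illustrative only.

```javascript
// The three emoji from the report, written as escapes so the ZWJ (U+200D)
// and variation selector (U+FE0F) are visible.
const reported =
    "\u{1F469}\u{200D}\u{1F9B0}" +                                  // woman, red hair
    "\u{1F469}\u{200D}\u{1F469}\u{200D}\u{1F466}\u{200D}\u{1F466}" + // family
    "\u{1F3F3}\u{FE0F}\u{200D}\u{1F308}";                            // rainbow flag

// Array.from iterates a string by code point, not by UTF-16 code unit.
const codePoints = Array.from(
    reported,
    (ch) => "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
);

console.log(codePoints.length);                // 14
console.log(codePoints.slice(0, 3).join(" ")); // U+1F469 U+200D U+1F9B0
```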

jamesjoung commented 2 years ago

any workaround?

SMhdAsadi commented 2 years ago

> any workaround?

You can use this library instead, which supports newer emoji versions.

Lemmingh commented 2 years ago

The ECMAScript Intl.Segmenter API has been finished and will be published this year.

Actually, all major browsers except Firefox (https://bugzil.la/1423593) have supported it since early 2021.


If you only target the latest engines, you could switch to the ECMAScript API.

// undefined selects the default locale; "grapheme" splits on user-perceived characters.
const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });

const input = String.fromCodePoint(
    0x1f469,
    0x200d,
    0x1f9b0,
    0x1f469,
    0x200d,
    0x1f469,
    0x200d,
    0x1f466,
    0x200d,
    0x1f466,
    0x1f3f3,
    0xfe0f,
    0x200d,
    0x1f308
);

const segments = segmenter.segment(input);

const graphemes = Array.from(segments, (s) => s.segment);

console.log(graphemes);
console.log(graphemes.length === 3); // true
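
For code that also has to run on engines without Intl.Segmenter, a minimal feature-detection sketch might look like the following. The standalone helper and its `fallback` parameter are hypothetical, introduced here for illustration; the default fallback only splits by code point, so a library-based splitter would need to be passed in for real grapheme handling.

```javascript
// Hypothetical helper: prefer Intl.Segmenter, otherwise defer to a
// caller-supplied splitter function.
function splitGraphemes(text, fallback = (s) => Array.from(s)) {
    if (typeof Intl !== "undefined" && typeof Intl.Segmenter === "function") {
        const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
        return Array.from(segmenter.segment(text), (s) => s.segment);
    }
    // Assumption: the default fallback splits by code point only, which is
    // coarser than grapheme clusters.
    return fallback(text);
}

// Rainbow flag: U+1F3F3 U+FE0F U+200D U+1F308
const sample = String.fromCodePoint(0x1f3f3, 0xfe0f, 0x200d, 0x1f308);
console.log(splitGraphemes(sample).length);
```

On an engine with Intl.Segmenter this should log 1; with the default code point fallback it logs 4.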

Tairraos commented 1 year ago

try this, the split result is wrong 🤱👩‍🍼🧑‍🍼💆💆‍♂️💆‍♀️💇💇‍♂️💇‍♀️🚶🚶‍♂️🚶‍♀️🧍🧍‍♂️🧍‍♀️🧎🧎‍♂️🧎‍♀️🧑‍🦯👨‍🦯👩‍🦯🧑‍🦼👨‍🦼👩‍🦼🧑‍🦽👨‍🦽👩‍🦽🏃🏃‍♂️🏃‍♀️💃🕺

fantasticsoul commented 3 months ago

> try this, the split result is wrong 🤱👩‍🍼🧑‍🍼💆💆‍♂️💆‍♀️💇💇‍♂️💇‍♀️🚶🚶‍♂️🚶‍♀️🧍🧍‍♂️🧍‍♀️🧎🧎‍♂️🧎‍♀️🧑‍🦯👨‍🦯👩‍🦯🧑‍🦼👨‍🦼👩‍🦼🧑‍🦽👨‍🦽👩‍🦽🏃🏃‍♂️🏃‍♀️💃🕺

try this code below, it works well

const list = '🤱👩‍🍼🧑‍🍼💆💆‍♂️💆‍♀️💇💇‍♂️💇‍♀️🚶🚶‍♂️🚶‍♀️🧍🧍‍♂️🧍‍♀️🧎🧎‍♂️🧎‍♀️🧑‍🦯👨‍🦯👩‍🦯🧑‍🦼👨‍🦼👩‍🦼🧑‍🦽👨‍🦽👩‍🦽🏃🏃‍♂️🏃‍♀️💃🕺'.match(/.[\u{fe0f}\u{1f3fb}-\u{1f3ff}]?(\u{200d}.[\u{fe0f}\u{1f3fb}-\u{1f3ff}]?)*/ug);
console.log(list);
console.log(list.length);
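
To make the pattern easier to read, here is a sketch that rebuilds the same regular expression from named parts. The constant names are made up for illustration. Note that this is a heuristic for emoji joined by ZWJ, not a full grapheme cluster algorithm: for example, a flag made of two regional indicator symbols comes out as two matches.

```javascript
// Optional trailing variation selector (U+FE0F) or skin tone modifier.
const MODIFIER = "[\\u{fe0f}\\u{1f3fb}-\\u{1f3ff}]?";
// One code point (the "u" flag makes "." match whole code points) plus modifier.
const UNIT = "." + MODIFIER;
// Zero-width joiner that glues emoji into a single sequence.
const ZWJ = "\\u{200d}";
const emojiPattern = new RegExp(`${UNIT}(${ZWJ}${UNIT})*`, "gu");

// Person getting massage + ZWJ + male sign + VS-16, followed by a dancer.
const text = String.fromCodePoint(0x1f486, 0x200d, 0x2642, 0xfe0f, 0x1f483);
const parts = text.match(emojiPattern);
console.log(parts.length); // 2

// Limitation: a flag built from regional indicators U+1F1EF U+1F1F5 is split in two.
const flag = String.fromCodePoint(0x1f1ef, 0x1f1f5);
console.log(flag.match(emojiPattern).length); // 2
```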

Tairraos commented 3 months ago

> try this code below, it works well

On Mac, yes. On Windows, I don't know whether it works now, but it didn't work last year.