Closed cancerberoSgx closed 2 years ago
BTW: I tried some other implementations, and the only one I found that handles this string correctly is lodash.toArray. IMO using the same implementation should solve the issue:
const _ = require('lodash')
const s1 = '👩🦰👩👩👦👦🏳️🌈'
console.log('lodash toArray', _.toArray(s1).length) // 8
console.log('array from', Array.from(s1).length) // 9
console.log('regex match', s1.match(/./gu).length) // 9
console.log('destructuring', [...s1].length) // 9
PD: underscore.toArray didn't work.
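For comparison, here is a minimal sketch of a grapheme count that needs no library. It assumes a runtime that ships Intl.Segmenter (Node 16 or later, so not the Node 14 mentioned in this issue), and it assumes the test string consists of exactly the code points written out as escapes below. The helper name graphemeLength is hypothetical.

```javascript
// The string from the comment above, written as escapes so the code points
// are unambiguous (assumption: no zero-width joiners are present).
const s1 = '\u{1F469}\u{1F9B0}\u{1F469}\u{1F469}\u{1F466}\u{1F466}\u{1F3F3}\uFE0F\u{1F308}';

// Hypothetical helper: counts user-perceived characters (grapheme clusters).
function graphemeLength(str) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' });
  return [...segmenter.segment(str)].length;
}

console.log('graphemes', graphemeLength(s1)); // 8
console.log('code points', [...s1].length);   // 9
```

The one-unit difference comes from the variation selector U+FE0F, which is its own code point but belongs to the same grapheme cluster as the flag before it.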
Do you also know why it fails?
My "why" was in the lodash tip ;) Probably along the lines of: https://github.com/lodash/lodash/blob/master/.internal/unicodeToArray.js
other useful links https://github.com/lodash/lodash/blob/master/.internal/hasUnicode.js https://github.com/lodash/lodash/blob/master/toArray.js
Sorry don't have much time for a PR right now :(
BTW: I'm actually using the express-validator library, which relies on this. I'll end up using a custom validator in the meantime.
I'm a beginner at Unicode. I just want to add my thoughts to test my knowledge and help out :). In case I mention anything inaccurate, I apologize in advance.
Do you also know why it fails?
It seems like 🏳️ is the offending character in that string. Note that it is not the same as 🏳, which would not have caused this issue.
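The offending flag char consists of three distinct escape sequences. Specifically, it has a high surrogate (\uD83C), a low surrogate (\uDFF3), and a non-spacing mark, the variation selector (\uFE0F).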
Thus, it is not simply an astral symbol but a grapheme cluster. The current implementation of isLength() only accounts for astral-plane code points (the surrogate halves). As a result, isLength() counts the stray non-spacing combining mark as an additional character, which causes this problem.
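The counting behaviour described above can be sketched as follows. This is an illustration only, not the validator.js source: both helper names are hypothetical, and treating U+FE0E and U+FE0F as zero-length is just one possible adjustment, not necessarily what the library does.

```javascript
// White flag (U+1F3F3) followed by variation selector-16 (U+FE0F).
const flag = '\u{1F3F3}\uFE0F';

// UTF-16 code units of the flag, as uppercase hex strings.
const codeUnits = Array.from({ length: flag.length }, (_, i) =>
  flag.charCodeAt(i).toString(16).toUpperCase());

// Hypothetical helper: counts each surrogate pair as one character,
// but still counts a variation selector as a character of its own.
function lengthIgnoringSurrogates(str) {
  const surrogatePairs = str.match(/[\uD800-\uDBFF][\uDC00-\uDFFF]/g) || [];
  return str.length - surrogatePairs.length;
}

// Hypothetical helper: additionally treats variation selectors as zero-length.
function lengthIgnoringSelectors(str) {
  const selectors = str.match(/[\uFE0E\uFE0F]/g) || [];
  return lengthIgnoringSurrogates(str) - selectors.length;
}

console.log(codeUnits);                      // [ 'D83C', 'DFF3', 'FE0F' ]
console.log(flag.length);                    // 3
console.log(lengthIgnoringSurrogates(flag)); // 2
console.log(lengthIgnoringSelectors(flag));  // 1
```

Subtracting selectors only handles this particular case. Sequences joined with a zero-width joiner would still be overcounted, which is why a full grapheme segmentation (as lodash.toArray appears to do) is the more general approach.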
https://github.com/validatorjs/validator.js/pull/1967 has been merged; watch for the next release.
isLength fails with some emojis.
Examples
Additional context
Validator.js version: latest
Node.js version: 14.19.0
OS platform: macOS