validatorjs / validator.js

String validation
MIT License

isLength fails with some emojis #1941

Closed cancerberoSgx closed 2 years ago

cancerberoSgx commented 2 years ago

isLength fails with some emojis.

Examples

const validator = require('validator')
console.log('Failing with emojis', validator.isLength('👩🦰👩👩👦👦🏳️🌈', {min: 1, max: 8}));
// false
console.log('OK without emojis', validator.isLength('12345678', {min: 1, max: 8}));
// true

Additional context

Validator.js version: latest
Node.js version: 14.19.0
OS platform: macOS

cancerberoSgx commented 2 years ago

BTW: I tried some other implementations, and the only one I found that is correct for this string is lodash.toArray. IMO using the same implementation should solve the issue:

const _ = require('lodash')

const s1 = '👩🦰👩👩👦👦🏳️🌈'
console.log('lodash toArray', _.toArray(s1).length) // 8
console.log('array from', Array.from(s1).length) // 9
console.log('regex match', s1.match(/./gu).length) // 9
console.log('destructuring', [...s1].length) // 9

PS: underscore.toArray didn't work.
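To make the difference between these counts easier to see, here is a minimal sketch that writes the string with explicit escapes. The escaped form is an assumption: it reconstructs the string as eight emoji plus one variation selector (U+FE0F) after the white flag, which is what the count of 9 code points suggests.

```javascript
// Assumed reconstruction of the test string: eight emoji plus one
// variation selector (U+FE0F) after the white flag.
const s1 = '\u{1F469}\u{1F9B0}\u{1F469}\u{1F469}\u{1F466}\u{1F466}\u{1F3F3}\uFE0F\u{1F308}';

// String.prototype.length counts UTF-16 code units (two per astral emoji).
const codeUnits = s1.length;

// Array.from iterates by code point, so each surrogate pair becomes one entry.
const codePoints = Array.from(s1);

// Hex dump of every code point; the selector shows up as its own entry.
const hex = codePoints.map((cp) => cp.codePointAt(0).toString(16));

console.log(codeUnits);         // 17
console.log(codePoints.length); // 9
console.log(hex.join(' '));     // 1f469 1f9b0 1f469 1f469 1f466 1f466 1f3f3 fe0f 1f308
```

The extra entry is `fe0f`, which is why every code-point-based approach reports 9 while a grapheme-aware one reports 8.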

WikiRik commented 2 years ago

Do you also know why it fails?

cancerberoSgx commented 2 years ago

My "why" was in the lodash tip ;) Probably along the lines of: https://github.com/lodash/lodash/blob/master/.internal/unicodeToArray.js

Other useful links:
https://github.com/lodash/lodash/blob/master/.internal/hasUnicode.js
https://github.com/lodash/lodash/blob/master/toArray.js

Sorry, I don't have much time for a PR right now :(

BTW: I'm actually using the express-validator library, which relies on this, so I'll end up using a custom validator in the meantime.
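One possible shape for such a custom validator, as a rough sketch only: `isGraphemeLength` is a hypothetical helper name, not part of validator.js or express-validator. It relies on the built-in `Intl.Segmenter`, which is not available in Node.js 14, so this assumes a newer runtime (Node.js 16 or later).

```javascript
// Hypothetical helper, not part of validator.js or express-validator.
// Counts user-perceived characters (grapheme clusters) instead of code units.
const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' });

function isGraphemeLength(str, { min = 0, max = Infinity } = {}) {
  let count = 0;
  // Each segment yielded here is one grapheme cluster.
  for (const _segment of segmenter.segment(str)) count += 1;
  return count >= min && count <= max;
}

// White flag followed by a variation selector: one perceived character.
console.log(isGraphemeLength('\u{1F3F3}\uFE0F', { min: 1, max: 1 })); // true
console.log(isGraphemeLength('12345678', { min: 1, max: 8 }));        // true
console.log(isGraphemeLength('123456789', { min: 1, max: 8 }));       // false
```

A function like this could be passed to express-validator through its `.custom()` hook; that wiring is not shown here.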

nick-cd commented 2 years ago

I'm a beginner at Unicode. I just want to add my thoughts to test my knowledge and help out :). If I mention anything inaccurate, I apologize in advance.

> Do you also know why it fails?

It seems like 🏳️ is the offending character in that string. Note that it is not the same as 🏳, which would not have caused this issue.

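The offending flag character consists of three distinct escape sequences. Specifically, it has a high surrogate (\uD83C), a low surrogate (\uDFF3), and a variation selector (\uFE0F), which is a non-spacing mark.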

Thus, it is not simply an astral symbol but a grapheme cluster. The current implementation of isLength() only accounts for astral code points (it counts each surrogate pair as one character). As a result, isLength() counts the trailing non-spacing combining mark as an additional character, which causes this problem.
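A minimal sketch of a length count that compensates for this. `visibleLength` is a hypothetical name used for illustration and is not necessarily what validator.js does; it only handles surrogate pairs and the two presentation selectors (U+FE0E, U+FE0F), not other combining marks or zero-width-joiner sequences.

```javascript
// Hypothetical helper (not validator.js's actual code): counts characters
// while ignoring variation selectors and treating surrogate pairs as one.
function visibleLength(str) {
  // U+FE0E / U+FE0F only select text vs. emoji presentation.
  const selectors = str.match(/[\uFE0E\uFE0F]/g) || [];
  // Each high + low surrogate pair encodes a single astral code point.
  const surrogatePairs = str.match(/[\uD800-\uDBFF][\uDC00-\uDFFF]/g) || [];
  return str.length - selectors.length - surrogatePairs.length;
}

const plainFlag = '\u{1F3F3}';       // white flag without a selector
const emojiFlag = '\u{1F3F3}\uFE0F'; // white flag with variation selector-16

console.log(visibleLength(plainFlag));  // 1
console.log(visibleLength(emojiFlag));  // 1
console.log(visibleLength('12345678')); // 8
```

Subtracting from `str.length` keeps the sketch simple, but a grapheme-aware segmenter would be needed for full correctness.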

rubiin commented 2 years ago

https://github.com/validatorjs/validator.js/pull/1967 has been merged; watch for the next version.