Tiktera closed this issue 1 year ago
You are not escaping the backslashes in the string you pass to XRegExp, so the pattern as written won't work at all. With the backslashes escaped, it seems to work just fine:
XRegExp('#(\\p{L}|\\p{N})+(_(\\p{L}|\\p{N})+)*', 'gm').exec('#Barış_Manço');
// -> ['#Barış_Manço', ...]
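To illustrate why the escaping matters, here is a minimal sketch in plain JavaScript (no XRegExp needed) of what the string literal actually contains before any regex engine sees it. The variable names are only for illustration.

```javascript
// In an ordinary string literal, "\p" is not a recognized escape sequence,
// so JavaScript silently drops the backslash and keeps the "p".
const unescaped = '#(\p{L}|\p{N})+';

// Doubling the backslash keeps a literal "\" in the resulting string.
const escaped = '#(\\p{L}|\\p{N})+';

// String.raw is an alternative that leaves backslashes untouched.
const raw = String.raw`#(\p{L}|\p{N})+`;

console.log(unescaped);       // #(p{L}|p{N})+
console.log(escaped);         // #(\p{L}|\p{N})+
console.log(raw === escaped); // true
```

Either the escaped string or the String.raw version is a pattern string that still contains the \p{...} tokens, which is what the pattern needs.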
Aside: Usually when people run into issues like this with \p{L}, it is because there is some combining mark or zero-width spacing mark embedded in the word without them realizing. Accented characters are sometimes formed in Unicode strings by following a letter character with a combining mark character that is not itself a letter. So, for example, you might need to use [\p{L}\p{M}] rather than just \p{L}. To check exactly which code points are in your string, you could use something like:
for (const c of '#Barış_Manço') console.log('U+' + Number(c.codePointAt(0)).toString(16).padStart(4, '0'));
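As a rough sketch of the combining-mark situation described above, the snippet below forces the decomposed form with String.prototype.normalize and compares a letters-only pattern against one that also allows \p{M}. It uses native Unicode property escapes (the `u` flag, ES2018) instead of XRegExp purely so that it runs without dependencies; that swap, the helper codePoints, and the variable names are assumptions for illustration, not part of the original report.

```javascript
// Hypothetical helper: list the code points of a string as U+XXXX labels.
function codePoints(str) {
  return Array.from(str, (c) =>
    'U+' + c.codePointAt(0).toString(16).toUpperCase().padStart(4, '0'));
}

// NFC keeps "ç" as the single letter U+00E7;
// NFD splits it into "c" (U+0063) plus a combining cedilla (U+0327).
const composed = '#Barış_Manço'.normalize('NFC');
const decomposed = composed.normalize('NFD');

console.log(codePoints('ç'.normalize('NFD'))); // [ 'U+0063', 'U+0327' ]

// Native equivalents of the patterns discussed in this thread.
const lettersOnly = /#[\p{L}\p{N}]+(?:_[\p{L}\p{N}]+)*/u;
const withMarks = /#[\p{L}\p{M}\p{N}]+(?:_[\p{L}\p{M}\p{N}]+)*/u;

// On decomposed input, the letters-only pattern stops at the first combining mark.
const lettersOnlyMatch = lettersOnly.exec(decomposed)[0];

// Allowing \p{M} lets the match run through the whole hashtag.
const withMarksMatch = withMarks.exec(decomposed)[0];

console.log(lettersOnlyMatch.length < decomposed.length); // true
console.log(withMarksMatch === decomposed);               // true
```

Normalizing the input to NFC before matching is another option when the text is expected to contain only characters that have precomposed forms.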
Hello, I'm using this library to detect hashtag words in a string, including #, _, numbers, and letters from all languages. Here is what I'm using: const regex = XRegExp('#(\p{L}|\p{N})+(_(\p{L}|\p{N})+)*', 'gm');
The problem is that it does not match correctly when the string includes "#Barış_Manço". What it matches is "#Barış_Man", so there seems to be a problem with the ç character. The same goes for â. I'm not sure if this is a bug, but it's not working correctly.
Thanks