Closed: andreleoni closed this 5 years ago.
Hey Andre,
Although `REGEX_ANY` does match a lot of emoji-related codepoints, it does not match some codepoints that are used by emoji but also appear outside of the emoji world, like U+200D ZERO WIDTH JOINER. That's exactly what is happening here: there is still a ZWJ left in the data:
uniscribe 'š¤šÆš®šāāš¬š¶š¼šøšŖššØāš¾šŗšš¤Æ'.gsub(Unicode::Emoji::REGEX_ANY, '')
200D ZERO WIDTH JOINER
I've clarified this behavior in the README table.
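To see the leftover joiner without the gem, here is a minimal sketch that uses Ruby's built-in `\p{Emoji}` property as a hypothetical per-codepoint stand-in for `REGEX_ANY` (it is not the gem's regex, and it assumes a reasonably recent Ruby whose regex engine knows the Unicode emoji properties). Because U+200D does not have the Emoji property, deleting every matching codepoint from a ZWJ sequence leaves the joiner behind:

```ruby
# Hypothetical stand-in for a per-codepoint emoji regex (NOT Unicode::Emoji::REGEX_ANY):
# Ruby's built-in \p{Emoji} matches single codepoints that have the Unicode Emoji property.
PER_CODEPOINT = /\p{Emoji}/

# MAN + ZERO WIDTH JOINER + SHEAF OF RICE, displayed as a single "farmer" emoji
farmer = "\u{1F468}\u{200D}\u{1F33E}"

# Remove every matching codepoint individually
leftover = farmer.gsub(PER_CODEPOINT, '')

# List what is left as code points, similar to what uniscribe shows
puts leftover.codepoints.map { |cp| format('U+%04X', cp) }.join(' ')
# prints "U+200D"
```

Note that `\p{Emoji}` also matches the ASCII digits, `#` and `*`, so it is too broad to use on its own for stripping emoji.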
What you want to use instead is `REGEX`, which gives you better (and more robust) results. For example:
uniscribe 'š¤šÆš®šāāš¬š¶š¼šøšŖššØāš¾šŗšš¤Æ'.gsub(Unicode::Emoji::REGEX, '')
Unfortunately, this will let through textual emoji like:
2195 ↕ UP DOWN ARROW
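A sequence-aware approach can be sketched in plain Ruby with `String#grapheme_clusters`: drop every user-perceived character that contains a codepoint with default emoji presentation, or the emoji variation selector U+FE0F. This is only an illustration under those assumptions (a recent Ruby with emoji-aware grapheme clusters), not how the gem builds `REGEX`, but it shows both behaviors: the ZWJ sequence disappears as a unit, while the textual ↕ survives.

```ruby
# Hypothetical sequence-aware filter (NOT Unicode::Emoji::REGEX):
# a grapheme cluster counts as emoji if it contains a codepoint whose default
# presentation is emoji, or the emoji variation selector U+FE0F.
EMOJI_PRESENTATION = /\p{Emoji_Presentation}|\u{FE0F}/

def strip_emoji_sequences(str)
  # grapheme_clusters keeps a ZWJ sequence together, so the joiner is removed with it
  str.grapheme_clusters.reject { |g| g.match?(EMOJI_PRESENTATION) }.join
end

# farmer ZWJ sequence + textual UP DOWN ARROW (which has no emoji presentation by default)
input  = "a\u{1F468}\u{200D}\u{1F33E}b\u{2195}c"
result = strip_emoji_sequences(input)

puts result # ab↕c
```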
To work around this issue, you can also remove the emoji matched by `REGEX_TEXT`, for example like this:
'š¤šÆš®šāāš¬š¶š¼šøšŖššØāš¾šŗšš¤Æ'.gsub(Regexp.union(Unicode::Emoji::REGEX, Unicode::Emoji::REGEX_TEXT), '') == "" # => true
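The same union idea can be sketched in plain Ruby with hypothetical stand-ins for both regexes: `Regexp.union` combines a presentation check with a check for textual emoji, here assumed to be any non-ASCII codepoint with the Emoji property (ASCII is excluded because `\p{Emoji}` also covers digits, `#` and `*`). Neither pattern is the gem's definition of `REGEX` or `REGEX_TEXT`.

```ruby
# Hypothetical stand-ins (NOT the gem's REGEX / REGEX_TEXT):
PRESENTATION = /\p{Emoji_Presentation}|\u{FE0F}/ # emoji shown as pictures by default
TEXTUAL      = /[\p{Emoji}&&[^\x00-\x7F]]/       # other emoji-capable codepoints, ASCII excluded

ANY_EMOJI = Regexp.union(PRESENTATION, TEXTUAL)

def strip_all_emoji(str)
  # Work on grapheme clusters so joiners and modifiers go away with their emoji
  str.grapheme_clusters.reject { |g| g.match?(ANY_EMOJI) }.join
end

cleaned = strip_all_emoji("a\u{1F468}\u{200D}\u{1F33E}b\u{2195}c123")
puts cleaned # abc123
```

Excluding ASCII keeps plain digits intact, at the cost of also keeping bare `#` and `*`.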
Please leave some feedback if this fixes your issue.
Actually, your feedback inspired me to add a `REGEX_ALL` regex to a future version of this gem, which will also match textual emoji; see #5.
Closing; please re-open if the problem persists.
Hello. I'm trying to use the gem to remove emojis from strings, but I'm getting an error when comparing the result with the expected string.
What am I doing wrong here? :sweat_smile: