highlightjs / highlight.js

JavaScript syntax highlighter with language auto-detection and zero dependencies.
https://highlightjs.org/
BSD 3-Clause "New" or "Revised" License
23.6k stars, 3.58k forks

(rust) quotes around a char emoji are not colored properly #3933

Open tomsarry opened 10 months ago

tomsarry commented 10 months ago

Describe the issue: In Rust, single quotes around an emoji are not colored properly. However, auto-detection (which picks C#) colors them correctly.

Which language seems to have the issue? Rust

Are you using highlight or highlightAuto? highlight

Sample Code to Reproduce: https://jsfiddle.net/cawyx173/

Rust, coloring doesn't work: [screenshot]

Autodetect (C#), works: [screenshot]

Expected behavior: When using Rust highlighting, single quotes around an emoji should be colored the same as single quotes around any other character.

Additional context: Syntax highlighting works properly when the emoji is in double quotes. The problem can be seen in the Rust book.

joshgoebel commented 10 months ago
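{
  className: 'string',
  variants: [
    // raw strings, optionally byte-prefixed: r"...", r#"..."#, br"..."
    { begin: /b?r(#*)"(.|\n)*?"\1(?!#)/ },
    // char and byte literals with an optional backslash escape: 'a', '\n', '\x7F', b'a'
    { begin: /b?'\\?(x\w{2}|u\w{4}|U\w{8}|.)'/ }
  ]
},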

I'm guessing the . doesn't match emoji. I'd have to play around with this one...
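To test that guess outside highlight.js, here is a small plain-JavaScript sketch (not highlight.js code). Without the u flag, . matches a single UTF-16 code unit, so an emoji outside the Basic Multilingual Plane, such as 😻 (U+1F63B, stored as a surrogate pair), cannot fit between the quotes, while a BMP symbol such as ✨ (U+2728) can. The pattern is the char-literal variant quoted above, anchored with ^ and $; the sample literals are illustrative, and the sketch does not check which flags highlight.js itself uses when compiling the grammar.

```javascript
// Char-literal variant from the Rust grammar snippet above, anchored with ^...$
// so each sample must match in full. Same pattern, compiled without and with "u".
const codeUnitMode = /^b?'\\?(x\w{2}|u\w{4}|U\w{8}|.)'$/;
const unicodeMode = /^b?'\\?(x\w{2}|u\w{4}|U\w{8}|.)'$/u;

// Illustrative Rust char literals: ASCII, a BMP symbol, and an astral emoji.
const samples = ["'a'", "'✨'", "'😻'"];

for (const literal of samples) {
  const inner = literal.slice(1, -1); // strip the surrounding quotes
  console.log(
    literal,
    "UTF-16 code units:", inner.length,
    "code points:", [...inner].length, // string iteration is by code point
    "without u:", codeUnitMode.test(literal),
    "with u:", unicodeMode.test(literal)
  );
}
```

With these samples, 'a' and '✨' match in both modes, while '😻' (two code units, one code point) matches only with the u flag.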

tomsarry commented 10 months ago

Would adding |\p{Extended_Pictographic} as another alternative be acceptable? Quick checks suggest it works.

https://regexr.com/7o2cc
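One possible catch with this suggestion, again sketched in plain JavaScript rather than highlight.js code (it does not inspect how highlight.js compiles its regexes): \p{...} property escapes are only recognized in Unicode mode (the u or v flag). Without it, legacy Annex B parsing reads \p as a plain p, so the added alternative silently matches the literal text p{Extended_Pictographic} instead of raising an error. The pattern below is the char-literal variant with the suggested alternative appended; the sample literal is illustrative.

```javascript
// Char-literal variant with the suggested |\p{Extended_Pictographic} alternative,
// compiled without and with the "u" flag.
const withoutU = /^b?'\\?(x\w{2}|u\w{4}|U\w{8}|.|\p{Extended_Pictographic})'$/;
const withU = /^b?'\\?(x\w{2}|u\w{4}|U\w{8}|.|\p{Extended_Pictographic})'$/u;

const catLiteral = "'😻'"; // U+1F63B, a surrogate pair in UTF-16

// Without "u", \p is legacy-parsed as a literal "p", so the new alternative
// only matches the text p{Extended_Pictographic} and the emoji still fails.
console.log(withoutU.test(catLiteral));                   // false
console.log(withoutU.test("'p{Extended_Pictographic}'")); // true

// With "u", the property escape is honored.
console.log(withU.test(catLiteral));                      // true
```

In u mode, . on its own already matches a whole code point such as 😻, so the extra alternative may not be needed for single-code-point emoji.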

tomsarry commented 10 months ago

I spoke a bit too fast: the pattern above does not work either. (I think :sparkles: is a single UTF-16 code unit; the problem seems to affect characters encoded as two code units, i.e. a surrogate pair.) After looking at the char implementation, I found that the following are also valid chars but are not matched by the expression above:

I'm really not a regex expert, but I found the following patterns that match emoji / Unicode characters; they might be of some help: