highlightjs / highlight.js

JavaScript syntax highlighter with language auto-detection and zero dependencies.
https://highlightjs.org/
BSD 3-Clause "New" or "Revised" License
23.31k stars 3.52k forks source link

(cpp) Keywords followed by parens are treated as `built_in` #3980

Open cjdb opened 5 months ago

cjdb commented 5 months ago

Describe the issue Keywords, such as static_assert, sizeof, etc. are typically followed by parentheses. static_assert; will be highlighted as if it's a keyword, but static_assert(true); is highlighted as a built_in.

image

Which language seems to have the issue?

<code class="hljs cpp">
<code class="hljs c++">

Are you using highlight or highlightAuto?

highlight, I think?

Sample Code to Reproduce

In reveal.js, with a freshly-downloaded hljs, add the following:

<pre><code class="hljs c++" data-trim data-noescape data-line-numbers style="overflow: visible;">
  static_assert; // highlighted as keyword
  static_assert(true); // highlighted as built_in
</code></pre>

Expected behavior

Keywords continue to use their keyword colour when followed by parens.

joshgoebel commented 4 months ago

We'd need a list of all false positives - keywords that are often followed by ( and then they can be added as excluded matches for that rule.

akhtarmdsaad commented 4 months ago

I think I got a list of all such keywords:

alignas
alignof
asm
catch
const_cast - used as - const_cast<int*>(p);
decltype  -  working fine
dynamic_cast - dynamic_cast <Dog*>(animal);
noexcept
reinterpret_cast - reinterpret_cast <new_type> (expression)
sizeof
static_assert
static_cast - static_cast<float>(10);
switch - working fine
typeid
while - working fine

You can check them here: Demo

It is seen that some of them are working fine while others are getting stuck. Looking for Suggestions !!!

akhtarmdsaad commented 4 months ago
const FUNCTION_DISPATCH = {
    className: 'function.dispatch',
    relevance: 0,
    keywords: {
      // Only for relevance, not highlighting.
      _hint: FUNCTION_HINTS },
    begin: regex.concat(
      /\b/,
      /(?!decltype)/,
      /(?!if)/,
      /(?!for)/,
      /(?!switch)/,
      /(?!while)/,
      hljs.IDENT_RE,
      regex.lookahead(/(<[^<>]+>|)\s*\(/))
  };

This is allowing them(switch, decltype, while etc) to work fine.

Should we add all such keywords here or are there any better solution for this

joshgoebel commented 4 months ago

Yep we need to add guards for all the false positives.

cjdb commented 4 months ago

Oops, I was planning to check for false positives, but have been prioritising a presentation. Thanks for taking the initiative, @akhtarmdsaad!

Your list looks good. The following are additional words that may need to be guarded against too.

// declarations
requires
explicit

// compound operators
bitand_eq
bitor_eq
xor_eq

// binary operators
and
or
bitand
bitor
xor
not_eq

// unary operators
not
compl
co_await
co_return
co_yield
cjdb commented 4 months ago

Here's confirmation. Looks like I missed auto(x) in my list above!

akhtarmdsaad commented 4 months ago

Thanks @cjdb for correcting my list. After checking the auto keyword in your list, I suddenly remembered that all the c++ datatypes like int char double float bool etc also requires to be added to the list. As int x = int(5); is also a valid statement in c++.

cjdb commented 4 months ago

Good catch! It's at this point that I'm beginning to wonder if it's possible to get highlight.js to tap into Clang tooling when that's made available. I'm expecting a "no", but am hoping wasm might be able to offer an in-road.

akhtarmdsaad commented 4 months ago

Same for the below keywords too: and_eq, case, delete, or_eq, return, throw

Working fine for return, throw due to this

const EXPRESSION_CONTEXT = {
    // This mode covers expression context where we can't expect a function
    // definition and shouldn't highlight anything that looks like one:
    // `return some()`, `else if()`, `(x*sum(1, 2))`
    variants: [
      {
        begin: /=/,
        end: /;/
      },
      {
        begin: /\(/,
        end: /\)/
      },
      {
        beginKeywords: 'new throw return else',
        end: /;/
      }
    ]
cjdb commented 4 months ago

Oh, we'll need to add new as well, if it's not there. What is the case for case?

akhtarmdsaad commented 4 months ago
switch (x){
  case (5):
       cout<<"x is five";
       break;
....
akhtarmdsaad commented 4 months ago

image Seems like its working fine

Here is the list of keywords that are added:

/(?!decltype)/,
      /(?!if)/,
      /(?!for)/,
      /(?!switch)/,
      /(?!while)/,
      /(?!alignas)/,
      /(?!alignof)/,
      /(?!asm)/,
      /(?!catch)/,
      /(?!const_cast)/,
      /(?!dynamic_cast)/,
      /(?!noexcept)/,
      /(?!reinterpret_cast)/,
      /(?!sizeof)/,
      /(?!static_assert)/,
      /(?!static_cast)/,
      /(?!typeid)/,
      /(?!requires)/,
      /(?!explicit)/,
      /(?!case)/,
      /(?!delete)/,

      // compound operators
      /(?!bitand_eq)/,
      /(?!bitor_eq)/,
      /(?!xor_eq)/,
      /(?!not_eq)/,
      /(?!or_eq)/,
      /(?!and_eq)/,

      // operators
      /(?!and)/,
      /(?!or)/,
      /(?!bitand)/,
      /(?!bitor)/,
      /(?!xor)/,

      // unary operators
      /(?!not)/,
      /(?!compl)/,
      /(?!co_await)/,
      /(?!co_return)/,
      /(?!co_yield)/,

      // Reserved types
      /(?!int)/, 
      /(?!char)/, 
      /(?!double)/, 
      /(?!float)/, 
      /(?!bool)/,
      /(?!auto)/,
akhtarmdsaad commented 3 months ago
const FUNCTION_DISPATCH = {
    className:"function.dispatch",
    keywords: CPP_KEYWORDS,  -----------------------> Added this line 
    relevance: 0,
    begin: regex.concat(
      /\b/,
      hljs.IDENT_RE,
      regex.lookahead(/(<[^<>]+>|)\s*\(/)
      ),
  };

The result looks like this:

image

Although the highlighting goes perfect but the keywords are placed inside built_in.