highlightjs / highlight.js

JavaScript syntax highlighter with language auto-detection and zero dependencies.
https://highlightjs.org/
BSD 3-Clause "New" or "Revised" License
23.76k stars 3.61k forks source link

C++ : Trailing return types in function declaration result in illegal language match #3673

Open GuillaumeDua opened 1 year ago

GuillaumeDua commented 1 year ago

Describe the issue/behavior that seems buggy

hljs.highlightAuto does not work on the following example, while hljs.highlightElement does.

Sample Code or Instructions to Reproduce

hljs.highlightAuto("auto main() -> int {\n    auto i = 42; // test\n    return i;\n}", [ 'cpp' ]) 
// result :
{
  "language": "cpp",
  "value": "auto main() -> int {\n    auto i = 42; // test\n    return i;\n}",
  "illegal": true,
  "relevance": 0,
  "_illegalBy": {
    "message": "Illegal lexeme \"-\" for mode \"function\"",
    "index": 11,
    "context": "auto main() -> int {\n    auto i = 42; // test\n    return i;\n}",
    "mode": {},
    "resultSoFar": ""
  },
 ...
}

This issue seems to be related to the arrow -> (feature name : trailing return type), which is legal in post-modern C++.

See the documentation of function declaration, section (2) (C++11), here on cppreference :

Expected behavior

joshgoebel commented 1 year ago

Probably an easy fix but I worry it's likely to shake up our auto-detect tests...

GuillaumeDua commented 1 year ago

FYI for now the quick-fix on consumer (my) side is that when hljs.highlightAuto returns such result, then call hljs.highlightElement. Not smthg clean tho.

Post-modern C++ offers a large set of synthaxes that might be non-trivial to detect. However this one is quite important/relevant as this is basically the way to declare function with a post-modern style, and multiples features can come from it afterward (wont detail here).

joshgoebel commented 1 year ago

PR welcome... we're going to change the auto-detect stuff a lot in v12... and remove it from our test suite... it's just too flakey and not great in any case... so changes like this will be trivial then - if you make it now you might have to fight with auto-detect getting a bunch of things wrong that now incorrectly flag as C++. But anyone is welcome to make a PR and we'l see. :-)

GuillaumeDua commented 1 year ago

@joshgoebel Ofc sometimes it get wrong, but so far with simple experimentations I managed to make it work by reducing the languages subset. Collision between cpp and bash happens sometimes tho.

I understand detecting languages is a best-effort feature, and providing users the opportunity to choose which language a particular code section should be highlighted with is great enough.

Perhaps the simpliest quickfix here would be to force the language highlighting, so it cannot fail (like in the motivation example of this ticket above).

joshgoebel commented 1 year ago

You can already specific the language manually, no? The needed fix to the grammar (which we should pursue) is likely a one line one just to remove this particular illegal.

GuillaumeDua commented 1 year ago

Correct. 👍