highlightjs / highlight.js

JavaScript syntax highlighter with language auto-detection and zero dependencies.
https://highlightjs.org/
BSD 3-Clause "New" or "Revised" License

(Fortran) highlight.js fails to identify literals #3968

Closed jbloino closed 2 months ago

jbloino commented 5 months ago

Describe the issue

Fortran boolean/logical literals have the form .<keyword>. (.true., .false.). They are correctly listed in the internal database but are not recognized in source code.

Which language seems to have the issue?

fortran (the language is set explicitly, not auto-detected)

Are you using highlight or highlightAuto?

highlight (version 11.9.0)

Sample Code to Reproduce

program test
    logical :: a, b, c

    a = .true.
    b = .false.
    num = 3

    c = num .gt. 0 .and. .not. b

end program test

Expected behavior

.true. and .false. should be recognized as literals (and highlighted in a different color with most themes). GitHub recognizes them correctly, see screenshot: [image]

Additional context

The literals are defined as .True. and .False., but the language is correctly set as case-insensitive. Removing the leading and trailing dots in highlight.min.js makes true and false highlight correctly. However, the correct form of the literal includes the leading and trailing dots. It seems that the problem is in the regular expression parser.

As a side note, besides these literals, Fortran also supports other keywords delimited by dots: .and., .or., .not., as well as the legacy relational keywords .lt. (less than), .le. (less than or equal), .eq. (equal), .ne. (not equal), .ge. (greater than or equal), .gt. (greater than). The relational keywords can now be replaced by symbols instead. While they are currently not supported, I expect that the same problem would arise if they were added to the list of keywords.

joshgoebel commented 5 months ago

We'd simply need to use the $pattern option to specify a regex that matches the "shape" of the keywords, since the default of \w+ isn't going to work here: \w doesn't match a dot (.).

Perhaps:

\w+|\.\w+\.

I.e., a word, or a word surrounded by dots.
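To see the difference between the two shapes, here is a minimal sketch in plain JavaScript. It does not call highlight.js at all; the helper name `candidates` and the variable names are made up for illustration, and the sample line is taken from the reproduction code in the issue.

```javascript
// Candidate-keyword regexes: the default shape mentioned above
// versus the proposed one that also accepts dot-delimited words.
const defaultShape = /\w+/g;
const dottedShape = /\w+|\.\w+\./g;

// Hypothetical helper: return every substring of `text` that the
// regex would pick out as a keyword candidate.
function candidates(regex, text) {
  return text.match(regex) || [];
}

const sampleLine = "c = num .gt. 0 .and. .not. b";
const withDefault = candidates(defaultShape, sampleLine);
const withDotted = candidates(dottedShape, sampleLine);

console.log(withDefault); // [ 'c', 'num', 'gt', '0', 'and', 'not', 'b' ]
console.log(withDotted);  // [ 'c', 'num', '.gt.', '0', '.and.', '.not.', 'b' ]
```

With the default shape the dots are dropped, so a table entry such as .gt. can never equal the candidate gt. With the dotted shape the candidate keeps its dots and can be looked up as written.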

jbloino commented 5 months ago

Thank you very much! Indeed, the Fortran block does not seem to have $pattern defined. Adding:

$pattern:/\b[a-z][a-z0-9_]+\b|\.[a-z][a-z0-9_]+\./,

inside the keywords block (l. 460 in highlight.min.js, l. 10 in fortran.min.js) seems to do the trick (cf. the image below, rendered with a slightly modified version of night-owl.min.js as the color theme; the code is generated through asciidoctor-revealjs using highlight.js).

[Screenshot: sample code highlighted with the modified highlight.min.js]

Keywords in the Fortran standard use only Latin characters and must start with a letter (except keywords or literals between dots). They can also contain underscores (e.g. len_trim), so a bare \w+ is looser than needed: it also accepts a leading digit or underscore.

Actually, the logical and relational operators were indeed present (.and., .ge., ...), but were not parsed for the same reason as the literals.

It now works with this change and does not seem to break the examples I have for now.
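As a rough illustration of why the change works, the sketch below runs a simplified keyword lookup with the $pattern given above. This is not highlight.js's actual implementation: the function `classify` is hypothetical, the keyword table is a tiny made-up subset, and filing the dotted operators under `keyword` is an assumption made only for the example.

```javascript
// Hypothetical, heavily trimmed keyword table. The real Fortran grammar
// lists many more entries, and the categories used here are assumptions.
const fortranKeywords = {
  $pattern: /\b[a-z][a-z0-9_]+\b|\.[a-z][a-z0-9_]+\./,
  literal: [".true.", ".false."],
  keyword: ["program", "logical", "end", ".and.", ".not.", ".gt."],
};

// Simplified stand-in for keyword matching: find candidates with $pattern
// (case-insensitively, since Fortran is case-insensitive), then look each
// lowercased candidate up in the table.
function classify(source, table) {
  const regex = new RegExp(table.$pattern.source, "gi");
  const found = [];
  for (const [text] of source.matchAll(regex)) {
    const lowered = text.toLowerCase();
    for (const kind of ["literal", "keyword"]) {
      if (table[kind].includes(lowered)) found.push([text, kind]);
    }
  }
  return found;
}

const classified = classify("a = .TRUE.\nc = num .gt. 0 .and. .not. b", fortranKeywords);
console.log(classified);
// [ [ '.TRUE.', 'literal' ], [ '.gt.', 'keyword' ],
//   [ '.and.', 'keyword' ], [ '.not.', 'keyword' ] ]
```

One property worth noting: because each alternative is a letter followed by a `+` quantifier, the pattern as written only matches candidates of at least two characters, so single-letter names such as a or b are never offered for lookup.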

joshgoebel commented 4 months ago

Willing to contribute a PR with the fix?

jbloino commented 4 months ago

Hi, sorry for the late reply. I have never done this before but will try. So far I have only modified the generated files directly, and I will check how to do this properly. I may have also found some Fortran intrinsic keywords and procedures missing from the list while using highlight.js for an introduction to the Fortran language.