rouge-ruby / rouge

A pure Ruby code highlighter that is compatible with Pygments
https://rouge.jneen.net/

html lexer javascript comment highlight error #1752

Open nocnob opened 2 years ago

nocnob commented 2 years ago

Name of the lexer

html lexer

Code sample

<html>
  <script>
    // <h1></h2>
  </script>
</html>

http://rouge.jneen.net/v3.26.1/html/PGh0bWw-CiAgPHNjcmlwdD4KICAgIC8vIDxoMT48L2gyPgogIDwvc2NyaXB0Pgo8L2h0bWw-


Additional context

jneen commented 2 years ago

Confirmed. The cause is this line: https://github.com/rouge-ruby/rouge/blob/39b6432f9546ed8cc61c14c0d8735d80b84e6fb4/lib/rouge/lexers/javascript.rb#L39

Because the parent HTML lexer has to re-examine the stream whenever it sees <, the JavaScript lexer loses track of the comment, and the <h1></h2> is interpreted as JavaScript. The fix would be to match only // and push an inline comment state that pops when it sees a newline (which we should do for any language that can be embedded, tbh).
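A minimal sketch of the proposed fix, written in plain Ruby rather than Rouge's lexer DSL so it runs without the gem. TinyJsLexer, lex_stateless, the token symbols and the way the input is chunked are all hypothetical. The sketch only assumes that the embedded lexer's state stack persists between the pieces the parent hands over. It contrasts a pushed :inline_comment state with a single "//" plus rest-of-chunk match.

```ruby
require 'strscan'

# Hypothetical toy lexer, NOT Rouge's API: it only models a state stack
# that persists between calls, which the proposed fix relies on.
class TinyJsLexer
  attr_reader :stack

  def initialize
    @stack = [:root]
  end

  # Lexes one chunk, resuming from the state the previous chunk left behind.
  # Returns [token_type, text] pairs.
  def lex_chunk(chunk)
    ss = StringScanner.new(chunk)
    tokens = []
    until ss.eos?
      if @stack.last == :inline_comment
        if ss.scan(/\n/)
          tokens << [:text, ss.matched]
          @stack.pop # a newline ends the // comment
        else
          tokens << [:comment, ss.scan(/[^\n]+/)]
        end
      elsif ss.scan(%r{//})
        tokens << [:comment, ss.matched]
        @stack.push(:inline_comment) # match only "//", then switch state
      else
        tokens << [:code, ss.scan(%r{[^/\n]+|/|\n})]
      end
    end
    tokens
  end
end

# Old behaviour, modelled as one "//" + rest-of-chunk match: each chunk
# starts from scratch, so the comment cannot continue past a chunk boundary.
def lex_stateless(chunk)
  chunk.scan(%r{//[^\n]*|[^/\n]+|/|\n}).map do |piece|
    [piece.start_with?("//") ? :comment : :code, piece]
  end
end

# Assumed split: the parent HTML lexer stops at every "<", so the script
# line from the issue arrives in pieces.
chunks = ["    // ", "<h1>", "</h2>\n"]

lexer = TinyJsLexer.new
stateful = chunks.flat_map { |c| lexer.lex_chunk(c) }
stateless = chunks.flat_map { |c| lex_stateless(c) }
```

With the stateful version, <h1> and </h2> remain comment tokens and the stack returns to [:root] after the newline. The stateless version emits <h1> as code, which matches the reported misbehaviour.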

Normally we would fix this in the HTML lexer by searching for </script> eagerly, but since the closing script tag can contain arbitrary whitespace, I think it would be inefficient to use that much lookahead (not sure about this, though).
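A sketch of what the eager search could look like, assuming "arbitrary whitespace" means optional whitespace between script and >, and a case-insensitive match as in HTML. split_script_body is a hypothetical helper, not Rouge code. It does not settle the efficiency question. It only shows that the body could then be handed to the JavaScript lexer in one piece.

```ruby
# Hypothetical helper (not part of Rouge). Assumed end-tag shape:
# "</script", optional whitespace, ">"; case-insensitive.
SCRIPT_END = %r{</script\s*>}i

# Returns [body, rest]: body is the raw script text before the end tag
# (the whole input if none is found), rest starts at the end tag.
def split_script_body(source)
  idx = source.index(SCRIPT_END)
  return [source, ""] if idx.nil?
  [source[0...idx], source[idx..-1]]
end

body, rest = split_script_body("\n    // <h1></h2>\n  </SCRIPT >\n</html>")
```

Here </h2> does not match the pattern, so body keeps the whole comment line and rest starts at "</SCRIPT >".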

jneen commented 2 years ago

(This, by the way, is why you'll sometimes see "</scr"+"ipt>" in JS libraries: if they were embedded directly in the page without splitting that string up, it would end the script tag early and you'd be left with a hanging quote in your JS code.)
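A small illustration, assuming a parser that ends the script element at the first match of a simple </script> pattern, even inside a JS string literal. script_body is a hypothetical helper.

```ruby
# Assumed end-tag pattern, as a simple HTML parser might use it.
SCRIPT_END_TAG = %r{</script\s*>}i

# Returns the text the parser would treat as the script body.
def script_body(html)
  start = html.index("<script>") + "<script>".length
  stop = html.index(SCRIPT_END_TAG, start)
  html[start...stop]
end

naive = %q{<script>var s = "</script>"; run(s);</script>}
split = %q{<script>var s = "</scr"+"ipt>"; run(s);</script>}

script_body(naive) # => "var s = \""  (hanging quote)
script_body(split) # => "var s = \"</scr\"+\"ipt>\"; run(s);"
```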

jneen commented 2 years ago

While we're there, we should re-examine whether <!-- really needs to be a comment in JS.

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had any activity for more than a year. It will be closed if no additional activity occurs within the next 14 days. If you would like this issue to remain open, please reply and let us know if the issue is still reproducible.