orbitalquark / scintillua

Scintillua enables Scintilla lexers to be written in Lua, particularly using LPeg. It can also be used as a standalone Lua library for syntax highlighting support.
https://orbitalquark.github.io/scintillua
MIT License

Markdown lexer, code_inline not closed for empty backtick pairs #115

Open jxzwp opened 2 months ago

jxzwp commented 2 months ago

Open Textadept 12.4 and, from the menu, go to Tools > Quick Open > Quickly Open Textadept Home. Select docs/api.md from the filter list and click OK to open it. Go to line 7533, which is under the heading for textadept.editing.auto_pairs. That line contains a pair of empty backticks, and all the text after them is styled incorrectly, as though the rest of the document were inside a code block. It's easier to see with a dark theme.

The problem is in the code_inline LPeg pattern on line 36 of lexers/markdown.lua: https://github.com/orbitalquark/scintillua/blob/0c9294695920daa397a67d960e003908c5979bb3/lexers/markdown.lua#L36 The pattern matches one or more leading backticks and then calls a function to determine where the inline code ends. However, when an even-length run of backticks has nothing between them (such as ``), the function looks for a closing run of the same length and, finding none, treats everything up to the end of the input as unclosed inline code. Adding an if statement to handle that edge case fixes the issue. See the new line in the example fix below.

local code_inline = lpeg.Cmt(lpeg.C(P('`')^1), function(input, index, bt)
  -- `foo`, ``foo``, ``foo`bar``, `foo``bar` are all allowed.
  local _, e = input:find('[^`]' .. bt .. '%f[^`]', index)
  if not e and (#bt % 2 == 0) then return index end  --<<<<< New line <<<<<<
  return (e or #input) + 1
end)

When you test this, other interactions with the code_line and code_block LPeg patterns can make it difficult to determine which pattern is causing which styling, but as far as I can tell the fix above works as intended.
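One way to check the change in isolation from code_line and code_block might be to call the body of the match-time function directly as a plain Lua function, with no LPeg involved. The sketch below assumes that approach. The helper name inline_code_end and its patched flag are hypothetical and are not part of lexers/markdown.lua.

```lua
-- Hypothetical helper: mirrors the body of the match-time function without LPeg.
-- `index` is the position just after the opening backtick run `bt`.
-- `patched` toggles the proposed extra line so both behaviors can be compared.
local function inline_code_end(input, index, bt, patched)
  local _, e = input:find('[^`]' .. bt .. '%f[^`]', index)
  if patched and not e and #bt % 2 == 0 then return index end
  return (e or #input) + 1
end

local text = 'Empty pair: `` and then ordinary prose.'
local s, e = text:find('`+') -- locate the opening backtick run
local bt = text:sub(s, e)

-- Original behavior: no closing run is found, so the match extends to the end of the input.
print(inline_code_end(text, e + 1, bt, false) == #text + 1)
-- Patched behavior: the match stops right after the empty backtick pair.
print(inline_code_end(text, e + 1, bt, true) == e + 1)
```

Both print calls should report true if the helper reflects the lexer's logic. A properly closed span such as ``foo`` returns the same end position with or without the patch, because the new line only applies when no closing run is found.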

Thanks for all the time you've spent developing Textadept.

orbitalquark commented 1 month ago

Sorry for the delayed response.

Thanks so much for the detailed report and potential fix.

I'm torn on this one because a `` sequence is most likely the beginning of a code sequence. What you found is a product of an HTML generation bug (there should probably be an escape of some sort). However, as you point out, the unfortunate side effect is highlighting the rest of the document incorrectly, so a fix seems reasonable here.
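To make the trade-off concrete, here is a rough sketch of how the proposed check would treat a `` that really does begin a code sequence that has not been closed yet. It runs the patched logic as plain Lua rather than through LPeg. The names patched_end and styled_span are hypothetical helpers for illustration only.

```lua
-- Hypothetical stand-in for the patched match-time function body (no LPeg needed).
local function patched_end(input, index, bt)
  local _, e = input:find('[^`]' .. bt .. '%f[^`]', index)
  if not e and #bt % 2 == 0 then return index end
  return (e or #input) + 1
end

-- Returns the text that would be matched as inline code, starting at the first backtick run.
local function styled_span(text)
  local s, e = text:find('`+')
  return text:sub(s, patched_end(text, e + 1, text:sub(s, e)) - 1)
end

print(styled_span('``foo'))   -- unclosed, even-length run: only the backticks match
print(styled_span('`foo'))    -- unclosed, odd-length run: still matches to end of input
print(styled_span('``foo``')) -- closed: the whole span matches
```

Under these assumptions, an unclosed even-length run is treated as an empty code span, while an unclosed odd-length run keeps the old behavior.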

I'll look into fixing the HTML generation bug first. Then I may circle back to this.

For what it's worth, the Markdown lexer is ripe for another refactor. I have a number of issues with it...

jxzwp commented 1 month ago

No worries on the delayed response. Thanks for acknowledging the issue.