jgm / skylighting

A Haskell syntax highlighting library with tokenizers derived from KDE syntax highlighting descriptions
189 stars 61 forks source link

Lua multiline comments are never terminated #99

Closed tarleb closed 4 years ago

tarleb commented 4 years ago

Lua allows multiline comments like so --[[comment, may include line breaks]]. It seems that skylighting somehow differs from kate in processing these.

E.g., running

$ skylighting --syntax=lua -fnative <<EOF
--[[hello]]
local variable = 23
EOF

gives

[ [ ( CommentTok , "--[[hello]]" ) ]
, [ ( CommentTok , "local variable = 23" ) ]
]

while kate highlights the second line as keywords and variables. I verified that the used lua.xml only differ in the small change made in lua.xml.patch, which seems unrelated to this behavior.

jgm commented 4 years ago

Try it with --trace and you'll get a lot of information about the sequence of parsing. That might help narrow down what is happening.

Sometimes these issues have to do with subtle differences between pcre and the KDE regex engine, which we then have to work around. You can also inspect the xml file to try to figure out which part is responsible for ending the multiline comment.

tarleb commented 4 years ago

Thanks, that helped a lot. Trace confirms that the rule which should pop the "BlockComment" context never fires. I played around with the definitions a little with the respective rule in doxygenlua.xml and found that changing ]%1] to ]] results in the expected output. Of course this would break if one would use the collision prevention syntax with arbitrary numbers of =, like --[=[ comment ]=] instead.

Unfortunately, I don't understand yet the code enough to see where any of this is handled.

tarleb commented 4 years ago

Ah, think I understand: the linked line uses StringDetect with dynamic=true, which does not seem to be supported in Skylighting. In fact, changing StringDetect to RegExpr in the line linked above fixes the issue entirely.

jgm commented 4 years ago

I didn't actually know that StringDetect could be used with dynamic=true. We could add support for that without too much difficulty, I think.

jgm commented 4 years ago

If you want to take a look: skylighting-core, Skylighting/Tokenizer.hs. As you'll see DetectChar takes a flag for dynamic, but StrictDetect doesn't.

tarleb commented 4 years ago

Thanks, I created a PR. Some tests changed, but from what I can see, this only shows that things are working as intended.