microsoft / vscode

Visual Studio Code
https://code.visualstudio.com
MIT License
164.82k stars 29.49k forks source link

Textmate grammar highlighting doesn't match GitHub/Linguist when "end" is never matched #189940

Closed DanTup closed 1 month ago

DanTup commented 1 year ago

This was reported at https://github.com/dart-lang/dart-syntax-highlight/issues/11#issuecomment-1613758553. Dart highlighting on GitHub doesn't handle unterminated triple-backticks as expected. VS Code does handle it as expected.

However, while debugging this, I've become less certain that GitHub is wrong, and feel like VS Code might be.

Here's a trimmed down version of the grammar that shows the problem. It defines triple-slash comments, and supports triple-backtick code blocks inside:

It renders like this:

image

It the triple backticks are unclosed, it looks reasonable:

image

However, it's not clear why the variable.other.source.dart scope was exited, because the "end" condition was never found. On GitHub, this does not happen and the rest of the document is consumed (note the first void here is red, but the second one is not because the variable context eats the rest of the document):

image

I can't find anything in the spec for textmate grammars to explain VS Code's behaviour. The most information I've found on it is here:

https://macromates.com/manual/en/language_grammars

The other type of match is the one used by the second rule (lines 9-17). Here two regular expressions are given using the begin and end keys. [...] If there is no match for the end pattern, the end of the document is used.

https://www.apeth.com/nonblog/stories/textmatebundle.html

With begin/end, if the end pattern is not found, the overall match does not fail: rather, once the begin pattern is matched, the overall match runs to the end pattern or to the end of the document, whichever comes first.

While VS Code's behaviour is convenient for me (because I'm not sure how to handle these unclosed triple-backticks if it behaved like GitHub), it doesn't seem correct, and it's more inconvenient if VS Code and GitHub disagree on what the behaviour should be because it makes it more difficult to author a grammar.

aiday-mar commented 1 year ago

Hey @hediet, not sure if this is for you?

RedCMD commented 3 months ago

I would say that VSCode is correct and Linguist is wrong Linguist seems to just error and self implode

begin/while is a lot more ridgid/restrictive than end it cannot be 'pushed' out like how end can be

I believe while was designed for embedded languages that would have otherwise broken out of end

https://github.com/RedCMD/TmLanguage-Syntax-Highlighter/blob/main/documentation/rules.md#while

DanTup commented 3 months ago

@RedCMD I'm not sure I understand, the issue here appears to be with begin/end and not while? The first pattern in my example begins with triple backticks and ends with triple backticks, but in my second screenshot there is no second set of triple backticks, so why does the variable.other.source.dart end and not run to the end of the document?

(I'm not complaining, because with the current grammar this is better behaviour than I see on GitHub, but I understand why it would be correct)

RedCMD commented 3 months ago

the begin/end is inside of the begin/while while is a lot more strict and won't let the begin/end escape

the while is currently checking every line if it starts with /// the last line foo, does not start with ///, so the while finishes and all patterns inside are terminated this was done so embedded languages would stop escaping their bounds and polluting the rest of the document

DanTup commented 3 months ago

Oh, I see! That does sound sensible, so maybe the issue here is GitHub then. I've opened an issue at https://github.com/github-linguist/linguist/issues/7015 (assuming that's the right place) for their take on it. Thanks!

hediet commented 3 months ago

Closing because it does not seem to be a VS Code issue.

It is unfortunate that the TextMate language is not specified very precisely.

DanTup commented 2 months ago

@hediet seems like you didn't actually close this. However, I've been unable to determine whether VS Code is actually correct here. I got a response at https://github.com/github-linguist/linguist/issues/7015#issuecomment-2321028904 suggesting maybe VS Code is incorrect:

PrettyLights's behaviour appears to be correct here

I don't know how reasonable it is to say one implementation is correct if the spec is vague, however I do think it's in everyones best interest if we can agree on a (the same) definition (in absense of a real spec/documentation). However, this would require some agreement between the big implementations (of which VS Code, and GitHub/PrettyLights are some).

RedCMD commented 2 months ago

after spending way too much time trying to get a stupid Mac to boot I was able to test the grammar in TextMate 2.0

and I can confirm GitHub/Linguist/PrettyLights is correct ✅ and VSCode is wrong

image image

TextMate 2.0 allows the end rule to 'push' the while rule outwards which allows for interesting things to happen: image In my opinion, this makes while almost functionally useless

A \G anchor is also placed at the beginning of the nextline after the begin and while which VSCode does not do ❌

This issue should now be moved to https://github.com/microsoft/vscode-textmate

DanTup commented 2 months ago

@RedCMD thanks for digging more into this! If VS Code agrees and will fix this, then I guess I will have to update the Dart grammar (I feel like it's going to get a lot more complicated!).

I also feel like this behaviour isn't ideal, but given TextMate isn't likely to change, I do think it's more important the implementations match.

DanTup commented 1 month ago

I've re-raised this at https://github.com/microsoft/vscode-textmate/issues/241