Rascal LSP uses textmate token scopes where it should use SemanticTokenTypes for the syntax highlighting.

DavyLandman commented 6 months ago

Describe the bug

Rascal LSP is reporting the wrong kind of semantic token types in the legend. This makes it a hit and miss which ones are properly highlighted.

To Reproduce

Annotate a production with @Category{constant.numeric} and it doesn't get the colors of numeric contants of the theme, while string as category does work.

Analysis

I think the documentation on this feature has been improved, it used to be less clear what LSP expected, I've read some old GH issues discussing the closed aspect of these TokenTypes. The set of valid token type is limited to the enum mentioned in the LSP spec, which is a subset of the API in VS Code

Now why did it even work? We were passing sublime scopes like entity.name and variable.function. But VS Code is ignoring after the . (as the syntax is actually <tokentype>.<modifier> in VS Code, no LSP). So if the first name happened to be in the list (or we used something like string) it accidentally worked. But constant.numeric (or constant) doesn't exist as a TokenType, so it gets no highlighting applied. A user cannot just do @Category{number} instead, as rascal-lsp has a hardcoded list of token types, and number is not in there.

Solution

replace the list with the enum form the LSP spec.
consider adding a mapping of the old text-mate styles to the token types, so that existing languages keep working a reasonable manner.

jurgenvinju commented 6 months ago

Ouch! Good analysis. How about this:

Just like with the Symbols for the outliners, we model the LSP API directly and completely:
- We add a data TokenType = enum(TokenModifier modifiers...) | class((TokenModifier modifiers...) | .... ; (note the variable argument lists), but we also add a default token type if it's not there already.
- Using this other "enum" for the modifiers data TokenModifier = static() | ...
We implement @tokenType=class(static()) on syntax rules similar to the way we used to implemented @category
We support @category="string" by mapping selected textmate categories (and the old Rascal-eclipse) categories to the above, but label the uses "deprecated".
We also implement data Tree(TokenType tokenType = default(), that follows the semantics of the category keyword field.

This way we fix the bug and also keep in line with the rest of the LanguageServer design, and we stay backward compatible for a while.

DavyLandman commented 6 months ago

That is cool, but it does mean you have to import the util::LanguageServer module into your Syntax file, which is a dependency that we normally don't want (since we like to also be able to run our stuff without LSP dependencies).

I was currently working on this:

use VS Code's syntax of <token type>{"." <token modifier>}* but keep them strings (and point towards the LSP documentation in the documentation of the parser function)
map as much of the old rascal & textmate tokens towards the LSP token types

Pros:

keep in the same style as before, without introducing extra dependencies
Add support for modifiers
Be a relative small patch to the existing code.

Cons:

the list of strings is external, and even if we would make a copy, it would still introduce less than desired dependency edges in or grammar modules
it's a string. So users can still mess up the token & modifiers, and we have no type-checking time support for these errors.

jurgenvinju commented 6 months ago

Watch out... next to the standard types and modifiers, there is more:

Along with the standard types and modifiers, VS Code defines a mapping of types and modifiers to similar TextMate scopes. That's covered in the section Semantic Token Scope Map.

DavyLandman commented 6 months ago

Watch out... next to the standard types and modifiers, there is more:

Along with the standard types and modifiers, VS Code defines a mapping of types and modifiers to similar TextMate scopes. That's covered in the section Semantic Token Scope Map.

Yes, but that is not LSP.

That's VS Code that is helping out map semantic token types to textmate scopes for the themes that do not have extensive semantic token support.

jurgenvinju commented 6 months ago

Also Custom TextMate scope mappings can be added.

We could use these to support the old Rascal-eclipse tokenType names, such as "ambiguity", etc. They can be added to the json configuration file and do not have to be directly associated with any language.

DavyLandman commented 6 months ago

Also Custom TextMate scope mappings can be added.

We could use these to support the old Rascal-eclipse tokenType names, such as "ambiguity", etc. They can be added to the json configuration file and do not have to be directly associated with any language.

Indeed, but again, this is a VS Code specific extension. LSP does not want us to do this. VS Code API of semantic tokens is not the same as LSP api of semantic tokens. By design they've limited the categories.

But I was also thinking, that yes, for the ambiguity nodes, we could try and sneak one in, that just won't work for other LSP clients.

jurgenvinju commented 6 months ago

I see your points, but I want to work towards a more ideal situation.

Syntax highlighting is a core feature of the Rascal game, in Eclipse, in VScode but also elsewhere in other LSP instances or even on the commandline. I don't like the strings since they lead to the kind of bugs that we are now fixing here. Things happily seem to work, but then they don't. Our users have to find this out.

So I guess I am proposing an extension to ParseTree.rsc where we add highlighting to the core of the parse tree representation:

We stick to names that fit the Rascal world, like SyntaxCategory and SyntaxCategoryModifier but use the LSP set as the standard set of constructors.
We write the implementation in Java in such a way that user extensions of Category will work automatically, if they also added the same name to their own JSON files. "Ambiguity" will serve as an example extension.
Everybody who writes a syntax rule, automatically has ParseTree imported by the type-checker, so that problem will be solved.
We implement @category=str for backward compatibility and @category=SyntaxCategory for the new style.
We change/fix other uses of @category and .category (such as in HTML generators and ANSI pretty printers)
We roll this out in rascal, rascal-lsp and rascal-eclipse (later)

jurgenvinju commented 6 months ago

Note that tags are not checked by the type checker (yet). Right @PaulKlint ?

PaulKlint commented 6 months ago

If you mean typechecker (instead of prettyprinter): I actually fixed that yesterday!

jurgenvinju commented 6 months ago

type checker! duh.. where did that come from? Oh that's cool. That will help us find a few bugs!

jurgenvinju commented 6 months ago

@DavyLandman but that means that @category needs a new name because we don't have union types in Rascal. So something like @scope for the new ones? tag SyntaxScope = syntax@scope and data Tree(SyntaxScope scope=default())

DavyLandman commented 6 months ago

Ok, so I like your idea @jurgenvinju. But since we're running into this issue at our customers and rascal is in the middle of a lot of shakeup, how about we split the fix in 2 phase.

Add support for the LSP native ones to the list of supported token types, so at least a user can say @Category=number and get the proper syntax highlighting (this is a much smaller change than what I proposed in #367 )
In parallel properly implement your extension to Parsetree.

So apart from this scheduling proposal, I have a small comment on the design proposed: that the LSP categories are quite limited, they designed it to force people into a small set of options that can reliably be mapped to a theme, but things we would consider normal is missing. Like for example there are only classes, interface & structures. And a lot of things will have to map to keyword.

Also, I'm not sure if scope is the best name? It alludes to both scope in language semantics, which it's not, and to text-mate scopes, which they also are not (i.e no nesting etc).

DavyLandman commented 6 months ago

Or alternative point 1: do fix the name mapping part, but drop the modifier support. So in essence: fix the bug of how category was interpreted, without adding new functionality.

I think that would be my preference. I've updated #367 with these changes.

DavyLandman commented 6 months ago

After some offline discussion with Jurgen, we came to the conclusion that #367 fixes the current bug, while usethesource/rascal#1928 will give the proper support for this. Therefore we'll keep #367 "small"

usethesource / rascal-language-servers

Rascal LSP uses textmate token scopes where it should use SemanticTokenTypes for the syntax highlighting. #366