Consider supporting embedded language grammars when using semantic tokens

DanTup commented 1 year ago

This has been discussed a little in some other issues:

The issue is that when a language uses semantic tokens, injected embedded grammars do not work. The suggestion given by @alexdima at https://github.com/microsoft/vscode/issues/113640#issuecomment-754061490 is for the embedded language to coordinate with the language server to suppress semantic tokens wherever the injected grammar needs to apply (which Rust has gone ahead with, adding an option to suppress semantic tokens on strings).
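For context on what suppressing string tokens involves on the server side: LSP encodes semantic tokens as a flat integer array, five integers per token (deltaLine, deltaStartChar, length, tokenType, tokenModifiers), with positions relative to the previous token. A minimal TypeScript sketch of dropping one token type therefore has to decode, filter, and re-encode. The function names and the legend (index 1 = `string`) are hypothetical; this is not rust-analyzer's or the Dart server's actual code.

```typescript
// A decoded semantic token with absolute (not delta) positions.
interface AbsToken {
  line: number;
  startChar: number;
  length: number;
  tokenType: number;
  tokenModifiers: number;
}

// Decode the LSP relative format: 5 integers per token.
function decodeTokens(data: number[]): AbsToken[] {
  const tokens: AbsToken[] = [];
  let line = 0;
  let startChar = 0;
  for (let i = 0; i + 4 < data.length; i += 5) {
    const deltaLine = data[i];
    line += deltaLine;
    // deltaStartChar is relative to the previous token only when both are on the same line.
    startChar = deltaLine === 0 ? startChar + data[i + 1] : data[i + 1];
    tokens.push({
      line,
      startChar,
      length: data[i + 2],
      tokenType: data[i + 3],
      tokenModifiers: data[i + 4],
    });
  }
  return tokens;
}

// Re-encode absolute tokens (assumed sorted by line, then startChar).
function encodeTokens(tokens: AbsToken[]): number[] {
  const data: number[] = [];
  let prevLine = 0;
  let prevStart = 0;
  for (const t of tokens) {
    const deltaLine = t.line - prevLine;
    const deltaStart = deltaLine === 0 ? t.startChar - prevStart : t.startChar;
    data.push(deltaLine, deltaStart, t.length, t.tokenType, t.tokenModifiers);
    prevLine = t.line;
    prevStart = t.startChar;
  }
  return data;
}

// Hypothetical server-side option: drop every token of one type (e.g. the legend's "string").
function suppressTokenType(data: number[], suppressedType: number): number[] {
  return encodeTokens(decodeTokens(data).filter((t) => t.tokenType !== suppressedType));
}

// Hypothetical legend: 0 = keyword, 1 = string.
const STRING_TYPE = 1;
// keyword at col 0, string at col 4, keyword at col 10, all on line 0
const encoded = [0, 0, 3, 0, 0, 0, 4, 5, 1, 0, 0, 6, 2, 0, 0];
console.log(JSON.stringify(suppressTokenType(encoded, STRING_TYPE))); // [0,0,3,0,0,0,10,2,0,0]
```

Because positions are relative, simply deleting the string token's five integers would misplace the token that follows it; re-encoding avoids that. Note that this removes colouring from every string, which is exactly the first objection raised below.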

This does not seem like a very scalable solution. I had a request at https://github.com/Dart-Code/Dart-Code/issues/4212 related to this, where another extension provides highlighting for some strings inside Dart. When semantic tokens are disabled, everything is fine, but with semantic tokens enabled the Dart server produces string tokens (because strings are a non-default colour), which break the embedded language's highlighting.

Having the Dart server suppress these tokens is not a good solution because:

It means strings that aren't in the embedded language's format would lose their colouring
It requires an LSP server (which is intended to be generic and editor-agnostic by design) to make changes for some specific functionality of another extension (of which there could be many, with varying needs)

It would be much better if this could be done without changes to the server. I don't know what a solution to this would look like, but perhaps the injected language could be allowed to layer its scopes over the semantic tokens (semantic tokens are more accurate, but I don't believe that's a reason to prevent this), or the injected language could be allowed to apply specifically to certain tokens from the server, such as strings (though VS Code's lack of support for multiline semantic tokens may complicate that).
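One way to picture the first idea in the paragraph above (layering the injected language's scopes over semantic tokens): wherever an injected grammar claims a range, the editor could clip any semantic tokens overlapping it, so the TextMate scopes from the injection show through. VS Code already falls back to TextMate colouring wherever no semantic token is present. The sketch below assumes single-line, absolute-position tokens and a hypothetical `InjectedRange` type; it illustrates the idea, not how VS Code behaves today.

```typescript
// Absolute-position semantic token (VS Code semantic tokens are single-line).
interface AbsToken {
  line: number;
  startChar: number;
  length: number;
  tokenType: number;
  tokenModifiers: number;
}

// Hypothetical: a range claimed by an injected TextMate grammar; endChar is exclusive.
interface InjectedRange {
  line: number;
  startChar: number;
  endChar: number;
}

type Segment = [number, number]; // [start, end) on one line

// Remove `cut` from `seg`, returning the 0, 1 or 2 pieces that remain.
function subtract(seg: Segment, cut: Segment): Segment[] {
  const [s, e] = seg;
  const [cs, ce] = cut;
  if (ce <= s || cs >= e) return [seg]; // no overlap: keep as-is
  const pieces: Segment[] = [];
  if (cs > s) pieces.push([s, cs]); // part before the injection
  if (ce < e) pieces.push([ce, e]); // part after the injection
  return pieces;
}

// Clip semantic tokens so that injected ranges are left to the TextMate scopes.
function layerInjections(tokens: AbsToken[], injected: InjectedRange[]): AbsToken[] {
  const result: AbsToken[] = [];
  for (const t of tokens) {
    let segments: Segment[] = [[t.startChar, t.startChar + t.length]];
    for (const r of injected) {
      if (r.line !== t.line) continue;
      const next: Segment[] = [];
      for (const seg of segments) {
        next.push(...subtract(seg, [r.startChar, r.endChar]));
      }
      segments = next;
    }
    for (const [s, e] of segments) {
      result.push({ ...t, startChar: s, length: e - s });
    }
  }
  return result;
}

// A Dart string literal spanning columns 10..27 whose contents (11..26) an injection claims.
const stringToken: AbsToken = { line: 0, startChar: 10, length: 17, tokenType: 1, tokenModifiers: 0 };
const sqlInjection: InjectedRange = { line: 0, startChar: 11, endChar: 26 };
// Only the two quote characters keep their semantic string token.
console.log(layerInjections([stringToken], [sqlInjection]));
```

Strings outside any injected range are untouched, which addresses the first objection to server-side suppression: only the parts an injection actually claims lose their semantic colour.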

If there are caveats to switching to semantic tokens, language authors may think twice before adopting them (or more users may turn them off), which would be a shame.

VSCodeTriageBot commented 1 year ago

This feature request is now a candidate for our backlog. The community has 60 days to upvote the issue. If it receives 20 upvotes we will move it to our backlog. If not, we will close it. To learn more about how we handle feature requests, please see our documentation.

Happy Coding!

VSCodeTriageBot commented 1 year ago

This feature request has not yet received the 20 community upvotes it takes to make it to our backlog. 10 days to go. To learn more about how we handle feature requests, please see our documentation.

Happy Coding!

wakaztahir commented 6 months ago

To support semantic tokens in embedded languages

Solution 1

1 - VS Code sends my LSP server a request for semantic tokens
2 - I lex my language and reach a token for an embedded language
3 - I set a field on this semantic token indicating the embedded language's start and length, and which embedded language is being used

Cons:
1 - VS Code needs to go through my semantic tokens, find the embedded regions, and get tokens for them from its own set of extensions or LSP servers
2 - an LSP server might need to be started to provide semantic tokens for the embedded language
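Solution 1 implies a step on the VS Code side that turns the server-marked regions into requests for whichever extension or server handles each embedded language. A minimal sketch, assuming a hypothetical `EmbeddedRegion` shape (nothing like it exists in the LSP specification) and single-line regions:

```typescript
// Hypothetical extra data a server could attach under Solution 1; NOT part of the LSP spec.
interface EmbeddedRegion {
  line: number; // 0-based line in the host document
  startChar: number; // 0-based character on that line
  length: number; // single-line region, like VS Code semantic tokens
  languageId: string; // e.g. "sql"
}

// What the client would hand to the embedded language's own provider.
interface EmbeddedRequest {
  languageId: string;
  text: string;
  origin: { line: number; startChar: number }; // where `text` starts in the host document
}

// Turn server-marked regions into per-language requests,
// skipping regions that fall outside the document.
function planEmbeddedRequests(lines: string[], regions: EmbeddedRegion[]): EmbeddedRequest[] {
  const requests: EmbeddedRequest[] = [];
  for (const r of regions) {
    if (r.line < 0 || r.line >= lines.length) continue;
    const lineText = lines[r.line];
    if (r.startChar < 0 || r.startChar + r.length > lineText.length) continue;
    requests.push({
      languageId: r.languageId,
      text: lineText.substring(r.startChar, r.startChar + r.length),
      origin: { line: r.line, startChar: r.startChar },
    });
  }
  return requests;
}

const doc = ['final q = "SELECT 1";'];
const plan = planEmbeddedRequests(doc, [{ line: 0, startChar: 11, length: 8, languageId: "sql" }]);
console.log(plan[0].text); // SELECT 1
```

Each request's `origin` is what the client would need afterwards to map the embedded provider's tokens back into host-document positions, and the `languageId` is what it would use to pick that provider.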

Solution 2

1 - VS Code sends my LSP server a request for semantic tokens
2 - I lex my language until I reach a token for an embedded language
3 - I send a request back to VS Code for the embedded language's tokens (a two-way semanticTokens/range)
4 - VS Code provides the semantic tokens; I may need to parse them because the format differs, then add them to my own tokens and return the result to VS Code

Cons:
1 - Harder to implement: tokens are compressed when sent, so VS Code must not compress them when sending them to the server
2 - still requires an LSP server to be started to provide semantic tokens for the embedded language
3 - this approach is worse than the one above
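Step 4 of Solution 2 (parsing the returned tokens and adding them to the server's own) mostly comes down to undoing the relative encoding, moving the embedded tokens to the snippet's position in the host document, and mapping token types from the embedded language's legend to the host's. A sketch under those assumptions; `mergeEmbedded`, `typeMap`, both legends, and the decision to drop token modifiers are hypothetical simplifications:

```typescript
interface AbsToken {
  line: number;
  startChar: number;
  length: number;
  tokenType: number;
  tokenModifiers: number;
}

// Decode the LSP relative format (5 integers per token) into absolute positions.
function decodeTokens(data: number[]): AbsToken[] {
  const tokens: AbsToken[] = [];
  let line = 0;
  let startChar = 0;
  for (let i = 0; i + 4 < data.length; i += 5) {
    line += data[i];
    startChar = data[i] === 0 ? startChar + data[i + 1] : data[i + 1];
    tokens.push({ line, startChar, length: data[i + 2], tokenType: data[i + 3], tokenModifiers: data[i + 4] });
  }
  return tokens;
}

// Re-encode sorted absolute tokens into the relative format.
function encodeTokens(tokens: AbsToken[]): number[] {
  const data: number[] = [];
  let prevLine = 0;
  let prevStart = 0;
  for (const t of tokens) {
    const deltaLine = t.line - prevLine;
    data.push(deltaLine, deltaLine === 0 ? t.startChar - prevStart : t.startChar, t.length, t.tokenType, t.tokenModifiers);
    prevLine = t.line;
    prevStart = t.startChar;
  }
  return data;
}

// Hypothetical: place tokens returned for an embedded snippet into the host document.
// `typeMap[i]` is the host legend index for embedded type i (assumed to cover every type);
// modifiers are dropped because the two legends' bit flags need not match.
function mergeEmbedded(
  hostTokens: AbsToken[],
  embeddedData: number[],
  origin: { line: number; startChar: number },
  typeMap: number[],
): number[] {
  const shifted = decodeTokens(embeddedData).map((t) => ({
    line: origin.line + t.line,
    // Only the snippet's first line is offset; later lines start at column 0 of the host.
    startChar: t.line === 0 ? origin.startChar + t.startChar : t.startChar,
    length: t.length,
    tokenType: typeMap[t.tokenType],
    tokenModifiers: 0,
  }));
  const all = hostTokens.concat(shifted);
  all.sort((a, b) => a.line - b.line || a.startChar - b.startChar);
  return encodeTokens(all);
}

// Host (Dart) tokens for `final q = "SELECT 1";`: keyword "final" (0) and variable "q" (2).
const host: AbsToken[] = [
  { line: 0, startChar: 0, length: 5, tokenType: 0, tokenModifiers: 0 },
  { line: 0, startChar: 6, length: 1, tokenType: 2, tokenModifiers: 0 },
];
// SQL tokens for "SELECT 1" in the SQL provider's legend: keyword (0) at 0, number (1) at 7.
const sqlData = [0, 0, 6, 0, 0, 0, 7, 1, 1, 0];
console.log(JSON.stringify(mergeEmbedded(host, sqlData, { line: 0, startChar: 11 }, [0, 3])));
// [0,0,5,0,0,0,6,1,2,0,0,5,6,0,0,0,7,1,3,0]
```

Lines after the first in a snippet start at column 0 because the snippet is a contiguous slice of the host document, which is why only tokens on the snippet's first line get the column offset.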

The biggest problem

I don't just need semantic tokens support for embedded language, I also need support for completions & all that.
