Support semantic highlighting

hackwaly commented 8 years ago

Like WebStorm and VS does: eg. Symbol is a type or parameter or namespace or unresolved ...

Textmate based grammars are hard to do this. Since we did support Diagnostics, why not support semantic highlighting?

dbaeumer commented 4 years ago

@kjeremy for the content of a file it is enough. If you have inter file dependencies and you know that semantic coloring (or any other feature) invalidates file A if file B changes then your server needs to model that dependency tree. But there timestamps don't help either.

axelson commented 4 years ago

As you can see with the new proposed protocol there is now a format and capabilities to indicate what requests the client will sent.

Is there a link where the updated protocol can be reviewed? Is it still primarily in the typescript server?

dbaeumer commented 4 years ago

@axelson the proposed version is here in terms of implementation. I am in the process of writing the markdown. https://github.com/microsoft/vscode-languageserver-node/blob/master/protocol/src/common/protocol.semanticTokens.proposed.ts#L1

dbaeumer commented 4 years ago

And here is a first version of the spec. No word polish and no spell check :-)

https://microsoft.github.io/language-server-protocol/specifications/specification-3-16/#textDocument_semanticTokens

kjeremy commented 4 years ago

@dbaeumer When will we need to move our LSP server implementation over to this one from the current semantic tokens implementation supported in vscode?

dbaeumer commented 4 years ago

@kjeremy Are you relying on the implementation in the next version of the LSP libs ?

kjeremy commented 4 years ago

@dbaeumer Our server-side implementation is in rust and based on https://github.com/gluon-lang/lsp-types which will need to be updated. The client-side opts in via https://github.com/rust-analyzer/rust-analyzer/blob/master/editors/code/src/client.ts#L151

ghost commented 4 years ago

@dbaeumer I really think a remark that clients are expected to cache locally with ranges would be quite important unless you want to make deltas de-facto required for servers. Right now, I can't see such a remark, it should probably go into "Implementation considerations." (Unless you don't think it should be optional @ deltas, but you did sound like you wanted to keep it in but as optional.)

If you don't write that in, some clients will just not do such caching and then servers will be required to implement deltas to guarantee good performance or users will blame it on them, making deltas basically a must.

Edit: also, right now the general concepts part is mixed with the pretty specific integer encoding in one block, I find that this is suboptimal for readability.

Suggestion in detail to split up text:

I think the General Concepts section would benefit from the integer encoding part split into a separate section placed separately. In detail, I am suggesting to split at the following start/end points:

Split start at:

On the capability level types and modifiers are defined using strings. However the real encoding happens using numbers to save bandwidth. [SPLIT HERE]

I would start to cut out parts for a new section after this, named Integer Encoding for Tokens, which contains the text up to BEFORE this paragraph:

Split end at:

[SPLIT HERE] The protocol defines an additional token format capability to allow future extensions of the format. The only format that is currently specified is relative expressing that the tokens are described using relative positions.

So the new General Concepts starting section would go like:

[... General Concepts as before ...] On the capability level types and modifiers are defined using strings. However the real encoding happens using numbers to save bandwidth, see Integer Encoding (link to section). The protocol also defines an additional token format capability to allow future extensions of the format. The only format that is currently specified is relative expressing that the tokens are described using relative positions. [... General Concepts ending parts as before...]

... with the Integer Encoding for Tokens section separated out to below based on above cut out starting with:

The server therefore needs to let the client know which numbers it is using for which types and modifiers. They do so using a legend, which is defined as follows: [...remaining Integer Encoding concepts up to split end...]

This new section could be additionally prefixed with an introducing sentence like: This section describes the integer encoding used for data transfer.

kjeremy commented 4 years ago

@dbaeumer

I'm trying to figure out the delta behavior and have some questions:

Does the server fill in the resultId for the range request? That seems odd to me (I'm not sure what I would do with that information).
For the delta I am holding onto all the tokens for the file and computing a diff against that and returning it like how SemanticTokensBuilder works. Does the resultId returned from the delta request represent the state of ALL the tokens in the file?

I guess ultimately I'm trying to figure out what I need to hold onto to compute the deltas. I am assuming that each deltas really asks for the delta between "now" and the previous delta or full request. It does not appear to be spec'd that way however and it could be that a client asks for a delta between "now" and many revisions ago.

puremourning commented 4 years ago

@dbaeumer i'm reading through the initial draft spec (thanks for writing it up!). I have a handful of comments/questions/clarifications. What's the best way to provide feedback? would you like comments on a PR or something like that?

dbaeumer commented 4 years ago

@puremourning I think the comments are best provided as a PR.

dbaeumer commented 4 years ago

@kjeremy you are correct a delta for range makes no sense. I am pretty sure VS Code will basically drop a previous result ID.

For the delta is should be always reported against the last result independent whether this was a full or a delta response. So in an easy implementation the id is simply incremented. I have clarified this in the spec.

dbaeumer commented 4 years ago

@etc0de thanks for the suggestions. I made them in the 3.16 version of the spec.

MarFren commented 4 years ago

Regarding #18 (comment)

The idea is that the client pulls for semantic tokens for all visible files. The one not having the focus should be pulled with less frequency. Assuming that the server implements the delta mechanism a pull for a file that has no semantic color changes will not produce any additional payload. This model was chosen to not sync any UI state. Otherwise the server might do an expensive impact analysis about which files need to refresh semantic tokens although non of them are visible in the UI.

If I understand it correctly, the basic idea is to let the client pull semantic tokens for all open files on a regular basis, the one having the focus with high frequency, the other files with low frequency (but automatically). That would allow for updating the semantic highlighting in all open files, even if file A has the focus and a change here does induce a change in highlighting in another file B - am I correct? If so, the current reference implementation in VSCode Insiders (1,49.0-insiders) does not implement automated pulls of semantic tokens of open files, right? Wouldn't it be good to describe the expected pull behavior in the 3.16 specs of LSP?

kaby76 commented 4 years ago

Now that my LSP server is more stable, I decided to update the extension for VSCode and give the semantic highlighting feature a spin. On the server, I decided to first try out SemanticTokensOptions { range = false, full = true}. However, I see the following:

[Error - 8:40:58 PM] Request textDocument/semanticTokens failed.
  Message: No method by the name 'textDocument/semanticTokens' is found.
  Code: -32601

Well, yeah, that's right. There is no "textDocument/semanticTokens" message because it only mentions in the spec "textDocument/semanticTokens/range", "textDocument/semanticTokens/full", and "textDocument/semanticTokens/full/delta", but no "textDocument/semanticTokens". Is this spec up to date? I can debug VSCode, figure out what is it expecting, and program to the implementation, but it would be nice to know what's going on.

DanTup commented 4 years ago

@kaby76 I think the request is constructed by the vscode-languageclient package so the issue might be that the extension is not using the latest version of that package (7.0.0-next.9 is the one I've been using to test in the Dart extension and it seems to work well - though I've only implemented /full so far).

dbaeumer commented 4 years ago

@MarFren adding this as a recommendation to the spec makes sense. A PR is welcome. However the LSP spec never enforces this. If a client decides to not do this it should still be fine.

DanTup commented 4 years ago

@dbaeumer in the spec for semanticToken/range it says the result is:

result: SemanticTokens | null where SemanticTokensDelta

It seems like the line is incomplete (or the last part shouldn't be there).

If the return value is SemanticTokens, is the lineDelta from the range that was requested, or from the start of the document? (I couldn't find this mentioned anywhere).

kaby76 commented 4 years ago

@DanTup That worked. I had to also set up a few other dependencies

    "vscode-jsonrpc": "^6.0.0-next.5",
    "vscode-languageclient": "^7.0.0-next.9",
    "vscode-languageserver": "^7.0.0-next.7",
    "vscode-languageserver-protocol": "^3.16.0-next.7",
    "vscode-languageserver-types": "^3.16.0-next.3"

use the "Insiders" VSCode, and change an import because LanguageClient was moved around:

import * as vscodelc from 'vscode-languageclient/node';

But, "textDocument/semanticTokens/full" starting to work. Computing the start line/col deltas a little challenging. And VSCode does not work the same as the LSP client in VS2019 with edits.

DanTup commented 4 years ago

I had to make the /node import changes, though I don't think I needed to use insiders (nor add the other dependencies).

Computing the start line/col deltas a little challenging

I've been refactoring my server work to collect the tokens using absolute data initially (line/cols, enum types) and then at the end do the conversion to the LSP format. This made it much simpler than when I was also doing things like splitting up multiline/nested tokens at the same time. The final conversion is now relatively simple:

var lastLine = 0;
var lastColumn = 0;

_tokens.sort(
  (t1, t2) => t1.line == t2.line
      ? t1.column.compareTo(t2.column)
      : t1.line.compareTo(t2.line),
);

for (final token in _tokens) {
  var relativeLine = token.line - lastLine;
  var relativeColumn = relativeLine == 0
      ? token.column - lastColumn
      : token.column;

  encodedTokens.addAll([
    relativeLine,
    relativeColumn,
    token.length,
    semanticTokenLegend.indexForType(token.type),
    semanticTokenLegend.bitmaskForModifiers(token.modifiers) ?? 0
  ]);

  lastLine = token.line;
  lastColumn = token.column;
}

kaby76 commented 4 years ago

@DanTup Yes, my code looks more or less just like your code after I noticed initially that only the first symbol was being colored. I tried out some hardwired values, then understood what had to be done (i.e., sort + compute diffs).

dbaeumer commented 3 years ago

I will close the issue since SC is now part of the upcoming 3.16 spec.

microsoft / language-server-protocol

Support semantic highlighting #18

Suggestion in detail to split up text: