Support semantic highlighting

hackwaly commented 8 years ago

Like WebStorm and VS do, e.g. marking whether a symbol is a type, a parameter, a namespace, unresolved, ...

This is hard to do with TextMate-based grammars. Since we already support Diagnostics, why not support semantic highlighting?

dbaeumer commented 8 years ago

Fully agree and we already discussed it since we would like to have it in VS Code as well.

daviwil commented 8 years ago

I'd love to have this for the PowerShell extension also. We provide the ability to create "dynamic keywords" for the purpose of writing domain-specific languages. Semantic highlighting would allow us to colorize those keywords in VS Code even though they aren't part of the PowerShell language spec.

/cc @BrucePay

smarr commented 8 years ago

I was looking for that as well.

What's exactly the meaning of the current 'highlighting' for read/write/text (see document highlights)?

I implemented that part but didn't see any form of visual feedback in the editor.

cdietrich commented 8 years ago

DocumentHighlight is for "mark occurrences"

smarr commented 8 years ago

@cdietrich sorry for hijacking the topic, but this is very unclear to me. How are document highlights used to realize a "mark occurrences" feature? How does the read/write/text distinction fit in, and why does it only expect a single result in return? For mark occurrences, I'd expect to be able to return a collection, no?

Would be great if that could be clarified also in protocol.md.

cdietrich commented 8 years ago

@smarr Yes, you are right, this makes no sense. I assume the return type should be an array. If I look at VS Code, I find:

export interface DocumentHighlightProvider {
    provideDocumentHighlights(model: editor.IReadOnlyModel, position: Position, token: CancellationToken): DocumentHighlight[] | Thenable<DocumentHighlight[]>;
}

So this looks like a bug in the protocol.
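To illustrate the "mark occurrences" shape discussed above, here is a minimal sketch of a handler that returns an array of highlights. The types are local stand-ins for the protocol types, and `highlightOccurrences` with its naive "followed by `=`" write detection is a hypothetical helper, not something taken from the spec or from VS Code.

```typescript
// Local stand-ins for the protocol types; a real server would import them
// from its LSP library instead.
interface Position { line: number; character: number; }
interface Range { start: Position; end: Position; }

// Kind values as defined by the protocol: Text = 1, Read = 2, Write = 3.
enum DocumentHighlightKind { Text = 1, Read = 2, Write = 3 }

interface DocumentHighlight { range: Range; kind?: DocumentHighlightKind; }

// Hypothetical helper: highlight every whole-word occurrence of `name` on one
// line. An occurrence followed by `=` (but not `==`) is treated as a write.
// Assumes `name` is a plain identifier with no regex metacharacters.
function highlightOccurrences(lineText: string, line: number, name: string): DocumentHighlight[] {
    const result: DocumentHighlight[] = [];
    const pattern = new RegExp("\\b" + name + "\\b", "g");
    let match: RegExpExecArray | null;
    while ((match = pattern.exec(lineText)) !== null) {
        const start = match.index;
        const end = start + name.length;
        const rest = lineText.slice(end).replace(/^\s+/, "");
        const isWrite = rest.charAt(0) === "=" && rest.charAt(1) !== "=";
        result.push({
            range: { start: { line, character: start }, end: { line, character: end } },
            kind: isWrite ? DocumentHighlightKind.Write : DocumentHighlightKind.Read,
        });
    }
    return result;
}
```

Called as `highlightOccurrences("x = x + 1", 0, "x")`, this yields one write highlight and one read highlight, which is why a single-result return type would not be enough.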

smarr commented 8 years ago

@cdietrich thanks, I'll open a separate issue.

svenefftinge commented 7 years ago

Please find a proposal in PR #124

jpike88 commented 6 years ago

I’m guessing this is dead... anyone got any updates?

svenefftinge commented 6 years ago

I am working on a new proposal for semantic coloring.

rcjsuen commented 4 years ago

There is a proposal in the microsoft/vscode-languageserver-node repository now that I suggest others take a look at when they get a chance.

@dbaeumer Suggest to close #513 in favour of this or the other way around.

axelson commented 4 years ago

VSCode 1.43 now supports semantic highlighting via the language server for TypeScript: https://code.visualstudio.com/updates/v1_43#_typescript-semantic-highlighting

I think it's through proposed updates for the language server protocol 3.16?

kaby76 commented 4 years ago

Can someone provide a link to the proposed 3.16 LSP spec, which includes "semantic highlighting"? I cannot find it in the usual location. I've read the wiki page on Semantic Highlighting Overview for VSCode, and while I am pleased with the "General Tokens Provider API" because it fits nicely into the current VSIDE (aka VS2019) ITagger<> and IClassificationFormatMap framework, I am also concerned over the Token Classification portion.

I remind folks that symbols in programming languages like Antlr, Bison, and other parser generator languages do not fit into these categories at all. Also, it is worth noting that grammars can contain target-language code blocks, so the language used is not homogeneous. I am hoping that the provider just returns an integer identifying an equivalence class, which the client then maps into the client UI format via a user-specified mapping. The client does not need to know the meaning of the equivalence class of the symbol, i.e., whether it is "variable", "type", "enum", "non terminal" or "terminal". However, the user must understand what it is.

For Antlr LSP, I already provide a judicious mapping of the equivalence class of the symbol into a SymbolKind (an integer) for Document Symbols, disregarding the meaning of the SymbolKind (e.g., "type"), and use that for CFG/static semantics colorization via a custom message. (Note, in VSIDE, SymbolKind in the LSP client that MS implemented also takes on additional services, like "go to def" and "find all refs", so it was done with care, experimentally, so things like comments do not get populated in drop-down boxes.) In VSIDE, scrolling only requests tagging for the small range of text that is going to be displayed. It is quite fast and works well. I never implemented a TextMate color tagger because Antlr syntax cannot be described using regular expressions.

rcjsuen commented 4 years ago

Can someone provide a link to the proposed 3.16 LSP spec, which includes "semantic highlighting"?

I don't think it's been written up yet. There is this TypeScript file from the microsoft/vscode-languageserver-node repository that you can do some reading up on.

ell1e commented 4 years ago

Interesting, isn't it in practical use already? Would be useful to allow other editor/IDEs and LSPs to catch up if that is the case, preferably without everyone just guessing / making up the specs based on the code by themselves.

razzeee commented 4 years ago

It's in use by some extensions, but it's still in proposal as far as I know. The LSP implementation will even need some updates, if you can trust https://github.com/microsoft/vscode/issues/95168. So it's not done, it seems.

dbaeumer commented 4 years ago

That is correct. The API is not final yet hence it is still in proposed state (and will be there until we ship 3.16)

ghost commented 4 years ago

Ok. But forgive me if this is a stupid question, but wouldn't it be better to get some early feedback on what it could look like before investing all the effort into a complete implementation in VS Code? It sounds like now would be a good time to collect some feedback while the implementation isn't fully set in stone yet.

rcjsuen commented 4 years ago

@etc0de Some of us have implemented it and there was a bit of back and forth on the VS Code API (https://github.com/microsoft/vscode/issues/86415) which does influence the LSP API. I didn’t have many issues implementing this feature but Dockerfiles aren’t exactly the most complicated language semantics-wise. :)

Of course, I do agree in that it would be great if there were more implementations so there can be more feedback about the feature.

ghost commented 4 years ago

Well the thing is, all you linked is implementation code but I don't even know what the actual protocol looks like right now. I just found this in one of your links, but this is about VS Code settings: https://github.com/microsoft/vscode/wiki/Semantic-Highlighting-Overview

Hasn't anybody sat down and done a quick writing down of the actual JSON things sent back and forth?

I feel like this is what feedback would be useful on right now, because I'm not talking end user feedback (this is the LSP repo after all, not the VS code repo) but other language server authors' feedback whether the protocol even works for them and their implementations. This seems to be unclear right now, most of the links floating around here look like they're already for end users to test or complete plugins for use as-is. I don't care about the VS code API, I want to know if the actual underlying protocol is reasonable from the language server side.

It just feels like VS Code is already locking in their implementation when not many have really seen the underlying protocol yet, that seems risky to me.

Also, as a side note, has anybody considered deprecating documenthighlight and renaming it to expressionhighlight, occurrencehighlight, or something different in a newer protocol draft? The "document" part really makes it sound like it semantically highlights the entire document, so I feel like the naming here is somewhat unfortunate. Edit: I guess it's named like that for consistency with DocumentSymbol etc. Then what about DocumentOccurrenceHighlight or something, or is that too long? Oh well, maybe renaming it isn't ideal... it was just an idea

rcjsuen commented 4 years ago

Well the thing is, all you linked is implementation code but I don't even know what the actual protocol looks like right now.

@etc0de The TypeScript file I linked to is mostly interfaces (and not necessarily implementation code) but it is certainly fair to say that it is not exactly very reader-friendly.

Hasn't anybody sat down and done a quick writing down of the actual JSON things sent back and forth?

In terms of the payload of the integer array, that is explained in the API documentation for the provideDocumentSemanticTokens function in VS Code which was linked to from the aforementioned TypeScript file. But as per the above, I agree that it is certainly not super obvious and it would be nice if it was presented in a more readable format.
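For readers who want to see the integer array concretely, here is a minimal sketch of the encoding as I understand it from that API documentation: every token becomes five integers (line delta, start-character delta, length, token type index, modifier bit set), with positions relative to the previous token. `AbsoluteToken` and `encodeTokens` are hypothetical names used only for this illustration.

```typescript
// A token with absolute positions, before encoding (hypothetical helper type).
interface AbsoluteToken {
    line: number;
    startChar: number;
    length: number;
    tokenType: number;      // index into a token-type legend shared with the client
    tokenModifiers: number; // bit set of indices into a token-modifier legend
}

// Encode tokens into the flat integer array: 5 integers per token, where the
// line is relative to the previous token's line and the start character is
// relative to the previous token's start only when both are on the same line.
function encodeTokens(tokens: AbsoluteToken[]): number[] {
    const sorted = tokens.slice().sort((a, b) => a.line - b.line || a.startChar - b.startChar);
    const data: number[] = [];
    let prevLine = 0;
    let prevChar = 0;
    for (const t of sorted) {
        const deltaLine = t.line - prevLine;
        const deltaStart = deltaLine === 0 ? t.startChar - prevChar : t.startChar;
        data.push(deltaLine, deltaStart, t.length, t.tokenType, t.tokenModifiers);
        prevLine = t.line;
        prevChar = t.startChar;
    }
    return data;
}
```

Three tokens at (line 2, char 5), (line 2, char 10) and (line 5, char 2) therefore encode to `[2,5,3,0,3, 0,5,4,1,0, 3,2,7,2,0]` when their lengths, types and modifiers are 3/0/3, 4/1/0 and 7/2/0.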

I feel like this is what feedback would be useful on right now, because I'm not talking end user feedback (this is the LSP repo after all, not the VS code repo) but other language server authors' feedback whether the protocol even works for them and their implementations. This seems to be unclear right now, most of the links floating around here look like they're already for end users to test or complete plugins for use as-is. I don't care about the VS code API, I want to know if the actual underlying protocol is reasonable from the language server side.

Then I suggest you take a look at the comments from the last few months in https://github.com/microsoft/vscode/issues/86415. Comments like https://github.com/microsoft/vscode/issues/86415#issuecomment-573934661, https://github.com/microsoft/vscode/issues/86415#issuecomment-587184889, and https://github.com/microsoft/vscode/issues/86415#issuecomment-596143316 are all questions posed from the perspective of a language server author. Although the questions were posed in the VS Code repository, because the proposed LSP API of semantic highlighting practically maps 1-to-1 to the VS Code API the questions are still of relevance.

Do I kind of wish those questions were linked to from or posted in this issue instead? Sure.

It just feels like VS Code is already locking in their implementation when not many have really seen the underlying protocol yet, that seems risky to me.

I can see that.

Also, as a side note, has anybody considered deprecating documenthighlight and renaming it to expressionhighlight, occurrencehighlight, or something different in a newer protocol draft? The "document" part really makes it sound like it semantically highlights the entire document, so I feel like the naming here is somewhat unfortunate. Edit: I guess it's named like that for consistency with DocumentSymbol etc. Then what about DocumentOccurrenceHighlight or something, or is that too long? Oh well, maybe renaming it isn't ideal... it was just an idea

This is probably the first I've heard of such a request. I have never thought of it that way but from how you are describing it I can see how it can be confusing now that I look at things that way.

ghost commented 4 years ago

The TypeScript file I linked to is mostly interfaces (and not necessarily implementation code)

I figured, but I honestly can't tell to what JSON they map. What does a SemanticTokensBuilder with a push do? What is a provideDocumentSemanticTokensEdits entry in JSON? Is that a type, a name of some complex object member, a boolean, ...? It is honestly quite unreadable if you're not familiar with TypeScript.

IMHO it's a bad precedent to just assume anyone using this standard is familiar with typescript, I think it's also bad to drop an implementation as a spec draft. But maybe that's just me :woman_shrugging: I wish I could give you feedback, I'm certainly interested, but in the current format I'm afraid that is hard to do

Then I suggest you take a look at the comments from the last few months

The discussion is interesting, but it doesn't really solve that the "draft" itself in its current form isn't really in a universally readable form. It's basically a closed off playground for typescript devs only, apparently...? I don't know, I do kind of find this approach bewildering. I really would suggest this approach is changed.

Edit: to be fair, the protocol.semanticTokens.proposed.ts is a bit more readable now that I looked over it. It was mostly the semantic-tokens-sample/vscode.proposed.d.ts that threw me off. Still, I think using typescript code for a spec draft is far from ideal.

rcjsuen commented 4 years ago

The discussion is interesting, but it doesn't really solve that the "draft" itself in its current form isn't really in a universally readable form. It's basically a closed off playground for typescript devs only, apparently...? I don't know, I do kind of find this approach bewildering. I really would suggest this approach is changed.

The other proposed API at the moment, the textDocument/callHierarchy request does at least have a Markdown file written up about it and I can recall other instances in the past where a similar Markdown file was written. I am not sure why one wasn't written this time.

razzeee commented 4 years ago

For the sake of completeness, the F# plugin Ionide also already implements it and also shipped it. @Krzysztof-Cieslak

Krzysztof-Cieslak commented 4 years ago

We created a custom LSP endpoint and wrapped things on the VS Code side; it’s not really using the proposed API in the LSP spec.

axelson commented 4 years ago

I have a question that I haven't seen an answer to. Can the semantic highlighting be additive with the existing textmate grammars? Or is it completely one or the other?

rcjsuen commented 4 years ago

Can the semantic highlighting be additive with the existing textmate grammars? Or is it completely one or the other?

I don’t believe this has been spec’d out. I encountered this in VS Code (https://github.com/microsoft/vscode-languageserver-node/issues/570 ) and they do a merge of sorts there.

Krzysztof-Cieslak commented 4 years ago

Semantic highlighting in VSCode is additive. VSCode first runs normal TextMate highlighting and then improves it when semantic results are available.

But I’d imagine it may differ from client to client, it seems like a client implementation detail.

kjeremy commented 4 years ago

I implemented the LSP portion of semantic highlighting for the rust-analyzer server and it was pretty straightforward.

The other proposed API at the moment, the textDocument/callHierarchy request does at least have a Markdown file written up about it and I can recall other instances in the past where a similar Markdown file was written. I am not sure why one wasn't written this time.

FYI that file does not match the protocol. I was hoping to update it if https://github.com/microsoft/vscode-languageserver-node/pull/614 looks right.

paulyoung commented 4 years ago

I think @etc0de has brought up some very good points and I too would like to see a draft of the spec in a similar form to what would eventually appear on the website.

rcjsuen commented 4 years ago

The other proposed API at the moment, the textDocument/callHierarchy request does at least have a Markdown file written up about it and I can recall other instances in the past where a similar Markdown file was written. I am not sure why one wasn't written this time.

FYI that file does not match the protocol. I was hoping to update it if microsoft/vscode-languageserver-node#614 looks right.

:( Well, that's unfortunate. Thanks for pointing this out, @kjeremy!

axelson commented 4 years ago

@rcjsuen @Krzysztof-Cieslak thanks for the info and link. It would be great to see the additive (or not) behavior formally spec'd out (perhaps in a new .md that has been proposed). As a Language Server author I like the idea of semantic highlighting being additive since it will make it easier to adopt since initially I could just add support for the few tokens that have recursive definitions (or are otherwise hard to define without using negative lookbehinds or other complex tools).

HighCommander4 commented 4 years ago

FYI that file does not match the protocol. I was hoping to update it if microsoft/vscode-languageserver-node#614 looks right.

:( Well, that's unfortunate. Thanks for pointing this out, @kjeremy!

I pointed this out a while ago :)

rcjsuen commented 4 years ago

@rcjsuen @Krzysztof-Cieslak thanks for the info and link. It would be great to see the additive (or not) behavior formally spec'd out (perhaps in a new .md that has been proposed).

I'm all in favour of more things being explicitly stated. 👍

ghost commented 4 years ago

For what it's worth, I left minor other points here: https://github.com/microsoft/vscode/issues/86415#issuecomment-619350479

It would be great to see the additive (or not) behavior formally spec'd out

For that I would like to request that textmate support isn't a requirement in a client: my suggestion would be to make the spec say "if the client supports other sources of highlight formatting options like textmate, it should add them in additively unless the LSP server specifies option XYZ to indicate an intended exclusive handling of semantic highlighting" or something like that.

The reason for textmate being optional: I pondered writing a small mini-IDE for my language, and if I do I plan to just support LSP-backed highlighting only because that is all I personally need but I'd want to at least possibly allow other LSPs to be plugged in too. It'd be nice if I then wouldn't necessarily be violating the specs just because of a more spartan not-textmate-supporting editor implementation.

The reason for LSP server being able to override textmate with a protocol option: I am all for editors that have "universal" LSP plugins where you don't need to provide a client-side plugin to support a language. However, these editors might also have fallback support for textmate highlighting that is then possibly going to be outdated, and without a per-client plugin it might be difficult to tell the editor to not use that at all unless it's in the protocol. So if I design my LSP server-side to highlight everything exhaustively, it'd be nice to be able to explicitly state "please don't use textmate additively unless the end user overrides this, the result will likely be less correct anyway".

ghost commented 4 years ago

Another thing, does anyone know if this viewport-style local querying as suggested here is possible as a client? https://github.com/microsoft/vscode/issues/86415#issuecomment-573934661 It looks like it is with the ranges, but I just wanted to ask again since this looks really important to get ok performance in large files. Sorry if this was already answered, there were a lot of comments on this;

(Less important: I wonder, has anyone tested the performance of not using semantic token edits and just requerying the viewport on typing? I find the suggestion of not adding in the edits to simplify the protocol at least possibly worth exploring, even though I assume in the end one probably wants the edits for minimal typing lag)

rcjsuen commented 4 years ago

I have created a proposed.semanticTokens.md gist to hopefully make the reading of the TypeScript definition file a little bit easier.

I hope this will be of use to the wider community though I do kind of worry about possibly causing confusion due to it not being official. Hopefully the disclaimer at the top is good but I am happy to delete the Gist if people think it will do more harm than good (by possibly hurting the understanding of the semantic tokens API requests because it is not a 1-to-1 mapping of the TypeScript file).

rcjsuen commented 4 years ago

Another thing, does anyone know if this viewport-style local querying as suggested here is possible as a client? microsoft/vscode#86415 (comment) It looks like it is with the ranges, but I just wanted to ask again since this looks really important to get ok performance in large files. Sorry if this was already answered, there were a lot of comments on this;

@etc0de The two comments here (https://github.com/microsoft/vscode/issues/92789#issuecomment-608145650 and https://github.com/microsoft/vscode/issues/92789#issuecomment-608146554) seem to suggest it is working in terms of the VS Code API but I'm not sure if anyone's implemented textDocument/semanticTokens/range from the LSP end yet.

puremourning commented 4 years ago

Thanks for doing this @rcjsuen, it's much appreciated.

I would suggest that the overwhelmingly most important section is the referenced code comments: https://github.com/microsoft/vscode-extension-samples/blob/5ae1f7787122812dcc84e37427ca90af5ee09f14/semantic-tokens-sample/vscode.proposed.d.ts#L71 and https://github.com/microsoft/vscode-extension-samples/blob/5ae1f7787122812dcc84e37427ca90af5ee09f14/semantic-tokens-sample/vscode.proposed.d.ts#L131

This is the API that server and client authors need to understand, review and accept/debate. It's complex and arguably non-obvious and making it work with all the various clients is the goal (assuming of course the LSP stated goal of solving the matrix problem is still a goal).

For example, it wasn't obvious to me why delta encoding is used for the positions? This seems to introduce a (unnecessary?) complexity to the protocol which favours one particular implementation of client highlighting. Presumably its purpose is to minimise the amount of data transferred on the "wire" doing edits? If this leads to complex encoding/decoding, is it really the right trade-off ? I mean, are there numbers behind it?
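To make the decoding cost concrete, here is a minimal sketch of what a client has to do to turn the relative integers back into absolute positions, assuming the five-integers-per-token layout described in the referenced code comments. `DecodedToken` and `decodeTokens` are hypothetical names for this illustration only.

```typescript
// A token with absolute positions, after decoding (hypothetical helper type).
interface DecodedToken {
    line: number;
    startChar: number;
    length: number;
    tokenType: number;
    tokenModifiers: number;
}

// Walk the flat array in groups of five, accumulating line and character
// offsets. The start character is relative to the previous token only when
// the line delta is zero; otherwise it is an absolute column.
function decodeTokens(data: number[]): DecodedToken[] {
    const tokens: DecodedToken[] = [];
    let line = 0;
    let startChar = 0;
    for (let i = 0; i + 4 < data.length; i += 5) {
        const deltaLine = data[i];
        const deltaStart = data[i + 1];
        line += deltaLine;
        startChar = deltaLine === 0 ? startChar + deltaStart : deltaStart;
        tokens.push({
            line,
            startChar,
            length: data[i + 2],
            tokenType: data[i + 3],
            tokenModifiers: data[i + 4],
        });
    }
    return tokens;
}
```

If I read the referenced comments right, the intended benefit is that inserting a line near the top of a file changes only one integer in the array, since every later token stays the same relative to its predecessor. Whether that justifies the extra pass is exactly the question raised here.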

axelson commented 4 years ago

For that I would like to request that textmate support isn't a requirement in a client: my suggestion would be to make the spec say "if the client supports other sources of highlight formatting options like textmate, it should add them in additively unless the LSP server specifies option XYZ to indicate an intended exclusive handling of semantic highlighting" or something like that.

The reason for textmate being optional: I pondered writing a small mini-IDE for my language, and if I do I plan to just support LSP-backed highlighting only because that is all I personally need but I'd want to at least possibly allow other LSPs to be plugged in too. It'd be nice if I then wouldn't necessarily be violating the specs just because of a more spartan not-textmate-supporting editor implementation.

That's a great point! I agree on both points: not being textmate specific, and having a method to indicate that the server supports fully tokenizing the document. In the second case, though, I think the client should be free to provide its own overrides of the server's tokens; a primary reason for this would be an end user who wants to customize the tokenization in some way.

puremourning commented 4 years ago

Forgive my ignorance, but where does TextMate appear in the spec? I realise that the spec uses TextMate snippet syntax (sort of) but reading @rcjsuen's markdown, I don't see any TextMate specifics in there. Perhaps I'm missing something obvious.

axelson commented 4 years ago

I don't think TextMate appears in the spec, I was the only one that brought it up.

puremourning commented 4 years ago

Oh does VSCode use TextMate internally then? Is that why it came up ?

rcjsuen commented 4 years ago

Oh does VSCode use TextMate internally then? Is that why it came up ?

Correct, VS Code will render with TextMate first and then apply semantic highlighting (additively) on top after it gets the response back from the language server.

TextMate is not mentioned in the VS Code API or the LSP API as far as I know.
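As an illustration of what "additively on top" can mean, here is a minimal sketch that paints grammar spans first and lets semantic spans overwrite them wherever they overlap. This is an assumption about the general idea, not VS Code's actual algorithm; `Span` and `mergeAdditively` are hypothetical names, and the sketch works on a single line for brevity.

```typescript
// A styled span on one line; `end` is exclusive (hypothetical helper type).
interface Span { start: number; end: number; style: string; }

// Paint the grammar spans first, then the semantic spans on top, so the
// semantic style wins on overlap and the grammar style survives elsewhere.
function mergeAdditively(lineLength: number, grammar: Span[], semantic: Span[]): Span[] {
    const styles: (string | undefined)[] = [];
    for (let i = 0; i < lineLength; i++) styles.push(undefined);

    for (const layer of [grammar, semantic]) {
        for (const span of layer) {
            const from = Math.max(0, span.start);
            const to = Math.min(lineLength, span.end);
            for (let i = from; i < to; i++) styles[i] = span.style;
        }
    }

    // Collapse runs of identically styled characters back into spans.
    const merged: Span[] = [];
    for (let i = 0; i < lineLength; i++) {
        const style = styles[i];
        if (style === undefined) continue;
        const last = merged[merged.length - 1];
        if (last !== undefined && last.end === i && last.style === style) {
            last.end = i + 1;
        } else {
            merged.push({ start: i, end: i + 1, style });
        }
    }
    return merged;
}
```

For the line `let foo = bar`, a grammar that marks `bar` as a variable and a server that reports it as a function end up with `bar` styled as a function, while `let` and `foo` keep their grammar styles.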

rcjsuen commented 4 years ago

I've tried to summarize the above points regarding what the client and server should support and declare.

1. The specification should not make any claims about the client needing to support TextMate, Tree-sitter, or any other grammar. LSP clients should be free to rely completely on the language server for its syntax highlighting needs without also needing to support syntax highlighting via a grammar.
2. Clients should declare a) if they support additive/merging support of semantic tokens on top of whatever internal grammar it has already and b) if they support discarding the grammar from A and replacing it completely with the semantic tokens information from the server.
   - A implies that it is capable of taking the results from its internal grammar (if it exists) and merging them with the tokens from the language server. (Note: VS Code does this automatically right now with no opt-in/opt-out option.)
   - B implies that it is capable of ignoring the grammar and simply applying everything from the language server instead. This would address https://github.com/microsoft/vscode-languageserver-node/issues/570. (Note: As indicated by the bug, VS Code does not support this use case right now.)
3. Servers should declare a) whether they support full document calculations and/or b) partial document calculations.
   - A implies that the server knows everything and will return everything if asked. Anything it does not return is intentional (and likely done to fix grammar limitations which may make an additive end result less correct).
   - B implies the server is capable of not returning everything if asked and thus the result it returns to the client should (probably?) be merged with the grammar on the client, assuming one has been defined and set there.

It's a little verbose but I think it's better to be more explicit about everything. What does everyone else think?

ghost commented 4 years ago

Clients should declare a) if they support additive/merging support of semantic tokens on top of whatever internal grammar it has already and b) if they support discarding the grammar from A and replacing it completely with the semantic tokens information from the server.

IMHO 2B should just be mandatory for clients that support the new semantic highlighting protocol. (2A can be optional, of course.) I don't see why it would be difficult for anyone to implement, and not having it can obviously mess up the result as seen in https://github.com/microsoft/vscode-languageserver-node/issues/570 .

Servers should declare a) whether they support full document calculations and/or b) partial document calculations.

Why not just replace 3. with a server->client command to just not do any additive grammars on top unless the user explicitly overrode it? I can't think of a scenario where actually knowing why is particularly relevant, especially since I can't really see why it'd be done outside of the expectation additive grammars worsen the result anyway, in which case it makes no sense to default to additively apply them for whatever reason.

Edit: basically the protocol of what is actually sent could be, client: "(if client supports & does this) btw I will use additive grammars on top of LSP semantic highlighting, protest if that is bad", server: "(if additive is bad) please don't do that additive thing unless the user really made you do it, thanks". (Whether clients actually allow setting such an override I find personally unimportant. I don't think it is likely to be particularly needed, but I don't want to take away any IDE's choice to offer it to the user)

ghost commented 4 years ago

Sorry for the comment spam, but this remark bubbled up in my head again:

For example, it wasn't obvious to me why delta encoding is used for the positions? This seems to introduce a (unnecessary?) complexity to the protocol which favours one particular implementation of client highlighting

I'm probably missing something, but I agree and I had this other idea:

If these edits are to move things around fast after insert, can't there be a simple token-insert-at operation (and token-delete-at) for the LSP server which will shift all the follow-up tokens in index respectively in the client's token stream memory? For larger changes resulting (e.g. quotation typed with multiline string tokens) things would need to be fully retransmitted anyway, and the file position of the tokens could just be adjusted by the editor/client locally. (After all, it's obvious inserting a space will move every token after it on that line in terms of column position by +1, right?)

~~Or am I missing something? I'm really sorry if that proposal was already discussed.~~ I read the typescript wrong, obviously it is already an insert-at and/or delete-at. But why the complicated decoding then?
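For reference, here is a minimal sketch of how a client might apply such insert-at/delete-at edits to the integer array it already holds. The `SemanticTokensEdit` shape (`start`, `deleteCount`, optional `data`) follows my reading of the proposed TypeScript definitions. The assumptions that edits do not overlap and that all of them refer to the old array are mine, as is the hypothetical `applyEdits` helper.

```typescript
// Shape of a single edit as I read it from the proposed definitions.
interface SemanticTokensEdit {
    start: number;       // index into the old integer array
    deleteCount: number; // number of integers to remove at `start`
    data?: number[];     // integers to insert at `start`
}

// Apply edits from the back of the array to the front so that earlier
// indices stay valid. Assumes non-overlapping edits that all refer to the
// old array state.
function applyEdits(oldData: number[], edits: SemanticTokensEdit[]): number[] {
    const result = oldData.slice();
    const ordered = edits.slice().sort((a, b) => b.start - a.start);
    for (const edit of ordered) {
        result.splice(edit.start, edit.deleteCount, ...(edit.data || []));
    }
    return result;
}
```

With relative positions, inserting one line at the top of a file whose tokens were `[2,5,3,0,3, 0,5,4,1,0, 3,2,7,2,0]` needs only the edit `{ start: 0, deleteCount: 1, data: [3] }`; every other integer is unchanged.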

rcjsuen commented 4 years ago

For example, it wasn't obvious to me why delta encoding is used for the positions? This seems to introduce a (unnecessary?) complexity to the protocol which favours one particular implementation of client highlighting. Presumably its purpose is to minimise the amount of data transferred on the "wire" doing edits? If this leads to complex encoding/decoding, is it really the right trade-off ? I mean, are there numbers behind it?

@alexdima @dbaeumer Can one of you help weigh in here?

Servers should declare a) whether they support full document calculations and/or b) partial document calculations.

Why not just replace 3. with a server->client command to just not do any additive grammars on top unless the user explicitly overrode it?

I don't think a command makes sense as it feels odd to me for a server to send an explicit command to the client solely for the purpose of toggling something on and off. I am not sure how likely it is for a server to have confidence it is "perfect" for some X% of the time but then to also occasionally need to send a request/notification over to the client to toggle itself because now it is the Y% of the time when it's not "perfect".

How about enhancing the SemanticTokens interface so that it has a boolean field to instruct the client whether an additive merge should be applied or not? The exact naming of this field is of course up to debate.

export interface SemanticTokens {
    /* Copy/pasted from the original interface... */
    resultId?: string;
    /* Copy/pasted from the original interface... */
    data: number[];
    /*
     * The semantic tokens data should be applied on top of the
     * syntax highlighting that the client already has.
     */
    mergeRequired: boolean;
}

ghost commented 4 years ago

A field seems fine, sure. mergeRequired however sounds like a client needs to support merging which I think so far everyone agreed should be optional. What about mergeDisabled, or isExhaustive (= true means must not merge), or unmergeable, or something like that?
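To show how a client could act on such a flag, here is a minimal sketch using the `isExhaustive` spelling. Everything here is hypothetical: the field is only a suggestion from this thread, and `chooseMode`, `HighlightMode` and the user-override parameter are names invented for the illustration.

```typescript
// Hypothetical response shape: the proposed SemanticTokens plus the
// suggested flag. None of this is part of any published protocol version.
interface SemanticTokensWithHint {
    resultId?: string;
    data: number[];
    isExhaustive?: boolean; // true means "do not merge with a client grammar"
}

type HighlightMode = "semantic-only" | "merge-with-grammar";

// Decide how to render: a client without a grammar always uses the server's
// tokens alone; otherwise it merges unless the server declared its tokens
// exhaustive and the user has not explicitly asked for merging.
function chooseMode(
    tokens: SemanticTokensWithHint,
    clientHasGrammar: boolean,
    userForcesMerge: boolean,
): HighlightMode {
    if (!clientHasGrammar) return "semantic-only";
    if (tokens.isExhaustive === true && !userForcesMerge) return "semantic-only";
    return "merge-with-grammar";
}
```

Leaving the field optional keeps merging itself optional: a grammar-less client never needs to look at it, and a missing value falls back to the additive behaviour VS Code has today.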

microsoft / language-server-protocol

Support semantic highlighting #18