Closed hackwaly closed 3 years ago
Fully agree and we already discussed it since we would like to have it in VS Code as well.
I'd love to have this for the PowerShell extension also. We provide the ability to create "dynamic keywords" for the purpose of writing domain-specific languages. Semantic highlighting would allow us to colorize those keywords in VS Code even though they aren't part of the PowerShell language spec.
/cc @BrucePay
I was looking for that as well.
What's exactly the meaning of the current 'highlighting' for read/write/text (see document highlights)?
I implemented that part but didn't see any form of visual feedback in the editor.
DocumentHighlight is for "mark occurrences"
@cdietrich sorry for hijacking the topic, but this is very unclear to me. How are document highlights used to realize a "mark occurrences" feature? How does the read/write/text distinction fit in, and why does it only expect a single result in return? For mark occurrences, I'd expect to be able to return a collection, no?
Would be great if that could be clarified also in protocol.md.
@smarr yes, you are right. This makes no sense. I assume the return type should be an array. If I have a look at vscode I can find
export interface DocumentHighlightProvider {
    provideDocumentHighlights(model: editor.IReadOnlyModel, position: Position, token: CancellationToken): DocumentHighlight[] | Thenable<DocumentHighlight[]>;
}
thus looks like a bug in the protocol
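To make the "mark occurrences" reading concrete, here is a minimal sketch of a handler that returns a collection of highlights. The types are modelled on the LSP `DocumentHighlight` shapes (kind values Text = 1, Read = 2, Write = 3); `findHighlights`, `HlPosition`, `HlRange` and the "followed by `=` means write" heuristic are assumptions made purely for illustration, not anything a real server does.

```typescript
// Shapes modelled on the LSP DocumentHighlight types; helper names are hypothetical.
interface HlPosition { line: number; character: number; }
interface HlRange { start: HlPosition; end: HlPosition; }

enum DocumentHighlightKind { Text = 1, Read = 2, Write = 3 }

interface DocumentHighlight { range: HlRange; kind?: DocumentHighlightKind; }

// Naive "mark occurrences": every whole-word match of `identifier` becomes one highlight.
// Assumption for illustration: an occurrence directly followed by `=` (but not `==`)
// is a write access, everything else is a read access.
function findHighlights(text: string, identifier: string): DocumentHighlight[] {
  const result: DocumentHighlight[] = [];
  if (identifier.length === 0) return result; // avoid zero-length regex matches
  const escaped = identifier.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  text.split("\n").forEach((lineText, line) => {
    const pattern = new RegExp(`\\b${escaped}\\b`, "g");
    let match: RegExpExecArray | null;
    while ((match = pattern.exec(lineText)) !== null) {
      const start = match.index;
      const end = start + identifier.length;
      const isWrite = /^\s*=(?!=)/.test(lineText.slice(end));
      result.push({
        range: { start: { line, character: start }, end: { line, character: end } },
        kind: isWrite ? DocumentHighlightKind.Write : DocumentHighlightKind.Read,
      });
    }
  });
  return result;
}
```

A client would typically render each returned range at once, possibly with a different decoration per kind, which is why a single result would not be enough.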
@cdietrich thanks, I'll open a separate issue.
Please find a proposal in PR #124
I’m guessing this is dead... anyone got any updates?
I am working on a new proposal for semantic coloring.
There is a proposal in the microsoft/vscode-languageserver-node repository now that I suggest others take a look at when they get a chance.
@dbaeumer Suggest to close #513 in favour of this or the other way around.
VSCode 1.43 now supports semantic highlighting via the language server for TypeScript: https://code.visualstudio.com/updates/v1_43#_typescript-semantic-highlighting
I think it's through proposed updates for the language server protocol 3.16?
Can someone provide a link to the proposed 3.16 LSP spec, which includes "semantic highlighting"? I cannot find it in the usual location.
I've read the wiki page on Semantic Highlighting Overview for VSCode, and while I am pleased with the "General Tokens Provider API" because it fits nicely into the current VSIDE (aka VS2019) ITagger<> and IClassificationFormatMap framework, I am also concerned over the Token Classification portion. I remind folks that symbols in programming languages like Antlr, Bison, and other parser generator languages, do not fit into these categories at all. Also, it is worth noting that grammars can contain target-language code blocks, so it is not a homogenous language used.
I am hoping that the provider just return an integer of an equivalence class that the client then maps into the client UI format via a user specified mapping. The client does not need to know the meaning of the equivalence class of the symbol, i.e., whether it is "variable", "type", "enum", "non terminal" or "terminal". However, the user must understand what it is.
For Antlr LSP, I already provide a judicious mapping of the equivalence class of the symbol into a SymbolKind--an integer--for Document Symbols, disregarding the meaning of the SymbolKind (e.g., "type"), and use that for CFG/static semantics colorization via a custom message. (Note, in VSIDE, SymbolKind in the LSP client that MS implemented also takes on additional services, like "go to def" and "find all refs", so it was done with care, experimentally, so things like comments do not get populated in drop-down boxes.)
In VSIDE, scrolling only requests tagging for a small range of text that is going to be displayed. It is quite fast and works well. I never implemented a TextMate color tagger because Antlr syntax cannot be described using regular expressions.
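The "opaque integer equivalence class" idea above could be sketched roughly as follows. Everything here is hypothetical: `ClassifiedSpan`, `ClassStyleMap`, `styleSpans` and the class ids are invented for illustration and are not part of any proposed protocol.

```typescript
// Hypothetical wire shape: the server reports ranges plus an opaque integer class.
interface ClassifiedSpan { start: number; length: number; classId: number; }

// Hypothetical user setting: maps each class id to a display style.
// Only the user needs to know that, say, class 0 means "non terminal".
type ClassStyleMap = Record<number, string>;

// The client resolves styles without knowing what any class id means.
function styleSpans(spans: ClassifiedSpan[], styles: ClassStyleMap, fallback = "default"): string[] {
  return spans.map(span => styles[span.classId] ?? fallback);
}

// Example mapping a user might configure for a grammar file.
const grammarStyles: ClassStyleMap = { 0: "blue", 1: "green" };
```

With this mapping, spans of class 0 and 1 render blue and green, and any class the user has not configured falls back to the default style.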
Can someone provide a link to the proposed 3.16 LSP spec, which includes "semantic highlighting"?
I don't think it's been written up yet. There is this TypeScript file from the microsoft/vscode-languageserver-node repository that you can do some reading up on.
Interesting, isn't it in practical use already? Would be useful to allow other editor/IDEs and LSPs to catch up if that is the case, preferably without everyone just guessing / making up the specs based on the code by themselves.
It's in use by some extensions, but it's still in proposal as far as I know. The LSP implementation will even need some updates, if you can trust https://github.com/microsoft/vscode/issues/95168. So it seems it's not done.
That is correct. The API is not final yet hence it is still in proposed state (and will be there until we ship 3.16)
Ok. But forgive me if this is a stupid question, but wouldn't it be better to get some early feedback on how it could look like before investing all the effort into a complete implementation in VS Code? It sounds like now would be a good time to collect some feedback while the implementation isn't fully set in stone yet
@etc0de Some of us have implemented it and there was a bit of back and forth on the VS Code API (https://github.com/microsoft/vscode/issues/86415) which does influence the LSP API. I didn’t have much issues implementing this feature but Dockerfiles aren’t exactly the most complicated language semantics-wise. :)
Of course, I do agree in that it would be great if there were more implementations so there can be more feedback about the feature.
Well the thing is, all you linked is implementation code but I don't even know how the actual protocol looks like right now. I just found this in one of your links, but this is about vs code settings: https://github.com/microsoft/vscode/wiki/Semantic-Highlighting-Overview
Hasn't anybody sat down and done a quick writing down of the actual JSON things sent back and forth?
I feel like this is what feedback would be useful on right now, because I'm not talking end user feedback (this is the LSP repo after all, not the VS code repo) but other language server authors' feedback whether the protocol even works for them and their implementations. This seems to be unclear right now, most of the links floating around here look like they're already for end users to test or complete plugins for use as-is. I don't care about the VS code API, I want to know if the actual underlying protocol is reasonable from the language server side.
It just feels like VS Code is already locking in their implementation when not many have really seen the underlying protocol yet, that seems risky to me.
Also, as a side note, has anybody considered deprecating documenthighlight and renaming it to expressionhighlight, occurrencehighlight, or something different in a newer protocol draft? The "document" part really makes it sound like it semantically highlights the entire document, so I feel like the naming here is somewhat unfortunate. Edit: I guess it's named like that for consistency with DocumentSymbol etc. Then what about DocumentOccurrenceHighlight or something, or is that too long? Oh well, maybe renaming it isn't ideal... it was just an idea
Well the thing is, all you linked is implementation code but I don't even know how the actual protocol looks like right now.
@etc0de The TypeScript file I linked to is mostly interfaces (and not necessarily implementation code) but it is certainly fair to say that it is not exactly very reader-friendly.
Hasn't anybody sat down and done a quick writing down of the actual JSON things sent back and forth?
In terms of the payload of the integer array, that is explained in the API documentation for the provideDocumentSemanticTokens function in VS Code which was linked to from the aforementioned TypeScript file. But as per the above, I agree that it is certainly not super obvious and it would be nice if it was presented in a more readable format.
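For readers who do not want to dig through the API documentation, here is a minimal sketch of the integer array payload as I understand the proposal (and as it later appeared in LSP 3.16): each token contributes five integers, namely delta line, delta start character, length, token type index and a token modifier bit set. `AbsToken` and `encodeTokens` are hypothetical names chosen for this sketch.

```typescript
// A token with absolute positions; tokenType and tokenModifiers are already
// resolved to an index / bit set against whatever legend the server announced.
interface AbsToken { line: number; startChar: number; length: number; tokenType: number; tokenModifiers: number; }

// Encode tokens into the flat relative array: 5 integers per token.
function encodeTokens(tokens: AbsToken[]): number[] {
  const sorted = [...tokens].sort((a, b) => a.line - b.line || a.startChar - b.startChar);
  const data: number[] = [];
  let prevLine = 0;
  let prevChar = 0;
  for (const t of sorted) {
    const deltaLine = t.line - prevLine;
    // The start character is relative to the previous token only on the same line.
    const deltaStart = deltaLine === 0 ? t.startChar - prevChar : t.startChar;
    data.push(deltaLine, deltaStart, t.length, t.tokenType, t.tokenModifiers);
    prevLine = t.line;
    prevChar = t.startChar;
  }
  return data;
}
```

For three tokens at (line 2, char 5), (line 2, char 10) and (line 5, char 2), the second token is encoded with delta line 0 and delta start 5, and the third with delta line 3 and start 2.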
I feel like this is what feedback would be useful on right now, because I'm not talking end user feedback (this is the LSP repo after all, not the VS code repo) but other language server authors' feedback whether the protocol even works for them and their implementations. This seems to be unclear right now, most of the links floating around here look like they're already for end users to test or complete plugins for use as-is. I don't care about the VS code API, I want to know if the actual underlying protocol is reasonable from the language server side.
Then I suggest you take a look at the comments from the last few months in https://github.com/microsoft/vscode/issues/86415. Comments like https://github.com/microsoft/vscode/issues/86415#issuecomment-573934661, https://github.com/microsoft/vscode/issues/86415#issuecomment-587184889, and https://github.com/microsoft/vscode/issues/86415#issuecomment-596143316 are all questions posed from the perspective of a language server author. Although the questions were posed in the VS Code repository, because the proposed LSP API of semantic highlighting practically maps 1-to-1 to the VS Code API the questions are still of relevance.
Do I kind of wish those questions were linked to from or posted in this issue instead? Sure.
It just feels like VS Code is already locking in their implementation when not many have really seen the underlying protocol yet, that seems risky to me.
I can see that.
Also, as a side note, has anybody considered deprecating documenthighlight and renaming it to expressionhighlight, occurrencehighlight, or something different in a newer protocol draft? The "document" part really makes it sound like it semantically highlights the entire document, so I feel like the naming here is somewhat unfortunate. Edit: I guess it's named like that for consistency with DocumentSymbol etc. Then what about DocumentOccurrenceHighlight or something, or is that too long? Oh well, maybe renaming it isn't ideal... it was just an idea
This is probably the first I've heard of such a request. I have never thought of it that way but from how you are describing it I can see how it can be confusing now that I look at things that way.
The TypeScript file I linked to is mostly interfaces (and not necessarily implementation code)
I figured, but I honestly can't tell to what JSON they map. What does a SemanticTokensBuilder with a push do? What is a provideDocumentSemanticTokensEdits? entry in JSON? Is that a type, a name of some complex object member, a boolean, ...? It is honestly quite unreadable if you're not familiar with typescript.
IMHO it's a bad precedent to just assume anyone using this standard is familiar with typescript, I think it's also bad to drop an implementation as a spec draft. But maybe that's just me :woman_shrugging: I wish I could give you feedback, I'm certainly interested, but in the current format I'm afraid that is hard to do
Then I suggest you take a look at the comments from the last few months
The discussion is interesting, but it doesn't really solve that the "draft" itself in its current form isn't really in a universally readable form. It's basically a closed off playground for typescript devs only, apparently...? I don't know, I do kind of find this approach bewildering. I really would suggest this approach is changed.
Edit: to be fair, the protocol.semanticTokens.proposed.ts is a bit more readable now that I looked over it. It was mostly the semantic-tokens-sample/vscode.proposed.d.t that threw me off. Still, I think using typescript code for a spec draft is far from ideal.
The discussion is interesting, but it doesn't really solve that the "draft" itself in its current form isn't really in a universally readable form. It's basically a closed off playground for typescript devs only, apparently...? I don't know, I do kind of find this approach bewildering. I really would suggest this approach is changed.
The other proposed API at the moment, the textDocument/callHierarchy request does at least have a Markdown file written up about it and I can recall other instances in the past where a similar Markdown file was written. I am not sure why one wasn't written this time.
For the sake of completeness, the F# plugin Ionide also already implements it and also shipped it. @Krzysztof-Cieslak
We created custom LSP endpoint and wrapped stuff on VSCode side, it’s not really using proposed API in LSP spec
I have a question that I haven't seen an answer to. Can the semantic highlighting be additive with the existing textmate grammars? Or is it completely one or the other?
Can the semantic highlighting be additive with the existing textmate grammars? Or is it completely one or the other?
I don’t believe this has been spec’d out. I encountered this in VS Code (https://github.com/microsoft/vscode-languageserver-node/issues/570 ) and they do a merge of sorts there.
Semantic highlighting in VSCode is additive. VSCode first runs normal text mate highlighting and then improves it when semantic results are available.
But I’d imagine it may differ from client to client, it seems like a client implementation detail.
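As a rough illustration of what "additive" could mean on the client side, the sketch below layers semantic spans over grammar-based spans for one line, letting the semantic result win wherever the two overlap. This is not how VS Code implements it; `StyledSpan` and `mergeAdditive` are hypothetical, and real clients may merge at the level of theme rules rather than characters.

```typescript
// A styled span within one line; `end` is exclusive.
interface StyledSpan { start: number; end: number; style: string; }

// Paint grammar spans first, then semantic spans on top, then re-group
// adjacent characters with the same style into spans.
function mergeAdditive(lineLength: number, grammar: StyledSpan[], semantic: StyledSpan[]): StyledSpan[] {
  const perChar = new Array<string | undefined>(lineLength).fill(undefined);
  for (const layer of [grammar, semantic]) {
    for (const span of layer) {
      const from = Math.max(0, span.start);
      const to = Math.min(lineLength, span.end);
      for (let i = from; i < to; i++) perChar[i] = span.style;
    }
  }
  const merged: StyledSpan[] = [];
  for (let i = 0; i < lineLength; i++) {
    const style = perChar[i];
    if (style === undefined) continue;
    const last = merged[merged.length - 1];
    if (last && last.end === i && last.style === style) {
      last.end = i + 1;
    } else {
      merged.push({ start: i, end: i + 1, style });
    }
  }
  return merged;
}
```

For a line such as `const foo = bar`, a grammar might mark `bar` as a plain variable; if the server later reports it as a function, only that span changes while the keyword colouring from the grammar stays.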
I implemented the LSP portion of semantic highlighting for the rust-analyzer server and it was pretty straightforward.
The other proposed API at the moment, the textDocument/callHierarchy request does at least have a Markdown file written up about it and I can recall other instances in the past where a similar Markdown file was written. I am not sure why one wasn't written this time.
FYI that file does not match the protocol. I was hoping to update it if https://github.com/microsoft/vscode-languageserver-node/pull/614 looks right.
I think @etc0de has brought up some very good points and I too would like to see a draft of the spec in a similar form to what would eventually appear on the website.
I think @etc0de has brought up some very good points and I too would like to see a draft of the spec in a similar form to what would eventually appear on the website.
The other proposed API at the moment, the textDocument/callHierarchy request does at least have a Markdown file written up about it and I can recall other instances in the past where a similar Markdown file was written. I am not sure why one wasn't written this time.
FYI that file does not match the protocol. I was hoping to update it if microsoft/vscode-languageserver-node#614 looks right.
:( Well, that's unfortunate. Thanks for pointing this out, @kjeremy!
@rcjsuen @Krzysztof-Cieslak thanks for the info and link. It would be great to see the additive (or not) behavior formally spec'd out (perhaps in a new .md that has been proposed). As a Language Server author I like the idea of semantic highlighting being additive since it will make it easier to adopt since initially I could just add support for the few tokens that have recursive definitions (or are otherwise hard to define without using negative lookbehinds or other complex tools).
FYI that file does not match the protocol. I was hoping to update it if microsoft/vscode-languageserver-node#614 looks right.
:( Well, that's unfortunate. Thanks for pointing this out, @kjeremy!
I have pointed this out a while ago :)
@rcjsuen @Krzysztof-Cieslak thanks for the info and link. It would be great to see the additive (or not) behavior formally spec'd out (perhaps in a new .md that has been proposed).
I'm all in favour of more things being explicitly stated. 👍
For what it's worth, I left minor other points here: https://github.com/microsoft/vscode/issues/86415#issuecomment-619350479
It would be great to see the additive (or not) behavior formally spec'd out
For that I would like to request that textmate support isn't a requirement in a client: my suggestion would be to make the spec say "if the client supports other sources of highlight formatting options like textmate, it should add them in additively unless the LSP server specifies option XYZ to indicate an intended exclusive handling of semantic highlighting" or something like that.
The reason for textmate being optional: I pondered writing a small mini-IDE for my language, and if I do I plan to just support LSP-backed highlighting only because that is all I personally need but I'd want to at least possibly allow other LSPs to be plugged in too. It'd be nice if I then wouldn't necessarily be violating the specs just because of a more spartan not-textmate-supporting editor implementation.
The reason for LSP server being able to override textmate with a protocol option: I am all for editors that have "universal" LSP plugins where you don't need to provide a client-side plugin to support a language. However, these editors might also have fallback support for textmate highlighting that is then possibly going to be outdated, and without a per-client plugin it might be difficult to tell the editor to not use that at all unless it's in the protocol. So if I design my LSP server-side to highlight everything exhaustively, it'd be nice to be able to explicitly state "please don't use textmate additively unless the end user overrides this, the result will likely be less correct anyway".
Another thing, does anyone know if this viewport-style local querying as suggested here is possible as a client? https://github.com/microsoft/vscode/issues/86415#issuecomment-573934661 It looks like it is with the ranges, but I just wanted to ask again since this looks really important to get ok performance in large files. Sorry if this was already answered, there were a lot of comments on this;
(Less important: I wonder, has anyone tested the performance of not using semantic token edits and just requerying the viewport on typing? I find the suggestion of not adding in the edits to simplify the protocol at least possibly worth exploring, even though I assume in the end one probably wants the edits for minimal typing lag)
I have created a proposed.semanticTokens.md gist to hopefully make the reading of the TypeScript definition file a little bit easier.
I hope this will be of use to the wider community though I do kind of worry about possibly causing confusion due to it not being official. Hopefully the disclaimer at the top is good but I am happy to delete the Gist if people think it will do more harm than good (by possibly hurting the understanding of the semantic tokens API requests because it is not a 1-to-1 mapping of the TypeScript file).
Another thing, does anyone know if this viewport-style local querying as suggested here is possible as a client? microsoft/vscode#86415 (comment) It looks like it is with the ranges, but I just wanted to ask again since this looks really important to get ok performance in large files. Sorry if this was already answered, there were a lot of comments on this;
@etc0de The two comments here (https://github.com/microsoft/vscode/issues/92789#issuecomment-608145650 and https://github.com/microsoft/vscode/issues/92789#issuecomment-608146554) seem to suggest it is working in terms of the VS Code API but I'm not sure if anyone's implemented textDocument/semanticTokens/range from the LSP end yet.
Thanks for doing this @rcjsuen, it's much appreciated.
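A minimal sketch of what viewport-style querying might look like over the wire. The method name `textDocument/semanticTokens/range` is taken from the comment above; the params shape (a text document identifier plus a range) is an assumption based on how other range-based LSP requests look, and `buildRangeRequest`, `tokensStartingIn`, `WirePosition`, `WireRange` and `PlacedToken` are hypothetical names.

```typescript
interface WirePosition { line: number; character: number; }
interface WireRange { start: WirePosition; end: WirePosition; } // end is exclusive
interface PlacedToken { line: number; startChar: number; length: number; tokenType: number; }

// Client side: ask only for the lines currently visible in the editor.
function buildRangeRequest(id: number, uri: string, firstVisibleLine: number, lastVisibleLine: number) {
  const range: WireRange = {
    start: { line: firstVisibleLine, character: 0 },
    end: { line: lastVisibleLine + 1, character: 0 },
  };
  return {
    jsonrpc: "2.0",
    id,
    method: "textDocument/semanticTokens/range",
    params: { textDocument: { uri }, range }, // assumed params shape
  };
}

// Server side: keep only the tokens that start inside the requested range.
function tokensStartingIn(tokens: PlacedToken[], range: WireRange): PlacedToken[] {
  const { start, end } = range;
  return tokens.filter(t => {
    const atOrAfterStart = t.line > start.line || (t.line === start.line && t.startChar >= start.character);
    const beforeEnd = t.line < end.line || (t.line === end.line && t.startChar < end.character);
    return atOrAfterStart && beforeEnd;
  });
}
```

A server that can compute tokens lazily would of course avoid building the full token list first; the filter here only shows which tokens belong in the response.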
I would suggest that the overwhelmingly most important section is the referenced code comments: https://github.com/microsoft/vscode-extension-samples/blob/5ae1f7787122812dcc84e37427ca90af5ee09f14/semantic-tokens-sample/vscode.proposed.d.ts#L71 and https://github.com/microsoft/vscode-extension-samples/blob/5ae1f7787122812dcc84e37427ca90af5ee09f14/semantic-tokens-sample/vscode.proposed.d.ts#L131
This is the API that server and client authors need to understand, review and accept/debate. It's complex and arguably non-obvious and making it work with all the various clients is the goal (assuming of course the LSP stated goal of solving the matrix problem is still a goal).
For example, it wasn't obvious to me why delta encoding is used for the positions? This seems to introduce a (unnecessary?) complexity to the protocol which favours one particular implementation of client highlighting. Presumably its purpose is to minimise the amount of data transferred on the "wire" doing edits? If this leads to complex encoding/decoding, is it really the right trade-off ? I mean, are there numbers behind it?
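To give the delta-encoding question something concrete to point at, here is a small decoder sketch, assuming the five-integers-per-token layout (delta line, delta start character, length, type, modifiers). `DecodedToken` and `decodeTokens` are hypothetical names. One plausible benefit of the relative form is visible in the decoder's running state: if a line is inserted above the first token, only that token's delta line changes and every later token shifts implicitly.

```typescript
interface DecodedToken { line: number; startChar: number; length: number; tokenType: number; tokenModifiers: number; }

// Turn the flat relative array back into absolute token positions.
function decodeTokens(data: number[]): DecodedToken[] {
  const tokens: DecodedToken[] = [];
  let line = 0;
  let startChar = 0;
  for (let i = 0; i + 5 <= data.length; i += 5) {
    const [deltaLine = 0, deltaStart = 0, length = 0, tokenType = 0, tokenModifiers = 0] = data.slice(i, i + 5);
    line += deltaLine;
    // Start characters are only relative to the previous token on the same line.
    startChar = deltaLine === 0 ? startChar + deltaStart : deltaStart;
    tokens.push({ line, startChar, length, tokenType, tokenModifiers });
  }
  return tokens;
}
```

Whether that saving justifies the extra decoding step is exactly the trade-off being asked about; the sketch takes no position on it.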
For that I would like to request that textmate support isn't a requirement in a client: my suggestion would be to make the spec say "if the client supports other sources of highlight formatting options like textmate, it should add them in additively unless the LSP server specifies option XYZ to indicate an intended exclusive handling of semantic highlighting" or something like that.
The reason for textmate being optional: I pondered writing a small mini-IDE for my language, and if I do I plan to just support LSP-backed highlighting only because that is all I personally need but I'd want to at least possibly allow other LSPs to be plugged in too. It'd be nice if I then wouldn't necessarily be violating the specs just because of a more spartan not-textmate-supporting editor implementation.
That's a great point! I'm agreed on both points (not being textmate specific) as well as a method to indicate that the server supports fully tokenizing the document. Although in the second case, I think the client should be free to provide their own overrides of the server's tokens, a primary reason for this would be if an end-user wants to customize the tokenization in some way.
Forgive my ignorance, but where does TextMate appear in the spec ? I realise that the spec uses TextMate snippet syntax (sort of) but reading @rcjsuen markdown, I don't see any TextMate specifics in there. Perhaps I'm missing something obvious.
I don't think TextMate appears in the spec, I was the only one that brought it up.
Oh does VSCode use TextMate internally then? Is that why it came up ?
Oh does VSCode use TextMate internally then? Is that why it came up ?
Correct, VS Code will render with TextMate first and then apply semantic highlighting (additively) on top after it gets the response back from the language server.
TextMate is not mentioned in the VS Code API or the LSP API as far as I know.
I've tried to summarize the above points regarding what the client and server should support and declare.
1. The specification should not make any claims about the client needing to support TextMate, Tree-sitter, or any other grammar. LSP clients should be free to rely completely on the language server for its syntax highlighting needs without also needing to support syntax highlighting via a grammar.
2. Clients should declare a) if they support additive/merging support of semantic tokens on top of whatever internal grammar it has already and b) if they support discarding the grammar from A and replacing it completely with the semantic tokens information from the server.
3. Servers should declare a) whether they support full document calculations and/or b) partial document calculations.
It's a little verbose but I think it's better to be more explicit about everything. What does everyone else think?
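The declarations summarized above could be sketched like this. None of these names come from the proposed protocol: `ClientHighlightSupport`, `ServerHighlightSupport`, `pickRequest` and `pickRendering` are all hypothetical and only illustrate how the two sides might use such flags.

```typescript
// Hypothetical client declaration: (a) can merge on top of its grammar, (b) can replace it.
interface ClientHighlightSupport { canMergeWithGrammar: boolean; canReplaceGrammar: boolean; }

// Hypothetical server declaration: (a) full document and/or (b) partial document calculations.
interface ServerHighlightSupport { fullDocument: boolean; range: boolean; }

type RequestKind = "full" | "range" | "none";

// Decide which request a client would send given what the server declared.
function pickRequest(server: ServerHighlightSupport, viewportOnly: boolean): RequestKind {
  if (viewportOnly && server.range) return "range";
  if (server.fullDocument) return "full";
  if (server.range) return "range"; // fall back to a range covering the whole document
  return "none";
}

// Decide how a client would render the result given its own abilities.
function pickRendering(client: ClientHighlightSupport): "merge" | "replace" | "unsupported" {
  if (client.canMergeWithGrammar) return "merge";
  if (client.canReplaceGrammar) return "replace";
  return "unsupported";
}
```

A client with no grammar at all would simply declare the replace ability and never the merge one, which keeps point 1 intact.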
Clients should declare a) if they support additive/merging support of semantic tokens on top of whatever internal grammar it has already and b) if they support discarding the grammar from A and replacing it completely with the semantic tokens information from the server.
IMHO 2B should just be mandatory for clients that support the new semantic highlighting protocol. (2A can be optional, of course.) I don't see why it would be difficult for anyone to implement, and not having it can obviously mess up the result as seen in https://github.com/microsoft/vscode-languageserver-node/issues/570 .
Servers should declare a) whether they support full document calculations and/or b) partial document calculations.
Why not just replace 3. with a server->client command to just not do any additive grammars on top unless the user explicitly overrode it? I can't think of a scenario where actually knowing why is particularly relevant, especially since I can't really see why it'd be done outside of the expectation additive grammars worsen the result anyway, in which case it makes no sense to default to additively apply them for whatever reason.
Edit: basically the protocol of what is actually sent could be, client: "(if client supports & does this) btw I will use an additive grammars on top of LSP semantic highlighting, protest if that is bad", server: "(if additive is bad) please don't do that additive thing unless the user really made you do it, thanks". (Whether clients actually allow setting such an override I find personally unimportant. I don't think it is likely to be particularly needed, but I don't want to take away any IDE's choice to offer it to the user)
Sorry for the comment spam, but this remark bubbled up in my head again:
For example, it wasn't obvious to me why delta encoding is used for the positions? This seems to introduce a (unnecessary?) complexity to the protocol which favours one particular implementation of client highlighting
I'm probably missing something, but I agree and I had this other idea:
If these edits are to move things around fast after insert, can't there be a simple token-insert-at operation (and token-delete-at) for the LSP server which will shift all the follow-up tokens in index respectively in the client's token stream memory? For larger changes resulting (e.g. quotation typed with multiline string tokens) things would need to be fully retransmitted anyway, and the file position of the tokens could just be adjusted by the editor/client locally. (After all, it's obvious inserting a space will move every token after it on that line in terms of column position by +1, right?)
Or am I missing something? I'm really sorry if that proposal was already discussed. I read the typescript wrong, obviously it is already an insert-at and/or delete-at. But why the complicated decoding then?
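For reference, the insert-at/delete-at style of edit mentioned here can be sketched as a splice over the flat integer array. The `SemanticTokensEdit` shape (start index, delete count, optional replacement data) follows my reading of the proposal; `applyEdits` is a hypothetical helper, and it assumes all edits refer to indices in the previous array and do not overlap.

```typescript
// One edit against the previous integer array.
interface SemanticTokensEdit { start: number; deleteCount: number; data?: number[]; }

// Apply edits from the highest start index down so earlier indices stay valid.
function applyEdits(previous: number[], edits: SemanticTokensEdit[]): number[] {
  const result = [...previous];
  const ordered = [...edits].sort((a, b) => b.start - a.start);
  for (const edit of ordered) {
    result.splice(edit.start, edit.deleteCount, ...(edit.data ?? []));
  }
  return result;
}
```

Combined with the relative encoding, inserting a blank line above the first token becomes a single edit that replaces one integer (the first delta line) instead of retransmitting every token.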
For example, it wasn't obvious to me why delta encoding is used for the positions? This seems to introduce a (unnecessary?) complexity to the protocol which favours one particular implementation of client highlighting. Presumably its purpose is to minimise the amount of data transferred on the "wire" doing edits? If this leads to complex encoding/decoding, is it really the right trade-off ? I mean, are there numbers behind it?
@alexdima @dbaeumer Can one of you help weigh in here?
Servers should declare a) whether they support full document calculations and/or b) partial document calculations.
Why not just replace 3. with a server->client command to just not do any additive grammars on top unless the user explicitly overrode it?
I don't think a command makes sense as it feels odd to me for a server to send an explicit command to the client solely for the purpose of toggling something on and off. I am not sure how likely it is for a server to have confidence it is "perfect" for some X% of the time but then to also occasionally need to send a request/notification over to the client to toggle itself because now it is the Y% of the time when it's not "perfect".
How about enhancing the SemanticTokens interface so that it has a boolean field to instruct the client whether an additive merge should be applied or not? The exact naming of this field is of course up to debate.
export interface SemanticTokens {
    /* Copy/pasted from the original interface... */
    resultId?: string;
    /* Copy/pasted from the original interface... */
    data: number[];
    /*
     * The semantic tokens data should be applied on top of the
     * syntax highlighting that the client already has.
     */
    mergeRequired: boolean;
}
A field seems fine, sure. mergeRequired however sounds like a client needs to support merging which I think so far everyone agreed should be optional. What about mergeDisabled, or isExhaustive (= true means must not merge), or unmergeable, or something like that?
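A small sketch of how a client might act on such a field, using the `isExhaustive` spelling floated above. The field itself, `ClientMergeOptions` and `shouldMerge` are hypothetical; the user override mirrors the earlier suggestion that an end user may still force additive behaviour.

```typescript
// Response shape with the hypothetical opt-out field added.
interface SemanticTokensResult {
  resultId?: string;
  data: number[];
  // Hypothetical: true means the server tokenized everything, so do not merge.
  isExhaustive?: boolean;
}

// Hypothetical client-side settings.
interface ClientMergeOptions { supportsMerge: boolean; userForcesMerge: boolean; }

// Merging stays optional for clients; the server can only ask for it to be skipped.
function shouldMerge(tokens: SemanticTokensResult, client: ClientMergeOptions): boolean {
  if (!client.supportsMerge) return false; // client has no grammar layer at all
  if (client.userForcesMerge) return true; // explicit end-user override wins
  return tokens.isExhaustive !== true;     // default: merge unless the server opted out
}
```

Because the field is optional and defaults to merging, a server that never sets it keeps today's additive behaviour in clients that have a grammar.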
Like WebStorm and VS do, e.g. a symbol is a type or parameter or namespace or unresolved ...
This is hard to do with TextMate-based grammars. Since we already support Diagnostics, why not support semantic highlighting?