Semantic highlighting API draft should incorporate "importKeyword" and "modifierKeyword" token types

ghost commented 4 years ago

https://github.com/microsoft/vscode-languageserver-node/blob/e5d7ad881b1a51b486e3f0e4aa0fbc25dad2be58/protocol/src/protocol.semanticTokens.proposed.ts#L18 < this appears to be the basis for the latest semantic highlighting draft, and I was asked I make a separate ticket for this:

I think some import/include/require token is super common (and while it might be a function in some languages it's a special dedicated keyword in many others) and therefore I suggest an "import" keyword is added to the default built-in list of the spec. Whether themes use it I don't really mind, but I just suggest it should be available for sending through the LSP server
the modifier list IMHO should include an "other modifier" type. For any remotely unusual language there's bound to be modifiers not covered by this list, and it would be still very useful info for a language client's themeing to know it's a modifier (rather than whatever generic keyword)

dbaeumer commented 4 years ago

The semantic highlight protocol is built in a way that different clients can have different tokens and modifiers even some which might not be standardize in the protocol since it will be very hard to cover all tokens and all modifiers across all languages.

@aeschli any comments on the specific request for import/include/require. Would make sense to me. The other modifier is IMO covered by the fact that a client can announce its own set of modifiers.

aeschli commented 4 years ago

Yes, an import modifier makes sense. If the client doesn't know of a token type/modifier or shows no interest in it, then the server should not report it.

ghost commented 4 years ago

If the client doesn't know of a token type/modifier or shows no interest in it, then the server should not report it.

... which means it won't be colored at all, right? That means any more slightly unusual languages will always have a notable amount of modifiers completely uncolored (like, not even clearly set apart as someting modifier-like) without a vendor-specific editor plugin which IMHO goes entirely against the point of LSP and semantic coloring, the point that was to not require an editor-specific plugin to make things work right for the most part. I understand VS Code usually has an editor-specific plugin, but that should not be the expected model since it leads to the entire old O(n^2) plugins problem which LSP was designed to avoid.

Is this really such an unusual view of mine? Please, add a generic modifier name. Forcing LSP servers to not report unknown modifiers at all, why would that be what the user wants?

Edit: I mean, unless the expectation is most themes won't be expected to want to color modifiers differently. But given how prominent they are in the spec the assumption appears to be most would want to do that

aeschli commented 4 years ago

Oh, I see, you are talking about a token type for modifier keywords? For example the public in this snippet:

*public* void foo() {
}

We currently just have a keyword token type for all keywords.

The modifiers in the list are intended to be annotations ('modifiers') to declarations and references.

class A {
static void foo() {
  return A.foo();
}
}

So the first foo has classification member.declaration.static and the second one member.static.

So, yes, having a modifier tokenType makes sense.

dbaeumer commented 4 years ago

Agree, that having a modifier token type makes sense. Will add it.

dbaeumer commented 4 years ago

Done in the library. Missing in the spec.

ghost commented 4 years ago

@aeschli sorry, I suppose my suggestion was unclear. This is what I had in mind:

export enum SemanticTokenTypes {
    comment = 'comment',
    keyword = 'keyword',
    string = 'string',
    number = 'number',
    regexp = 'regexp',
    operator = 'operator',
    namespace = 'namespace',
    type = 'type',
    struct = 'struct',
    class = 'class',
    interface = 'interface',
    enum = 'enum',
    typeParameter = 'typeParameter',
    function = 'function',
    member = 'member',
    property = 'property',
    macro = 'macro',
    variable = 'variable',
    parameter = 'parameter',
    label = 'label',
        import = 'import'  /* NEW !! */
}

and:

export enum SemanticTokenModifiers {
    documentation = 'documentation',
    declaration = 'declaration',
    definition = 'definition',
    static = 'static',
    abstract = 'abstract',
    deprecated = 'deprecated',
    readonly = 'readonly',
        other = 'other'   /* NEW */
}

Edit: although I'm thinking now, I assume modifier can be applied to any pre-existing token on top? In that case maybe it doesn't make too much sense, and @dbaeumer 's idea of a "modifier" SemanticTokenTypes entry for tokens that are only there to store modifiers makes more sense. Well, not sure, actually, I guess both would work...?

aeschli commented 4 years ago

Yes, there's the misunderstanding. Each semantic token is annotated by a SemanticTokenTypes and any number of SemanticTokenModifiers.

Not to be confused by the keyword syntax token that describes a modifier in the source code.

For keywords we have a single keyword SemanticTokenTypes.

Now, as discussed, we could add a new token type modifierKeyword. Before doing that, we should see if that's a popular request. You can pioneer this by just defining such token types on your side.

In VSCode we have a mechanism where extension can declare new token types based on a super type. They can also define mappings for theming. See https://github.com/microsoft/vscode/issues/96712#issuecomment-623454115 for an example.

What this means for LSP it would be part of a language server documentation to describe the token types it uses (standard and non-standard).

ghost commented 4 years ago

Now, as discussed, we could add a new token type modifierKeyword.

That sounds reasonable, so if LSP servers can't pick a modifier the editor knows at least the base type will reflect for the themeing it's a modifier. I like that solution.

You can pioneer this by just defining such token types on your side.

I don't see the point unless a theme actually implements that on the editor side, there is no useful super type to derive from that would actually cause a theme change. (I'm also currently simply not far enough in my LSP implementation yet)

Also, I'm also in part saying this after seeing your response in https://github.com/microsoft/vscode/issues/96712 but I don't think you should delay obviously vital base types like generic modifiers or doc comments too much through having people "pioneer" it to see if it's a "popular request", because that might lead to fragmentation. I think it's pretty obvious these two types will be widely needed, so if you just let people play around with it in local experiments for too long while everyone else forgets to adopt them it'll take forever for the ecosystem to catch up. That doesn't sound like a good approach to me. But that's just my personal opinion, obviously.

rcjsuen commented 4 years ago

See microsoft/vscode#97063.

DanTup commented 4 years ago

Not sure if this is the appropriate place for this - but I've just recently been adding CompletionItemTags and DiagnosticTags to our LSP server and noticed that things like deprecated are also supported on semantic tokens (which is good - we had a question about indicating deprecated code to editors when the lints were turned off).

However, the implementation looks quite different. Is there a reason not to model the modifiers in the same way as those tags for consistency?

dbaeumer commented 4 years ago

I am not sure what you are asking for. You mean having the same value set?

DanTup commented 3 years ago

I somehow missed the comment above - and I have no idea what I was asking for :-)

However, I came here with another question... Specifically about the class keyword. After switching to semantic tokens I noticed many of my keywords changed colour. I tried to use TypeScript as a reference, and noticed that it doesn't seem to produce a semantic token for the class keyword. I'm currently using the keyword type. This is different to what we had before (it was blue, now it's pink - using the Dark+ theme) and different to TypeScript (which still has a blue "class" keyword even with semantic tokens).

So my question is - what type/modifiers should a "class" keyword have, and does there need to be something new? Excluding them from semantic tokens feels weird, but it's also weird to have a mis-match in colours between the textMate grammar and the semantic tokens because it looks quite jarring. I'd like the semantic tokens to just improve the colouring for things that are hard to do in regex, not change all the colours.

aeschli commented 3 years ago

@DanTup IMO the only reason to emit keyword token types is when you have a language that so context sensitive and needs resolving to know what a modifier or keyword is. For the languages for which we have implemented a semantic highlight provider (TypeScript, HTML, Java), this wasn't the case and keywords were left to the syntax highlighter (TextMate grammar, in our case).

So admittedly, keyword is too coarse, and if you depend on it, you want to define subtypes such as modifierKeyword. It's ok to add custom token types. Once you have good insight please let us know on https://github.com/microsoft/vscode/issues/97063/ and we can look into adding them to our standard token types.

ell1e commented 3 years ago

@aeschli I think this ignores the interesting use case where an IDE may not have a basic syntax highlighter, and instead just plugged in some random LSP server with no additional knowledge. (Which I know VS Code doesn't really offer to do, but other IDEs like e.g. KDE's kate do. And I wonder if maybe VS Code should offer this since the whole writing-a-plugin-for-every-IDE-again approach just seems a bit dated to me but that's for another matter.) At least for that use case it would be useful if this was changed such that useful keyword info is always returned for the common LSP backends and those subtypes were specced out in the main specs, unless that would be too performance intensive. But I assume if it is already resolved to a keyword at all, it usually isn't much extra work to also return what keyword type it is. Just a nudge in case anyone else sees this use case as useful.

DanTup commented 3 years ago

For the languages for which we have implemented a semantic highlight provider (TypeScript, HTML, Java), this wasn't the case and keywords were left to the syntax highlighter (TextMate grammar, in our case).

Perhaps I misunderstood the goal of the semantic tokens in LSP. I'm also laying this on top of the textmate grammar in VS Code, however I assumed that it should be reasonable to use only semantic tokens (for a non-VS Code, LSP client) and still get a good highlight experience (this means not skipping tokens).

Now I'm not sure if that's right - are all LSP clients also expected to use something like a textmate grammar, or should semantic tokens provide a reasonable experience on their own? (I appreciate there are some other issues using only semantic tokens, like a delay in opening a file while the server warms up, but those seem like decisions for the client).

So admittedly, keyword is too coarse, and if you depend on it, you want to define subtypes such as modifierKeyword. It's ok to add custom token types.

I do already have some custom types (annotation and boolean) so adding new ones is fine, but it's not clear to me how clients would know how to colour these. For example if I add a "classKeyword" type (although I'm not sure if that's the better way or it would be better as a modifier, for example textmate seems to use things like "keyword" and "keyword.control"?), how would clients (that aren't hard-coded with my custom token type) know what it is? It might end up completely uncoloured.

It seems like the semantic tokens might not have enough options to do everything the textmate grammar can do, but I'd rather hoped that it would be superior to the grammar and that would just be a fallback.

DanTup commented 3 years ago

Looking more, I think the issue here may that VS Code is mapping the "keyword" semantic token onto "keyword.control" theming. The underlying theme (dark_plus) has a "keyword" (blue) and "keyword.control" (pink). Semantic Tokens only has "keyword" and is mapping that to "keyword.control" (so everything is pink).

Screenshot 2020-12-17 at 12 15 19

So maybe the best fix is that LSP has a control modifier that can be used on keyword and VS Code should then map those onto the respective theme scopes?

Edit: Seems like this was raised before at https://github.com/microsoft/vscode/issues/94367, but I don't entirely understand the reasoning. Seems like there may be a workaround for me there though.

DanTup commented 3 years ago

I've come up with a workaround for my issue, though I don't know if helps the original request here.

I've updated my server to add a control modifier to the keyword tokens that are for control. I've then overridden the semanticTokenScopes in the VS Code extension to point keyword back to keyword (VS Code is defaulting to keyword.control) and then keyword.control to keyword.control:

// package.json

"semanticTokenScopes": [
    {
        "language": "dart",
        "scopes": {
            "keyword": [
                "keyword"
            ],
            "keyword.control": [
                "keyword.control"
            ]
        }
    }
],

This seems to work as required. I can colour my keywords and control keywords separate, like the textmate grammar does (removing the jarring change), and the semantic tokens are all good semantically and can be used/styled by other editors without any quirks (and without needing their own grammar).

If control becomes an official modifier (or type), I'll move over to it.

dbaeumer commented 3 years ago

@aeschli could you please follow up.

dbaeumer commented 2 years ago

@aeschli friendly ping.

aeschli commented 2 years ago

It's on the backlog...

microsoft / language-server-protocol

Semantic highlighting API draft should incorporate "importKeyword" and "modifierKeyword" token types #968