microsoft / language-server-protocol

Defines a common protocol for language servers.
https://microsoft.github.io/language-server-protocol/
Creative Commons Attribution 4.0 International
11.08k stars 775 forks source link

Encode SymbolKind and SymbolTag as strings #1186

Open astoff opened 3 years ago

astoff commented 3 years ago

I would like to propose changing the SymbolKind and SymbolTag enum values to strings, so SymbolKind.Function = "function" instead of 3, and so on.

Most languages probably have one or two features that don't quite fit in the existing collection of SymbolKinds (e.g., patterns in Haskell, so this change would allow servers to better handle these edge cases.

On the client side, the spec already says that clients should “handle values outside [the SymbolKind] set gracefully and falls back to a default value when unknown”. Therefore, servers are already allowed to send strings as SymbolKind. I'm simply suggesting to encourage this practice when necessary, and also to make the existing API more consistent with it.

Sending strings as the SymbolKind in edge cases is a reasonable thing to do because, while there isn't anything particularly graceful the client could do with an unknown value of 99, there is definitely something useful it could do with a value of "pattern", or "pragma" (e.g., show that word in a menu).

This is not to say that there shouldn't be an official set of values for SymbolKind, and of course most servers will stick to the official set for better compatibility. Issue https://github.com/microsoft/language-server-protocol/issues/344 remains relevant, as it would be task of the LSP specification to collect the most useful SymbolKinds and make them official.

My interest in this, concretely, concerns the TeX markup languages. In this case, the relevant SymbolKinds would be things like

and there's nothing reasonable to choose from, even in the way of analogy, from the current options. Now, mine is a very particular case, and I have no hope of convincing you to spec out those things; I myself would be against it if I was in charge of LSP. So I'd rather send the above SymbolKind values as strings to the editor, and let it do its best with that. Surely the editor can create a nice document outline without knowing what the semantics of "chapter" are.

For SymbolTags, it's probably even more desirable to allow nonstandard values to address the idiosyncrasies of particular languages.

dbaeumer commented 3 years ago

This is potentially breaking. To keep it backwards compatible we would need to add a client capability to signal that.

Showing string enum values in the UI is problematic when it comes to i18n.

astoff commented 2 years ago

This is potentially breaking. To keep it backwards compatible we would need to add a client capability to signal that.

The specification already says that unrecognized values should be handled gracefully:

                         * The symbol kind values the client supports. When this
             * property exists the client also guarantees that it will
             * handle values outside its set gracefully and falls back
             * to a default value when unknown.

Showing string enum values in the UI is problematic when it comes to i18n.

It still seems easier (or at most equally hard) to translate "function" -> "função" than 12 -> "função".

mitonize commented 1 year ago

@astoff I totally agree with you.

@dbaeumer Who should be able to define language specific SymbolTags?

dbaeumer commented 1 year ago

We usually collect them in issues like: https://github.com/microsoft/language-server-protocol/issues/344

astoff commented 1 year ago

Has any of the suggestion made in #344 ever made it to the spec? The first ideas mentioned there, pragma and macro, have not. Same for the section kind that I was looking for — it was first mentioned in January, 2019.

kraigher commented 1 year ago

I am working on a language server for the VHDL language. VHDL is a hardware description language that has a lot of elements that do not fit into the pre-defined symbol kinds.

I think it is a futile effort to try to globally define all kind of symbols in the LSP-spec, there are just so many types of languages in the world. For example this issue mentions LaTeX stuff like chapter and cross-reference. For VHDL there is architecture, entity, configuration, context and many more. The set of symbol kinds is probably large and it is not feasible for all clients to know about this global list and do anything useful with it. The current architecture pushes non-mainstream languages into poor UX or to having special clients tailored to the server which defeats the purpose of the LSP.

Thus I really like this proposal to support server-defined symbol kinds via a string keyword. I think it is a much better architecture. The client can then show the symbol kind keyword as a prefix of the symbol. It is probably good to keep a short list "meta kinds" as part of the LSP protocol which would map to icons in the clients. The current list of hardcoded symbol kinds could probably be kept as such "meta kinds". The list of "meta kinds" would be global and smaller. The server defined symbols would inherit from an existing symbol kind but add their own string keyword.

Having symbol kinds as strings would have improved the UX of my VHDL language server since I could differentiate between entity, component, 'architectureand other constructs in the language. This would also allow supporting a well definedworkspace/symbolfiltering based onSymbolKind` as well related to https://github.com/microsoft/language-server-protocol/issues/1674 and https://github.com/microsoft/language-server-protocol/issues/1382.

@dbaeumer I do not think i18n is relevant objection to having named symbol kinds. Names or keywords denoting a language construct is probably expect knowledge for someone working in a programming language and has to be typed in english in the file by the user as a keyword anyway in many cases. Also the LSP The server will produce diagnostics and other text messages in english anyway. The protocol supports sending a locale from the client to the server that I assume would cause the server to change the language of its diagnostics. This could also be used to translate the named SymbolKinds as well if the server deems it useful.

Proposal

Add a new server defined NamedSymbolKind:

interface NamedSymbolKind {
   // The base kind of the symbol, used to select an icon in clients 
   kind: SymbolKind,
   // Keyword to show as a prefix of the symbol
   name: string,
}

// Published by the server. 
// The index is used to compactly encode the symbol in future messages.
// Several named symbols may use the same SymolKind
type NamedSymbolKinds = NamedSymbolKind[];

The idea is that the kind serves as a meta kind to select an icon. Several NamedSymbols can use the same symbol kind but with different names.

In VHDL there are several things which could be considered a MODULE and in LaTeX several things might be a NAMESPACE. Those can all use the same icon but be disambiguation by the name prefix in the UI.

pidgeon777 commented 1 year ago

@kraigher To me, this seems to be a great idea. Hopefully, Microsoft Team will consider your proposal 👍.

dbaeumer commented 1 year ago

Instead of adding an additional type I would rather do something comparable to semantic tokens. That would be:

kraigher commented 1 year ago

@dbaeumer I think the key attributes of a solution is:

  1. There needs to be a small set of "meta kinds" that are defined by the protocol essentially mapping to generic icons that represent concepts such as "Type", "Object", "Namespace" similar as today.
  2. The server can "instantiate" the meta kinds with a name denoting a language construct or keyword. Clients would render "icon + keyword" as prefix of the symbol. The important aspect is that one meta kind can be instantiated several times with different names.

If these two key attributes are fulfilled the clients are happy since they can have a small global list of icons for concepts and servers are happy since they can create a specific set of tags that match 1:1 with the constructs of the programming language at hand.

astoff commented 1 year ago

and which index they use to refer to it

In my opinion this is doubling down on the original mistake of referring to symbol kinds by numbers. The proposal is to allow strings.

Today, the server can already send any random numeric value of SymbolKind when nothing fits. The client is already required to handle this in whatever way it's able to. The proposal is to allow passing a string value, because there's much more the client could do to handle things gracefully in that case.

I think it's nice to have a way for the server to describe extra details about a symbol type, or maybe even a hierarchy of symbol kinds. I don't have any opinion about the details, as long as this step is optional.

kraigher commented 1 year ago

In my opinion this is doubling down on the original mistake of referring to symbol kinds by numbers. The proposal is to allow strings.

The numbers makes sense as a data compression mechanism so that the server does not have to send many strings when answering the workspace symbol or document symbol requests.

In my proposal the server would define the legal kinds during initialization such as:

For LaTex

0: NAMESPACE, "chapter"
1: NAMESPACE, "section"
...

For VHDL

0: MODULE, "entity"
1: MODULE, "architecture"
....

When answering the document and workspace symbol requests it would use the indexes as a data compression mechanism.

I do agree however that when the server publishes the declaration of the symbols during the initialization there is no need for data compression and the current SymbolKinds could be referred to as strings.

dbaeumer commented 1 year ago

I would leave the current kinds and tags as is. There is IMO no need to refer to them during initialization or do refer to them as strings. All the client needs to do is letting the server know what the next free index is that the server can use to define its own.

kraigher commented 1 year ago

@dbaeumer The reason I write about referring to the SymbolKinds when instantiating server defined kinds is I see the need for a global set of concepts that can map to icons in the client. Otherwise all of the server defined kinds would map to the same icon or have no-icon.

Maybe the existing set of SymbolKinds is not the most minimal set of such global meta icons. An indication of this is that several SymbolKinds map to identical icons today in VSCode . So maybe the best long term option would be to start anew with a smaller list of "concept icons" for this feature. There needs to be an icon for something that is a pluggable interface, for something that is callable, for something that contains array data, for something that is a hierarchical node, etc...

The client would show the symbol with <icon> <keyword> <name> and maybe clients would start to offer document symbol filtering also on the keyword part.

astoff commented 1 year ago

All the client needs to do is letting the server know what the next free index is that the server can use to define its own.

In this case it would suffice to declare in the spec that all indices above 99 (or 999, or whatever) are free for server-specific kinds. In fact this is already possible today; the only thing missing is a way for the server to communicate to the client the meaning of those additional items.

maybe clients would start to offer document symbol filtering also on the keyword part.

This is a cool feature where server-specific symbol kinds would be important.

Maybe the mapping of kinds -> icons should be a global mapping defined by the editor. It would fall back to a default if the kind is unknown to the editor. Actually, this would be an incentive for servers to try as much as possible to stick to a well-established collection of kinds, and deviate only when really necessary (as is your case and mine).

kraigher commented 1 year ago

@astoff I do not really know what the server is supposed to do with the client declared kinds. It seems backwards to me. The kinds represent programming language constructs and it is the server which knows about those. How should an arbitrary server handle that a client declares a custom kind? It seems the only use cases for that is for a client/server pair that has been tailored for each other and then the custom kind might as well be defined by the server.

astoff commented 1 year ago

I do not really know what the server is supposed to do with the client declared kinds. It seems backwards to me.

I agree it is. I'm just saying that the server doesn't necessarily should have to specify which icon to assign to each kind (this is a UI/client-specific thing, after all). The client should know a list of commonly used kinds. There will be more than the dozen options available now, but surely not thousands of them.

But I have nothing against the idea of introducing a hierarchy of kinds. It seems a nice idea.

dbaeumer commented 1 year ago

Given the fact that most of the LSP client (e.g. the editors) do use a different UI language will make it very hard for servers to provide icons since they will very likely be tailored for one client and then look off in others. So even if we let servers define symbol kind the icon need to come from somewhere else where it is tailored to the client. In addition VS Code has icon themes. If a server would provide them they would need to honor the theme as well.

kraigher commented 1 year ago

@dbaeumer I am not suggesting that the server picks the icon design. I am suggesting the protocol should offer a fixed set of universal kinds that will map to icons by the client. For example if the server specifies a PERSON kind one client might render a cartoonish human but another client just the outline of a person.

Ideally the kinds should represent universal concepts such as MODULE or INTERFACE.

The key is that the server can choose to use the same icon kind for several named symbols that are disambiguated by the name to allow it to create symbols that map 1:1 with all types of symbols of the language at hand.

kraigher commented 1 year ago

Just taking the existing SymbolKinds and allowing the server to instantiate each one of them 0-N times with a name keyword would be a good approximation of my ideal solution. Clients would use whatever icon they use for the existing SymbolKinds today but add the keyword between the icon and the symbol name when rendering document symbols.

astoff commented 1 year ago

I think the icons example is confusing because it's too specific. If I understand correctly, @kraigher's suggestion boils down to grouping symbol kinds into categories so that the client can handle server-specific symbol kinds in a richer way.

However, instead of grouping the kinds into categories or "icons", I think it makes more sense to specify a hierarchy. So the server could advertise new symbol kinds by providing a name, number and a parent.

For instance, in LaTeX, I could declare the kind chapter with index 102 and parent namespace. It's up to the client to do whatever it wants with that info, including but not limited to choosing an icon for display.

dbaeumer commented 1 year ago

@kraigher Actually I misunderstood the proposal since most of the time if users ask for a new SymbolKind is to get a specific icon they think the UI should render.

In your proposal you would still use the icon set that is currently available without extending it (although I think language owners might then push for a different icon as well).

For which client would we prototype / add such a support? I looked at VS Code and there is currently no way to decouple kinds from icons and meaning (e.g. VS Code can for example file on kinds) so at least at the beginning the LSP client library for VS Code would disable that feature.

pidgeon777 commented 1 year ago

Neovim has great LSP support up to the latest standard specifications I think, maybe it could be selected as a client for testing?

https://neovim.io/doc/user/lsp.html

https://github.com/neovim/neovim/blob/master/runtime/lua/vim/lsp.lua

https://github.com/neovim/neovim/tree/master/runtime/lua/vim/lsp

https://github.com/neovim/neovim/blob/master/runtime/lua/vim/lsp/protocol.lua

kraigher commented 1 year ago

@dbaeumer I could write a formal proposal or RFC first? Should I do that in a new issue?

dbaeumer commented 1 year ago

I usually do it the other way around. Start with a prototype and then see how it evolves. Based on that we then try to formalize it in the specification.

But I want to reiterate one point: the specification is since version 3.x made in a way that clients / servers are free to decide whether they want to implement something or not (e.g. we basically have versioning per feature through capabilities). There is no way LSP can force an implementation.

kraigher commented 1 year ago

@pidgeon777 were you willing to work on the client side of the prototype?

pidgeon777 commented 1 year ago

I can help to test an LSP server with the Neovim client and its current LSP protocol implementation (which I think is aligned to the latest LSP standard version), but not with the LUA development of the Neovim client itself, unfortunately.

yyny commented 4 months ago

Unlike Tokens, Symbols shouldn't be sent that often, so the overhead of sending a string instead of an integer should be minimal.

I think the simplest solution would be for the client to give a string[] of client-supported kinds, and for the server to return a mapping from server-kinds to the client-kinds. The server would then always return server-kinds. Reason being that using the server-kinds makes updating both clients and servers easier.

If we insists on mapping symbol kinds to integers, then they should be chosen by the server, since the server knows which kinds will be most common (there are only 10 one digit numbers, after all).