Doekeb opened this pull request 3 months ago
Do you have any performance data? For example, how long does it take to generate tokens in a 2000-line document? It feels like it would be very slow to trigger "goto" for each "name" like that.
Ideally such a feature would be implemented by jedi and would use some form of caching to speed things up. The LSP semantic tokens protocol is designed in a way that should make adding/removing text pretty fast, but in your implementation it seems like the whole work will be done from scratch on every single change.
> Do you have any performance data? For example, how long does it take to generate tokens in a 2000-line document? It feels like it would be very slow to trigger "goto" for each "name" like that.
I don't have performance data on a behemoth like that, but I'm happy to gather some, especially if you can point me in the direction of a big project I can try it on. Additionally, if performance ends up being an issue for huge files, it would be fairly simple to implement the range protocol, which exists exactly for this purpose. From the LSP spec:
> There are two use cases where it can be beneficial to only compute semantic tokens for a visible range:
> - for faster rendering of the tokens in the user interface when a user opens a file. In this use case servers should also implement the textDocument/semanticTokens/full request as well to allow for flicker-free scrolling and semantic coloring of a minimap.
> - if computing semantic tokens for a full document is too expensive servers can only provide a range call. In this case the client might not render a minimap correctly or might even decide to not show any semantic tokens at all.
Determining when to request full semantic tokens vs. a range would then be the client's responsibility.
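As a rough illustration of why the range protocol could be simple to bolt on once full-document tokens exist, a server can filter its absolute token positions down to the requested lines before encoding. This is a generic sketch, not code from this PR; the `(line, start, length, type, modifiers)` tuple shape is an assumption, not pylsp's actual internal representation:

```python
# Sketch: answering textDocument/semanticTokens/range by filtering
# already-computed absolute token positions down to the requested lines.
# The 5-tuple shape (line, start, length, token_type, modifiers) is an
# assumption for illustration, not pylsp's real internals.

def tokens_in_range(absolute_tokens, start_line, end_line):
    """Keep only tokens whose line falls inside [start_line, end_line)."""
    return [t for t in absolute_tokens if start_line <= t[0] < end_line]

full = [
    (0, 0, 6, 1, 0),     # e.g. a function name on line 0
    (25, 4, 3, 2, 0),    # a parameter on line 25
    (1900, 0, 5, 0, 0),  # a class reference near the end of a large file
]

# A client viewing roughly the first screen would only request lines 0-40:
print(tokens_in_range(full, 0, 40))  # -> [(0, 0, 6, 1, 0), (25, 4, 3, 2, 0)]
```

Of course, this only pays off if the expensive jedi analysis itself can be limited to the range; filtering after a full pass mostly saves encoding and transmission time.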
> Ideally such a feature would be implemented by jedi and would use some form of caching to speed things up. The LSP semantic tokens protocol is designed in a way that should make adding/removing text pretty fast, but in your implementation it seems like the whole work will be done from scratch on every single change.
I agree that an upstream implementation is possible and preferable, and it would be great to contribute a portion of this to Jedi down the road. But hopefully this can work for the people who want it in the meantime.
If performance is a major concern (I agree that it would be good to gather more information on this front), we could begin by making this plugin opt-in like many of the other bundled plugins are.
> I don't have performance data on a behemoth like that, but I'm happy to gather some, especially if you can point me in the direction of a big project I can try it on.
Not as big, but maybe https://github.com/davidhalter/jedi/blob/master/jedi/plugins/stdlib.py
> Additionally, if performance ends up being an issue for huge files, it would be fairly simple to implement the range protocol, which exists exactly for this purpose. From the LSP spec:
Would it really be that easy? That really depends on whether the API you are using for this makes it possible.
LSP supports Semantic Tokens, which editors and colorschemes can opt into in order to provide "smarter" language highlighting than pure tree-based highlighting:
https://code.visualstudio.com/api/language-extensions/semantic-highlight-guide
https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#textDocument_semanticTokens
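For context, the spec defines the response as a flat list of integers, five per token, with positions delta-encoded against the previous token. A minimal sketch of that encoding (the token types and example positions below are invented for illustration; a real server advertises its legend in its capabilities):

```python
# Sketch of the LSP semanticTokens wire encoding (per the 3.17 spec):
# each token is 5 ints: deltaLine, deltaStartChar, length, tokenType,
# tokenModifiers, with positions relative to the previous token.

# Hypothetical legend, for illustration only.
TOKEN_TYPES = ["class", "function", "parameter", "property"]

def encode_tokens(tokens):
    """tokens: list of (line, start_char, length, token_type_index,
    modifier_bitmask), sorted by position, all 0-based."""
    data = []
    prev_line = prev_start = 0
    for line, start, length, ttype, mods in tokens:
        delta_line = line - prev_line
        # start is relative to the previous token only on the same line
        delta_start = start - prev_start if delta_line == 0 else start
        data.extend([delta_line, delta_start, length, ttype, mods])
        prev_line, prev_start = line, start
    return data

# Two tokens on line 0, one on line 2 (invented positions):
print(encode_tokens([(0, 4, 3, 1, 0), (0, 10, 5, 2, 0), (2, 0, 7, 0, 0)]))
# -> [0, 4, 3, 1, 0, 0, 6, 5, 2, 0, 2, 0, 7, 0, 0]
```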
Notably, Neovim now supports semantic tokens (https://github.com/neovim/neovim/pull/21100) and, more recently, semantic token modifiers (https://github.com/neovim/neovim/pull/22022).
This feature has been requested:
- in this repo: https://github.com/python-lsp/python-lsp-server/issues/33
- in the unmaintained base repo: https://github.com/palantir/python-language-server/issues/933
- in another jedi-based language server: https://github.com/pappasam/jedi-language-server/issues/137

Its implementation has been attempted and abandoned twice in the latter: https://github.com/pappasam/jedi-language-server/pull/196 and https://github.com/pappasam/jedi-language-server/pull/231
There is a maintained fork of an alternative tool for Neovim at https://github.com/wookayin/semshi, but it suffers from two major drawbacks: it is only available for Neovim, and its highlight colors are hardcoded, so they are unlikely to match the user's colorscheme.
This PR implements only the full document protocol. Performance could be further improved by also implementing the full document delta protocol and the range protocol.
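The delta protocol works by the client sending back the id of the previous result and the server returning edits against the previous integer array, so small text changes need not retransmit everything. A common minimal way to compute such an edit (a generic sketch, not code from this PR) is to diff the two arrays by their common prefix and suffix:

```python
# Sketch: computing a single SemanticTokensEdit for the LSP delta protocol
# by trimming the common prefix and suffix of the old and new integer arrays.

def token_delta(old, new):
    """Return one edit dict turning `old` into `new`; both are flat
    integer arrays as returned by a full semantic tokens request."""
    # longest common prefix
    start = 0
    while start < len(old) and start < len(new) and old[start] == new[start]:
        start += 1
    # longest common suffix of the remainders (must not overlap the prefix)
    end_old, end_new = len(old), len(new)
    while end_old > start and end_new > start and old[end_old - 1] == new[end_new - 1]:
        end_old -= 1
        end_new -= 1
    return {"start": start, "deleteCount": end_old - start, "data": new[start:end_new]}

old = [0, 4, 3, 1, 0, 2, 0, 7, 0, 0]
new = [0, 4, 3, 1, 0, 1, 2, 5, 3, 0, 2, 0, 7, 0, 0]  # one token inserted mid-file
print(token_delta(old, new))
# -> {'start': 5, 'deleteCount': 0, 'data': [1, 2, 5, 3, 0]}
```

Whether this helps here depends on the caching concern raised above: the delta only saves transmission unless the server can also avoid recomputing the tokens themselves.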
Here are some examples in two different colorschemes, with only very simple rules implemented so far. Tree-based highlighting is always on the left; tree-based highlighting augmented with semantic tokens is always on the right.
Functions and classes

Tree-based highlighting does not highlight `dingus_mc_bingus` as a class even though it is; `my_function` and `MyFunction` are both functions.

Imports

Tree-based highlighting can't determine what kind of thing imported names are, other than by their naming conventions (which are often broken, even in Python standard library modules).
Parameters

Tree-based highlighting highlights `self` as a special token. This is not language smarts, as evidenced by the lack of highlighting of the language-equivalent `this`. Semantic tokens currently color both `self` and `this` inside a method as a regular parameter, but this could be improved using semantic token modifiers and a bit more inference (the colorschemes I'm using here don't apply any different styles to modifiers). Note that even in the semantic-token-augmented version, tree-based highlighting takes over on `self` when it is outside a method.

Properties

Tree-based highlighting guesses whether an attribute is a property or a method based on the presence of parentheses. Semantic token highlighting knows the difference.