microsoft / language-server-protocol

Defines a common protocol for language servers.
https://microsoft.github.io/language-server-protocol/
Creative Commons Attribution 4.0 International

Standardize how to navigate to dependencies that are not readily available on the client filesystem #1595

Closed dannyfreeman closed 1 year ago

dannyfreeman commented 1 year ago

I have been working on one of the Emacs LSP clients and have run into an issue with various language servers for JVM languages. Most of them attempt to implement responses for the textDocument/definition method and others like it, where the definition is not a plain source file that is readily available on the file system. Instead, the definition is either in a source code file in a .jar archive managed by a dependency tool like Maven, or the "source" is a bytecode file contained in a .jar that must be extracted and decompiled for the user to make sense of it. (Java bytecode files are also referred to as .class files.)

The three JVM language servers I've investigated are Metals (Scala), clojure-lsp (Clojure), and jdt.ls (Java).

Each of them takes a different approach to handling these types of dependencies, some of which require non-standard LSP extensions to use.

I will outline them here:

Metals

The Scala language server takes an approach that requires NO special client implementation. When a user attempts to navigate to a definition in a Scala source file contained in a jar, the language server extracts the source file into a temp directory contained within the project. This temp source file is then provided to the client via the Location response as a simple file:// URL. Something similar is done with definitions residing in .class files in jar archives: the class file is extracted and decompiled into a temporary directory under the project, and the Location response contains a file:// URL pointing to the temp file.
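As a rough illustration of the extract-then-point-at-a-file idea described above, here is a minimal Python sketch. It is not Metals' actual implementation: the function name `extract_dependency_source`, the `lsp-deps-` temp directory prefix, and the zero-width range are all assumptions made for the example.

```python
import tempfile
import zipfile
from pathlib import Path
from typing import Optional


def extract_dependency_source(jar_path: str, entry: str,
                              cache_dir: Optional[str] = None) -> dict:
    """Copy one jar entry to disk and return an LSP Location for it."""
    # Assumption: a throwaway temp directory stands in for the
    # project-local directory a real server would use.
    root = Path(cache_dir or tempfile.mkdtemp(prefix="lsp-deps-")).resolve()
    target = (root / entry).resolve()
    # Refuse entries such as "../x" that would escape the cache directory.
    if root not in target.parents:
        raise ValueError(f"unsafe jar entry: {entry}")
    target.parent.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(jar_path) as jar:
        target.write_bytes(jar.read(entry))
    # Point at the top of the file; a real server would send the symbol's range.
    top = {"line": 0, "character": 0}
    return {"uri": target.as_uri(), "range": {"start": top, "end": top}}
```

Because the result is an ordinary `file:` URI, a client needs no special handling to open it, which is the main appeal of this approach.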

Clojure-lsp

Clojure-lsp has two different behaviors.

I will describe them below, but more information can be found in this issue of the clojure-lsp repo: https://github.com/clojure-lsp/clojure-lsp/issues/1385

Clojure .clj source files

If the dependency is a Clojure source file contained in a .jar archive, then clojure-lsp will return a Location response that contains either a jar URL (spec), or a zipfile URL (no official specification, but looks like zipfile:path/to/archive.jar::path/in/jar/to/source.clj). The type of URL is controlled by a setting in the LSP server.

When this happens, a client can do one of two things. It can open the URL itself if it has that capability. This involves extracting the file from the jar archive (which is really just a zipfile with a different extension). Alternatively, the client can send a non-standard request to the clojure-lsp server: clojure/dependencyContents. The method responds with the contents of the file, which the client can then display however it likes (perhaps in a temporary buffer, or by saving to a file).
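To make the "client opens the URL itself" option concrete, here is a small Python sketch that splits both URI styles into an archive path and an entry path, then reads the entry. The helper names are hypothetical, and the handling of an optional `//` after `zipfile:` is an assumption, since that scheme has no official specification.

```python
import zipfile
from typing import Tuple
from urllib.parse import unquote, urlparse


def split_archive_uri(uri: str) -> Tuple[str, str]:
    """Return (archive path, entry path) for a jar: or zipfile: URI."""
    if uri.startswith("jar:"):
        # jar:<url-of-archive>!/<entry>
        archive_url, sep, entry = uri[len("jar:"):].partition("!/")
        archive = unquote(urlparse(archive_url).path)
    elif uri.startswith("zipfile:"):
        # zipfile:<path-to-archive>::<entry>
        archive, sep, entry = uri[len("zipfile:"):].partition("::")
        # Assumption: tolerate an authority-style "//" prefix if present.
        if archive.startswith("//"):
            archive = archive[2:]
    else:
        raise ValueError(f"not an archive URI: {uri}")
    if not sep or not entry:
        raise ValueError(f"missing entry path: {uri}")
    return archive, entry


def read_archive_uri(uri: str) -> str:
    """Read the text of the source file that an archive URI points at."""
    archive, entry = split_archive_uri(uri)
    with zipfile.ZipFile(archive) as jar:
        return jar.read(entry).decode("utf-8")
```

For example, `split_archive_uri("zipfile:path/to/archive.jar::path/in/jar/to/source.clj")` yields the archive path and the entry path as two separate strings.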

Either way, this strategy requires some special knowledge on the client side and is unique to clojure-lsp.

This strategy does not extend to the other JVM language servers, even though it could if they used the standard jar URI format and all clients chose to open them themselves instead of using nonstandard methods like clojure/dependencyContents.

Compiled java .class bytecode files

Clojure-lsp deals with these files the same way as the Metals LSP server. When serving responses to textDocument/definition, it extracts and decompiles .class files into a temp file under the current project. This temp file is sent back as a file:// URL in the Location response.

jdt.ls

The Java language server doesn't typically deal with plain source files in jars like the Scala and Clojure language servers do. Instead, it takes one strategy for these types of dependencies:

When a definition exists in a .class bytecode file within a jar archive, jdt.ls either returns an empty response, or it returns a URL of a bespoke format if a certain setting is enabled in the lsp server.

The bespoke URLs look like this:

jdt://contents/java.base/java.lang/String.class?=json-example_96bcdf0c/\\/usr\\/lib\\/jvm\\/java-17-openjdk\\/lib\\/jrt-fs.jar`java.base=/javadoc_location=/https:\\/\\/docs.oracle.com\\/en\\/java\\/javase\\/17\\/docs\\/api\\/=/<java.lang(String.class

They are not standard like the jar: scheme URLs that clojure-lsp uses, and are not really meant to be parsed by clients. This one also seems partially parsed; I've seen others in the wild that have a LOT more URL escaping. Instead, the URL acts more like a token that must be passed back to the server using the non-standard java/classFileContents method for jdtls. The method responds with the contents of the buffer, very similar to clojure/dependencyContents. Forcing the client to defer to the server to get the source is intentional: there are MANY ways of decompiling a .class file, so the server must be the source of truth if it is going to provide location information in the resulting decompiled file. If clients tried to decompile it on their own, the location would likely not match up.
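A minimal Python sketch of the "pass the token back" step might look like the following. The `Content-Length` framing is the standard LSP base protocol; the parameter shape (an object with a single `uri` key) is an assumption for illustration and should be checked against jdt.ls before relying on it. Both function names are hypothetical.

```python
import json
from typing import Any


def frame_request(request_id: int, method: str, params: Any) -> bytes:
    """Encode a JSON-RPC request with the LSP Content-Length header."""
    body = json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": method,
        "params": params,
    }).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def class_file_contents_request(request_id: int, opaque_uri: str) -> bytes:
    """Ask the server for the decompiled text behind a jdt:// URI."""
    # The URI is sent back untouched: the client never parses or rewrites it.
    # Assumption: the params object carries just the URI.
    return frame_request(request_id, "java/classFileContents",
                         {"uri": opaque_uri})
```

The point of the sketch is that the client treats the `jdt://` string as an opaque value and leaves decompilation entirely to the server.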

More information about this can be found in this issue of jdt.ls: https://github.com/eclipse/eclipse.jdt.ls/issues/2322.

Looking for a final solution

I'm writing this issue to the spec because, as someone who contributes to an LSP client, I would like to see one standardized way of handling these types of Location responses, which I would summarize as:

The definition of some thing is located somewhere that is not readily available on the client's filesystem, and some extra action must be taken to make it available to the client.

What is the best solution for this? I do not know. The way I see things, there are a few strategies that could be taken:

  1. Upon receiving a textDocument/definition (and friends) request, the LSP server automatically "gets" the source file, extracts it to a temp file, and responds with a file: URL.
    • "gets" could mean, extracting from an archive, decompiling, maybe fetching from the internet. I don't think the spec needs to change for this to happen, but getting some kind of official "blessing" to point server maintainers to would be very helpful.
  2. A new response type is added to the textDocument/definition response. Instead of a Location response with a URI key, some new response would tell the LSP client what server method to call, which would respond with either:
    1. the "file" contents of the dependency, such that the client may decide how to present them to the user (maybe a temp buffer, or saving to disk; it's up to the client)
    2. a true Location response pointing to a temp file
      • This might allow the clients to have the user confirm whether or not they want to do the extra work to get this dependency.
      • The new response may include a prompt to show the user. "Do you want to decompile dependency X?" or "Do you want to download dependency X?"
  3. A new response type is added to the textDocument/definition response to ALWAYS return the contents of the dependency, with no extra round trip like suggestion number 2.
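To show what option 2 could look like from the client side, here is a hedged Python sketch of the variant where the follow-up call returns a true Location. Nothing here exists in the spec: the `method`, `token`, and `prompt` keys, the function name, and the callback signatures are all hypothetical.

```python
from typing import Callable, Optional


def resolve_definition(result: dict,
                       send_request: Callable[[str, dict], dict],
                       confirm: Callable[[str], bool]) -> Optional[dict]:
    """Turn a definition result into a Location the client can open."""
    # An ordinary Location already carries a URI: nothing more to do.
    if "uri" in result:
        return result
    # Hypothetical deferred response: optionally ask the user first.
    prompt = result.get("prompt")
    if prompt and not confirm(prompt):
        return None
    # Extra round trip: the server does the work and returns a real Location.
    return send_request(result["method"], {"token": result["token"]})
```

A client would pass its own transport as `send_request` and its own UI prompt as `confirm`, so the user can decline the extra extraction or download work.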

I'm sure there are other solutions for this, and things I am not considering (like how it would work with language servers running on a remote machine, something I have little experience with). Putting something in the standard would eliminate a class of custom behavior I have observed among various JVM language servers and make the out-of-the-box experience better, without having to set up custom client code for every single language server that needs something like this.

I also have no doubt it would help other language servers as well outside the JVM ecosystem. I can imagine that language servers supporting CLR languages might also be able to take advantage of this when dealing with libraries distributed as DLLs (at least they were distributed as DLLs when I worked with C# years ago).

puremourning commented 1 year ago

FWIW DAP handles this with a standard approach: a "Source" type and /source request. Clients are told that a source is not a "file on the filesystem" but a "source reference". Client does a "/source" request with the source reference and server returns the contents.

https://microsoft.github.io/debug-adapter-protocol/specification#Requests_Source https://microsoft.github.io/debug-adapter-protocol/specification#Types_Source
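For comparison, the DAP flow described above can be sketched in a few lines of Python. The message fields (`seq`, `type`, `command`, `arguments`, `sourceReference`) follow the DAP specification linked above; the helper names are made up for this example.

```python
def needs_source_request(source: dict) -> bool:
    """True when the source is a reference rather than a file on disk."""
    # In DAP, a sourceReference greater than zero means the content must be
    # fetched from the debug adapter with a "source" request.
    return source.get("sourceReference", 0) > 0


def build_source_request(seq: int, source: dict) -> dict:
    """Build the DAP "source" request for a referenced source."""
    return {
        "seq": seq,
        "type": "request",
        "command": "source",
        "arguments": {
            "source": source,
            "sourceReference": source["sourceReference"],
        },
    }
```

The adapter's response body then carries the text in a `content` field, which the client can show in a read-only buffer.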

dannyfreeman commented 1 year ago

Yeah, this seems a lot like my proposal number 2.i.

Something to consider with that is some users would want things like textDocument/definition to continue working in the newly opened source file. If the server sends back some kind of unique identifier for the source, I imagine it would be able to keep track of it and continue to offer those capabilities. I know clojure-lsp does this when files are opened as a jar uri, and that jar uri continues to be provided in TextDocumentIdentifier requests.

dannyfreeman commented 1 year ago

Digging through the open issues, this might be a duplicate of https://github.com/microsoft/language-server-protocol/issues/336

dbaeumer commented 1 year ago

@dannyfreeman you are correct, this is a dup of #336. The idea is that servers can register a content provider for a URL scheme, and the client then has to call the method to fetch the content. It is comparable to what DAP does.
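A client-side view of that idea could be sketched as follows in Python. This is only an illustration of scheme-based dispatch: the class name, the `register` and `fetch` methods, and the `{"uri": ...}` parameter shape are assumptions, not anything defined by the spec or by #336.

```python
from typing import Callable, Dict, Optional
from urllib.parse import urlparse


class ContentProviderRegistry:
    """Maps URI schemes to the server method that returns their content."""

    def __init__(self) -> None:
        self._method_by_scheme: Dict[str, str] = {}

    def register(self, scheme: str, method: str) -> None:
        # Called when a server announces it can provide content for a scheme.
        self._method_by_scheme[scheme] = method

    def fetch(self, uri: str,
              send_request: Callable[[str, dict], str]) -> Optional[str]:
        method = self._method_by_scheme.get(urlparse(uri).scheme)
        if method is None:
            # No provider registered (for example file: URIs), so the client
            # should open the URI by its usual means.
            return None
        return send_request(method, {"uri": uri})
```

With a registry like this, the same client code could serve `jar:`, `jdt:`, or any other scheme without per-server special cases.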