microsoft / language-server-protocol

Defines a common protocol for language servers.
https://microsoft.github.io/language-server-protocol/
Creative Commons Attribution 4.0 International

Support that servers can provide file system implementations and read from remote file systems as well #1264

Open dbaeumer opened 3 years ago

dbaeumer commented 3 years ago

Currently servers are restricted to reading files from their local file systems. LSP should offer ways that servers can:

  - implement file systems
  - read from and write to other remote file systems.

dbaeumer commented 3 years ago

@NTaylorMullen

rwols commented 3 years ago

implement file systems

I don't understand this bullet point, can you clarify? Do you mean that a client can query the remote server's FS? Why would the client be interested in that?

read from and write to other remote file systems.

This makes sense... A client can provide a kind of "virtual FS" to the remote-running language server and the server can then query the VFS.

rwols commented 3 years ago

implement file systems

Ah, if the entire project is also remote, then it makes sense as a client would have to discover the directory layout somehow.

NTaylorMullen commented 3 years ago

@rwols if you're curious: https://github.com/NTaylorMullen/LSPVirtualDocuments/blob/master/Documents/FileSystemSpec.md

aslakhellesoy commented 2 years ago

Is anyone working actively on this? It would be amazing to have this feature in order to make my vscode extension work as a web extension (it relies on a language server that needs to read files).

XeroOl commented 2 years ago

I would love to see this. One possible use case is if your language server has a decompiler built in. The language server would be able to reference things in the decompiled version of the file to the editor without needing to place decompiled sources into the filesystem.

nelak2 commented 1 year ago

Just to make sure I understand this issue: Virtual file systems exist in VSCode's memory. Language servers run as a separate process so they can't read that memory and therefore can't access the virtual file system. The lsp protocol handles transferring the text of the current file being worked between the two so the language server will be able to process that just fine. It's the references to other files in the form of includes or other project references that will break because it can't resolve a virtual file path.

My question then is: does VS Code send these URIs across to the language server at all, or do they get filtered out? I'm working on an internal-use-only language server. If VS Code sends the full URI across, is there anything stopping me from adding logic to handle the virtual file system in my language server as well (beyond the duplication of work, of course, and the logic needed to keep the language server and client in sync)?

r3m0t commented 1 year ago

Yes that's right.

In this example, scheme: 'file' filters out virtual files, if you remove it then the LSP will receive the virtual files as well.
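
A minimal sketch of why removing `scheme: 'file'` lets virtual files through. The `DocumentFilter` shape mirrors the `language`/`scheme` fields used in a `documentSelector`, but `schemeOf`, `matches`, and the sample URIs are illustrative assumptions, and the matching logic is simplified rather than the client library's actual implementation.

```typescript
// Local stand-in for one entry of a client's documentSelector.
interface DocumentFilter {
  language?: string;
  scheme?: string;
}

// Return the scheme of a URI string (the text before the first ':').
function schemeOf(uri: string): string {
  const idx = uri.indexOf(":");
  return idx === -1 ? "" : uri.slice(0, idx);
}

// Simplified matching: every field present on the filter must agree.
function matches(filter: DocumentFilter, uri: string, languageId: string): boolean {
  if (filter.language !== undefined && filter.language !== languageId) return false;
  if (filter.scheme !== undefined && filter.scheme !== schemeOf(uri)) return false;
  return true;
}

// With a scheme, only on-disk documents are forwarded to the server.
const fileOnly: DocumentFilter = { scheme: "file", language: "plaintext" };
// Without a scheme, documents from virtual file systems match as well.
const anyScheme: DocumentFilter = { language: "plaintext" };
```

Note that a matching selector only gets the document's text synced to the server; the server still cannot read other files behind a non-`file` URI on its own.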

https://code.visualstudio.com/api/language-extensions/language-server-extension-guide

dselman commented 1 year ago

Not sure if this is the right place to comment so please redirect if necessary.

My use case is that I'm trying to port an existing Node.js Language Server to a Web Extension. I've got the basics working thanks to the useful sample; however, the challenge is now how to implement global (cross-file) consistency checks for my language (think import checking etc).

I can no longer use fs in the Language Server, as I'd like the functionality to work in a web extension and the client documentSelector only sends LSP events for documents that are opened in the editor. I tried using vscode.workspace.findFiles on the client side and sending the files to the LSP server via a custom message, but that doesn't work with vscode-test-web because it doesn't support search for its mount scheme.

Is it possible to implement cross-file consistency checks within a web extension, or will it have to operate in a degraded "single file" mode?

hugocaillard commented 1 year ago

@dselman I had the same issue while porting an LSP server (written in Rust) to the web through WASM. I handled it by creating a few request handlers on the client (in TypeScript) that can be requested by the server to simulate the FS. See: client code request from server

Feel free to DM me on twitter for more details
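
A rough sketch of the pattern described above, where the server asks the client to read files on its behalf. Everything here is an assumption for illustration: the `custom/readFile` request name is hypothetical and not part of LSP, `FakeConnection` stands in for a real JSON-RPC connection, and a `Map` stands in for the editor's file system.

```typescript
// Hypothetical request name; not part of the LSP specification.
const READ_FILE = "custom/readFile";

interface ReadFileParams {
  uri: string;
}
type RequestHandler = (params: ReadFileParams) => Promise<string>;

// Minimal in-process stand-in for a JSON-RPC connection.
class FakeConnection {
  private handlers = new Map<string, RequestHandler>();

  // Client side: register a handler for a server-initiated request.
  onRequest(method: string, handler: RequestHandler): void {
    this.handlers.set(method, handler);
  }

  // Server side: send a request and await the client's answer.
  sendRequest(method: string, params: ReadFileParams): Promise<string> {
    const handler = this.handlers.get(method);
    if (handler === undefined) {
      return Promise.reject(new Error(`unhandled request: ${method}`));
    }
    return handler(params);
  }
}

// Client: answers reads from whatever file system the editor exposes.
function registerClientFs(conn: FakeConnection, files: Map<string, string>): void {
  conn.onRequest(READ_FILE, async ({ uri }) => {
    const text = files.get(uri);
    if (text === undefined) throw new Error(`not found: ${uri}`);
    return text;
  });
}

// Server: reads a file without touching a local disk.
function readFileViaClient(conn: FakeConnection, uri: string): Promise<string> {
  return conn.sendRequest(READ_FILE, { uri });
}
```

The cost of this design is that every file access becomes an asynchronous round trip, so server code written against a synchronous `fs` API has to be restructured.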

dbaeumer commented 1 year ago

@hugocaillard is the port of the LSP server to WASM available on GitHub? We are working on WASM support for VS Code and I would be interested in looking at what you did. Our implementation is here: https://github.com/microsoft/vscode-wasm and an extension that executes Python in the browser is here: https://github.com/microsoft/vscode-python-web-wasm

hugocaillard commented 1 year ago

@dbaeumer Yes, and we made a blog post about it: https://www.hiro.so/blog/write-clarity-smart-contracts-with-zero-installations-how-we-built-an-in-browser-language-server-using-wasm.

Everything is in this repo: https://github.com/hirosystems/clarinet
In ./components/clarity-vscode -> the TypeScript parts
In ./components/clarity-lsp -> the Rust part

The LSP server wasn't built for the web from the ground up and the project is still under active development, but it's running in production and used by many developers every day (marketplace)

dselman commented 1 year ago

@hugocaillard thanks to your code I now have something working. When the Language Server is initialised it requests that the client open all the .cto files, which triggers onDidChangeContent on the Language Server, allowing it to rebuild global state from all the .cto files in the workspace.

hugocaillard commented 1 year ago

@dselman Awesome, glad it helps! In the end you probably won't need to trigger false onDidChange, the server should be able to discover all the .cto files from the workspace URI or some other base location sent by the client. I don't want to spam this issue, but my Twitter DMs are open if you want to pursue this discussion (link in github profiles)
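
A small sketch of the discovery idea, assuming the client exposes some directory-listing request. `DirEntry`, `ReadDirectory`, and `findFiles` are hypothetical names, the listing is synchronous only to keep the example short, and the root URI is assumed to have no trailing slash.

```typescript
// Hypothetical entry returned by a client-side directory listing.
interface DirEntry {
  name: string;
  isDirectory: boolean;
}
type ReadDirectory = (uri: string) => DirEntry[];

// Collect all URIs under `root` whose file name ends with `extension`.
function findFiles(root: string, extension: string, readDirectory: ReadDirectory): string[] {
  const found: string[] = [];
  const pending: string[] = [root];
  while (pending.length > 0) {
    const dir = pending.pop() as string;
    for (const entry of readDirectory(dir)) {
      const child = `${dir}/${entry.name}`;
      if (entry.isDirectory) {
        pending.push(child); // descend into subdirectories later
      } else if (entry.name.endsWith(extension)) {
        found.push(child);
      }
    }
  }
  return found.sort(); // stable order regardless of traversal order
}
```

With something like this the server can build its global state from the workspace root at startup instead of relying on the client to open every file.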

dbaeumer commented 1 year ago

@hugocaillard thanks for the pointers. Looks actually really cool.

Do you know if your Rust code compiles to WASM-WASI? If so, you could get rid of all your custom file system provider calls. What I implemented is a WASI host that maps the whole WASI API to the VS Code API. So you can write normal Rust or C/C++ code with normal file system operations and it will transparently be mapped to the VS Code file system API.

What we want to achieve is that someone can take a normal Rust, ... program, compile it down to WASM-WASI and run it inside VS Code, where the file system available in the WASM execution is VS Code's workspace file system (and more, since the vscode-wasm implementation supports arbitrary mount points).

dbaeumer commented 1 year ago

See https://github.com/microsoft/vscode-wasm/blob/a703168627ea8937829add349055a16640962227/testbeds/rust/src/main.rs#L1 and https://github.com/microsoft/vscode-wasm/blob/a703168627ea8937829add349055a16640962227/testbeds/cpp/hello.cpp#L1

DanTup commented 1 year ago

@dbaeumer

What I implemented is a WASI host that maps the whole WASI API to the VS Code API

I'm curious about this. Does this mean VS Code running in the browser would be able to work with a file:// filesystem (and not just some custom scheme like virtual workspaces use)? For example, if a language server was compiled to WASM (with WASI - something I just learned about 30 seconds ago!) but only supported a "real" filesystem (e.g. only file:/// URIs over LSP, and expects to be able to read the files from disk through its native FS APIs that have been compiled to WASI calls), could this work?

If so, what is VS Code(-wasm) going to present as the file:// system when running in the browser? Is it going to map to a user's actual file system (e.g. something like https://web.dev/file-system-access/), or some virtual/in-browser file store that's not on the user's disk?

Edit: Re-reading https://matt-rickard.com/wasi-vs-wasm, I'm now wondering if you were talking about the browser at all, or whether this is intended for local VS Code's, just using WASM instead of JS?

hugocaillard commented 1 year ago

@dbaeumer Thanks!

I'm now wondering if you were talking about the browser at all, or whether this is intended for local VS Code's, just using WASM instead of JS? – DanTup

I have the same confusion.

Do you mean that the LSP server, running in the browser (vscode.dev or github.dev), could access the VS Code FS APIs without any change?

Although my current setup relies on wasm-pack, which compiles to wasm32-unknown-unknown.

dbaeumer commented 1 year ago

Yes, the idea is that a server can access the file system without any changes. What I do is the following:

Here is a demo with CPython compiled down to WASM-WASI (without any modifications to the CPython code) running in a browser, executing a Python file sitting in a GitHub repository, including importing another file.

[Screencast of the demo]

DanTup commented 1 year ago

@dbaeumer this sounds great!

I'm looking at the code at https://github.com/microsoft/vscode-wasm/tree/a703168627ea8937829add349055a16640962227/wasm-wasi - do I understand correctly that the npm package here contains the VS Code WASI bindings, and that package wraps/hosts the CPython wasm binary in a way that provides it with the implementations for those file APIs? (eg. the compilation of CPython is just standard WASM/WASI and doesn't need anything VS Code-specific at compile time)?

dbaeumer commented 1 year ago

Yes, but you need sync-api-client and sync-api-service as well, which implement the VS Code API in a sync way since WASI is sync :-).

You might want to look inside testbeds to see how it is put together.

DanTup commented 1 year ago

I did wonder how that would work, thanks for the pointers! I don't know how likely it is that the Dart server will ever compile to wasm, but it's good to know that if it does, it may not need as many changes as I'd thought to be able to handle some of these use cases. Thanks! 🙂

brettcannon commented 1 year ago

I'm now wondering if you were talking about the browser at all, or whether this is intended for local VS Code's, just using WASM instead of JS?

In case you didn't notice, Dirk's demo was via vscode.dev. WASI works anywhere WebAssembly works, so both browser and Node in our case. Think of WASI as POSIX for WebAssembly; it's just a spec, and WebAssembly runtimes implement that spec to let code do stuff like accessing files in a secure, portable way.

In other words, WASI works wherever VS Code works, desktop and web. 🙂

dkattan commented 1 year ago

What are the odds that we can get some of @NTaylorMullen's suggestions into 3.18? Specifically reading directories/files.

https://github.com/NTaylorMullen/LSPVirtualDocuments/blob/master/Documents/FileSystemSpec.md#readDirectory

dbaeumer commented 1 year ago

We would need someone who drives this in both the spec and an implementation.

d01010101 commented 1 year ago

Maybe I am wrong, but I am not convinced that things like this would scale well:

export interface ReadFileResponse {
    /**
     * The entire contents of the file `base64` encoded.
     */
    content: string;
}

That's ok for what language servers do now, but the mere client FS access suggests a more involved/complete source code analysis by the language server which in turn sounds a lot like what also a debugger needs (in-editor expression evaluation for example). Which may eventually make it interesting to somehow integrate LSP and debugging/runtime like the Debug Adapter Protocol. Which in turn might require a fully fledged parallel FS with locking, links, access rights and so on which is not necessarily trivial to implement. Instead, an IDE may configure a local FS server like SSH FS and the server may access it like any other local FS. Extending LSP to support file lock/update notifications might be useful in the case of operations like refactoring.
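
One measurable part of the cost of the `ReadFileResponse` shape quoted above is the size overhead of base64 itself: padded base64 emits 4 characters for every 3 input bytes. The helper names below are illustrative, and treating this as part of the scaling concern is an assumption; the comment does not say which cost it has in mind.

```typescript
// Length in characters of the padded base64 encoding of `byteLength` bytes.
function base64Length(byteLength: number): number {
  return 4 * Math.ceil(byteLength / 3);
}

// Extra characters on the wire compared with the raw byte count.
function base64Overhead(byteLength: number): number {
  return base64Length(byteLength) - byteLength;
}
```

For a 1,000,000-byte file, `base64Length` gives 1,333,336 characters, about a third more than the raw size, before any JSON framing, and the whole file must be held in memory as a single string.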

Then, I did not study the subject too much and maybe that shows.

dbaeumer commented 1 year ago

For simplicity reason we might want to think about starting with a read only access on the server. This might handle most use cases.

nelak2 commented 1 year ago

For simplicity reason we might want to think about starting with a read only access on the server. This might handle most use cases.

I think that makes the most sense. To me, a language server is intended to provide contextual data about a file to an editing tool, not be the editing tool itself.

d01010101 commented 1 year ago

For simplicity reason we might want to think about starting with a read only access on the server. This might handle most use cases.

How is a custom read-only FS tied to LSP simpler than, for example, LDAP (initialized or tunneled by LSP), given that LDAP already has a lot of libraries and tools virtually everywhere, from client JS to server Java? Not an opinion, just a question.

d01010101 commented 1 year ago

On a second thought, a simple file read access as proposed by dbaeumer might still serve a lot of functions without, depending on the approach,

d01010101 commented 1 year ago

To me, a language server is intended to provide contextual data about a file to an editing tool, not be the editing tool itself.

So perhaps you'd find this interesting https://news.ycombinator.com/item?id=16875685. With such an approach, even a source-wide refactoring could be done with contextual data only and without any write access.

brettcannon commented 1 year ago

So perhaps you'd find this interesting https://news.ycombinator.com/item?id=16875685. With such an approach, even a source-wide refactoring could be done with contextual data only and without any write access.

Do note that the AST approach has its own drawbacks, e.g., you need to make sure that AST representation can represent every potential structure needed for every language VS Code supports (which is a lot since it's any and all languages 😉). Plus not every e.g. refactoring will work the same for each language, so it doesn't necessarily save you from having to either re-implement or do a ton of special-casing for various languages (once again, needs to work with any language out there).

nelak2 commented 1 year ago

To me, a language server is intended to provide contextual data about a file to an editing tool, not be the editing tool itself.

So perhaps you'd find this interesting https://news.ycombinator.com/item?id=16875685. With such an approach, even a source-wide refactoring could be done with contextual data only and without any write access.

The idea of just exposing an AST feels great in theory but I'm not sure it would work in practice. After all isn't that largely how editors handled different languages before the LSP?

Not to say that it's inherently the wrong approach or that approach wouldn't have worked if there was a standard protocol around it but in general that approach was explored for decades without a standard developing or without editors being full of special handling for different languages.

My conclusion after reading that discussion is that the world might need a language server framework to help language server developers build their AST which can be used by framework provided default implementations of common LSP features.

d01010101 commented 1 year ago

Yes, the "ton of special casing" can be a problem which may happen when a protocol tries to actually define in detail the AST layer, in order to put above it yet another layer of generic language-independent operations, which the link seems to suggest.

In order to avoid the problem in question, AST could be an otherwise undefined layer above the raw edited files. LSP would only provide a grammar-independent low-level API for a bidirectional translating service raw <-> AST and possibly its own example or default implementation of a "typical" LL(k) translating service. Such services could reside by default on the IDE side but the server could use any independent service on its side.

Examples of what could be done when a grammar with error recovery and whitespace handling is transferred to the said LL(k) service:

  1. A faulty statement is replaced by a "warning node" and the parser skips after the next configurable token, like ";". Then the service automatically underscores the faulty statement via the raw layer.
  2. Refactoring an identifier is as simple as asking the service to enumerate the referred node and its all references and then rename them. All relevant source updates automatically.
  3. Moving a class automatically moves a file in the directory structure because a file path is one of the said references, all in one grammar tree. If it's Java and you also need to update the imports, the server asks the service to enumerate all relevant import sections and modifies them respectively. Again, it is the service itself which updates the source. So each language indeed defines its own refactoring as it is now, but with the AST layer, it is much simpler.
  4. All whitespace retaining or generating is the task of the said service.
  5. Adding some basic model transform rules to the grammar would enable changing each fragment of an identifier (but not of a quoted string) of the form \alpha into α, without even engaging the server.

See that I do not have much experience with language protocols, so I can miss some detail here.

DanTup commented 11 months ago

Is anybody currently working on this? In particular, I'm interested in the server being able to provide file contents for some virtual files (and being able to include URIs for the scheme it uses in other requests - for example being able to Go-to-Definition from a real file:/// document on the client side to a foo://bar provided by the server).

I think @dbaeumer's comment above:

For simplicity reason we might want to think about starting with a read only access on the server. This might handle most use cases.

... probably covers what I need. If I understand correctly, I think VS Code also already has implementation for this (registerFileSystemProvider).

I may be interested in helping (with this narrower-scoped version), but I don't want to duplicate effort if anything is already in progress.

puremourning commented 11 months ago

Is anybody currently working on this? In particular, I'm interested in the server being able to provide file contents for some virtual files (and being able to include URIs for the scheme it uses in other requests - for example being able to Go-to-Definition from a real file:/// document on the client side to a foo://bar provided by the server).

I think @dbaeumer's comment above:

For simplicity reason we might want to think about starting with a read only access on the server. This might handle most use cases.

... probably covers what I need. If I understand correctly, I think VS Code also already has implementation for this (registerFileSystemProvider).

I may be interested in helping (with this narrower-scoped version), but I don't want to duplicate effort if anything is already in progress.

For prior art, jdt.ls implements this in a custom thing. In fact they used to send their jdt:// URIs to all clients and had a side channel to retrieve the contents. I'd welcome standardisation of a mechanism for this. Pinging @snjeza and @fbricon

mickaelistria commented 11 months ago

What JDT-LS implements is more or less what has already been discussed at length in https://github.com/microsoft/language-server-protocol/issues/336: a custom operation that attempts to resolve a URI (of whichever scheme the client cannot process directly; in JDT-LS it's jdt:) to the actual document content to display when opening the document. In that case, the client needs to replace the usual read from the filesystem with a custom query to the language server. The topic of this particular issue is slightly different (and so far there is no obvious need for it in most LS I've used); https://github.com/microsoft/language-server-protocol/issues/336 is really where the "get document content" discussion (and maybe a PR) should continue.
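
A minimal sketch of the client-side dispatch described here: `file:` URIs are read locally, while other schemes are routed to whoever registered for them (for example, a query to the language server). `DocumentOpener`, `registerScheme`, and the provider functions are hypothetical names and do not reflect JDT-LS's actual request or any standardized LSP method.

```typescript
type ContentProvider = (uri: string) => string;

// Return the scheme of a URI string (the text before the first ':').
function schemeOf(uri: string): string {
  const idx = uri.indexOf(":");
  return idx === -1 ? "" : uri.slice(0, idx);
}

class DocumentOpener {
  private readonly readLocal: ContentProvider;
  private readonly providers = new Map<string, ContentProvider>();

  constructor(readLocal: ContentProvider) {
    this.readLocal = readLocal;
  }

  // Route a non-file scheme to a provider, e.g. a request to the server.
  registerScheme(scheme: string, provider: ContentProvider): void {
    this.providers.set(scheme, provider);
  }

  // Resolve document content, replacing the file system read when needed.
  open(uri: string): string {
    const scheme = schemeOf(uri);
    if (scheme === "file") return this.readLocal(uri);
    const provider = this.providers.get(scheme);
    if (provider === undefined) {
      throw new Error(`no content provider for scheme "${scheme}"`);
    }
    return provider(uri);
  }
}
```

This covers the "server provides content to the client" direction only; the server reading from the client's file system, which is the subject of this issue, needs requests flowing the other way.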

dkattan commented 11 months ago

https://code.visualstudio.com/api/extension-guides/virtual-workspaces currently states

What about support in the Language Server Protocol (LSP) for accessing virtual resources? Work is under way that will add file system provider support to LSP. Tracked in Language Server Protocol issue #1264.

Which brings us here. What is going on??

dkattan commented 11 months ago

Perhaps the LSIF format will help:

The purpose of the Language Server Index Format (LSIF) is it to define a standard format for language servers or other programming tools to dump their knowledge about a workspace.

The Project Context looks like it will help enumerate files/folders

The Embedding Contents feature could be used for retrieving the contents of a given file.

It can be valuable to embed the contents of a document or project file into the dump as well. For example, if the content of the document is a virtual document generated from program meta data. The index format therefore supports an optional contents property on the document and project vertex. If used the content needs to be base64 encoded.

The explanation of the feature makes it sound like it is specifically for virtual files. It would help if it also included something to the effect of "This can also be used to facilitate web-based editors that lack filesystem access".

@dbaeumer this appears to be your baby, thoughts?