julia-vscode / LanguageServer.jl

An implementation of the Microsoft Language Server Protocol for the Julia language.

Dev docs #734

Open non-Jedi opened 4 years ago

non-Jedi commented 4 years ago

I feel like I've failed to get a comprehensive view of how the whole LanguageServer.jl ecosystem works in just fixing up random bugs over the past few months. Would be very useful to have even a high-level description of the architecture. Below, I'll describe in general terms how I understand the package working, and maybe with help we can fill this out enough to put it in a github wiki page or similar. Italics are places where I have some doubt about my understanding.

Devdocs initial effort

When LanguageServer.jl starts, it immediately starts a SymbolServer in a separate process. This SymbolServer indexes all the symbols in the current project's dependencies and makes those available to the LanguageServerInstance for completion/hover/docstrings/etc. This is done by having the SymbolServer process load each of the dependencies and do... something... CSTParser.jl is not used by the SymbolServer.

Within the current project, the files are not loaded into a process; instead they are parsed into a concrete syntax tree (CST) by CSTParser.jl. This CST is used by StaticLint to provide linting and internally by LanguageServer.jl to provide the rest of its rich functionality. Ideally there should be a paragraph here describing how LanguageServer.jl implements the semantics of Julia: methods all being part of the same function, inferring types of symbols, etc.

The CST produced by CSTParser is a tree of EXPR. Each EXPR can have arbitrary data attached to it via its meta property. This is where StaticLint stores its linting results.

Formatting of source files is provided by DocumentFormat.jl. The language server doesn't reuse its CST for formatting but instead hands DocumentFormat.jl a string representing the file being formatted which DocumentFormat parses into a CST before making formatting changes.

Additional questions that should be answered by this doc

There's probably a lot more I haven't thought of, but if we could fill out the above paragraphs and answer the questions, I feel like that would give contributors a much more solid foundation for contributing.

davidanthoff commented 4 years ago

Yes, that would all be very useful! I think the docs should probably be located in https://github.com/julia-vscode/docs, just so that we try to centralize things.

ZacLN commented 4 years ago

This is done by having the SymbolServer process load each of the dependencies and do... something...

The call to SymbolServer.getstore launches a process which is passed the current environment's project file. It then tries to load existing on-disc caches for the dependencies of the project. If any of these either don't exist or are invalid in some way, it loads those packages into the process, caches them and stores them to disc (along with any dependencies).

Back on the main LanguageServer process, when this child process completes, we load the manifest file for the environment and the caches for all packages listed (i.e. installed packages in the environment + all of their dependencies). Caches for a package will not be available if the child process could not load the package (either directly or because the package of which it is a dependency could not be loaded). Unloaded caches are replaced with dummy ModuleStores.

This is done async while the main LanguageServer loop runs.
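The two steps described above (the child process filling in missing caches, then the main process loading one store per manifest entry) can be sketched roughly as follows. This is a toy model in Python, not the SymbolServer.jl code: `ToyModuleStore`, `index_package`, `build_caches`, `load_stores` and the `loadable` set are all hypothetical names, a plain dict stands in for the on-disc cache, and the async hand-off between the processes is left out.

```python
from dataclasses import dataclass, field


@dataclass
class ToyModuleStore:
    # Hypothetical stand-in for a cached package; not the real ModuleStore.
    name: str
    symbols: dict = field(default_factory=dict)
    is_dummy: bool = False


def index_package(name, loadable):
    """Stand-in for loading a package in the child process and indexing it."""
    if name not in loadable:
        raise ImportError(name)
    return ToyModuleStore(name, {"exported": f"{name}.exported"})


def build_caches(project_deps, disk_cache, loadable):
    """Child-process step: keep existing caches, index whatever is missing."""
    for name in project_deps:
        if name in disk_cache:
            continue  # an existing (assumed valid) cache is reused as-is
        try:
            disk_cache[name] = index_package(name, loadable)
        except ImportError:
            pass  # the package could not be loaded, so no cache is written
    return disk_cache


def load_stores(manifest, disk_cache):
    """Main-process step: one store per manifest entry, dummy if no cache."""
    stores = {}
    for name in manifest:
        if name in disk_cache:
            stores[name] = disk_cache[name]
        else:
            stores[name] = ToyModuleStore(name, is_dummy=True)
    return stores


# Package names below are placeholders chosen for the example.
disk = {"JSON": ToyModuleStore("JSON", {"parse": "JSON.parse"})}
build_caches(["JSON", "Tokenize", "Broken"], disk, loadable={"Tokenize"})
stores = load_stores(["JSON", "Tokenize", "Broken"], disk)
```

Here "Broken" ends up as a dummy store because the child could not load it, while "JSON" is served from the pre-existing cache without being re-indexed.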

Ideally there should be a paragraph here describing how LanguageServer.jl implements the semantics of Julia: methods all being part of the same function, inferring types of symbols, etc.

I'll need to split and follow up on this one

Is the CST mutated in place when the user edits a file, or is it simply reparsed?

Top-level blocks are mutated following an edit. Parsing isn't particularly costly in the grand scheme of things but, as much of the other semantic information we're creating (bindings, scopes, links between symbols and bindings, all of which is attached to EXPR) can be retained, we want to reuse as much as we can.

This incredibly messy and outdated PR implements mutation at the level of terminal tokens. That PR only implements token level mutation for edits that either have no impact on the semantic representation (for example, the addition or deletion of white space in a non-whitespace sensitive region of an expression) or for which the effect can be determined to be sufficiently localised.
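To make the idea of reuse at the level of top-level blocks concrete, here is a minimal sketch in Python. It is an illustration under assumptions, not the LanguageServer.jl algorithm: `Block`, `split_top_level` and `update_blocks` are hypothetical, the "parser" just splits on blank lines, and unchanged blocks are detected by comparing text at the start and end of the file.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    text: str
    # Stands in for the bindings/scopes attached to a top-level EXPR.
    meta: dict = field(default_factory=dict)


def split_top_level(source):
    """Toy 'parser': each blank-line-separated chunk is a top-level block."""
    return [Block(chunk) for chunk in source.split("\n\n") if chunk.strip()]


def update_blocks(old, new_source):
    """Keep old blocks (and their meta) where the text is unchanged."""
    new = split_top_level(new_source)
    limit = min(len(old), len(new))
    # Length of the unchanged run at the start of the file.
    prefix = 0
    while prefix < limit and old[prefix].text == new[prefix].text:
        prefix += 1
    # Length of the unchanged run at the end, not overlapping the prefix.
    suffix = 0
    while (suffix < limit - prefix
           and old[len(old) - 1 - suffix].text == new[len(new) - 1 - suffix].text):
        suffix += 1
    # Only the middle section is taken from the fresh parse.
    return old[:prefix] + new[prefix:len(new) - suffix] + old[len(old) - suffix:]


old = split_top_level("f(x) = 1\n\ng(x) = 2\n\nh(x) = 3")
for block in old:
    block.meta["linted"] = True

updated = update_blocks(old, "f(x) = 1\n\ng(x) = 20\n\nh(x) = 3")
```

After the edit, the first and last blocks are the same objects as before, so whatever was attached to them survives, and only the edited middle block starts with empty metadata.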

How are results from the SymbolServer and from LanguageServer.jl's parsing of current package integrated?

SymbolServer caches are used to represent packages that are external to the current workspace and so imports within user code will look through the package caches that have been loaded. These are then either loaded into a StaticLint.Scope's used_modules field (from here we look across the names that the packages export when resolving a symbol) or explicit imports can wrap them in a binding (e.g. import CSTParser: parse will attach a binding to parse pointing to the relevant FunctionStore). There are various complications, for example around the overloading of imported functions.
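The two lookup paths mentioned above (names reachable through used_modules via a package's exports, and names bound directly by an explicit import) might be modelled like this. It is a simplified Python sketch with hypothetical classes and a made-up package `MyDep`; the real StaticLint.Scope has more fields, and the complications around overloading imported functions are ignored.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPackageCache:
    # Hypothetical stand-in for a SymbolServer cache of one package.
    name: str
    exports: set
    members: dict


@dataclass
class ToyScope:
    bindings: dict = field(default_factory=dict)
    used_modules: dict = field(default_factory=dict)

    def using(self, cache):
        """Model `using MyDep`: exported names become visible."""
        self.used_modules[cache.name] = cache

    def import_name(self, cache, name):
        """Model `import MyDep: name`: bind the name directly in this scope."""
        self.bindings[name] = cache.members[name]

    def resolve(self, name):
        """Explicit bindings first, then exports of used modules."""
        if name in self.bindings:
            return self.bindings[name]
        for cache in self.used_modules.values():
            if name in cache.exports:
                return cache.members[name]
        return None


mydep = ToyPackageCache(
    name="MyDep",
    exports={"public_fn"},
    members={"public_fn": "FunctionStore(public_fn)",
             "internal_fn": "FunctionStore(internal_fn)"},
)

scope = ToyScope()
scope.using(mydep)
before = scope.resolve("internal_fn")   # not exported, so not found yet
scope.import_name(mydep, "internal_fn")
after = scope.resolve("internal_fn")    # now bound explicitly
```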

How are features provided for files that are included into another file where some of the symbols they use are defined?

These are represented as CST.

What if multiple files include the same file; does completion include symbols from both parents?

This isn't handled in a satisfactory way. From the perspective of the file that has been included multiple times it has only one parent, the last file in the tree that included it.

How? Does the LanguageServerInstance maintain some sort of graph structure for this?

Yes-ish. To update the semantic information of a file we traverse across the CST from some 'root' file, building scopes, adding bindings to those scopes and associating symbols with those bindings. When we hit an include call (and when we know what that path points to) we load the CST for the included document, set its 'root', and continue traversing (side point: we maintain the same Scope across these files).

The root of a file can only be set at the point of being included by another file and all files within a tree share the same root.
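A rough Python sketch of that traversal, including the "last includer wins" behaviour for a file included from two places, could look like the following. `Document`, `traverse` and the statement tuples are hypothetical simplifications: a real document holds a CST rather than a list of tuples, and the shared Scope is reduced to a dict.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Document:
    path: str
    statements: list  # each entry is ("def", name) or ("include", path)
    root: Optional["Document"] = None
    parent: Optional["Document"] = None


def traverse(doc, docs, scope, root=None, active=()):
    """Walk `doc`, following includes; one scope is shared by the whole tree."""
    if doc.path in active:
        return scope  # guard against include cycles
    root = root or doc
    doc.root = root  # every file in the tree gets the same root
    for kind, arg in doc.statements:
        if kind == "def":
            scope[arg] = doc.path  # record where the name was bound
        elif kind == "include" and arg in docs:  # only paths we can resolve
            child = docs[arg]
            child.parent = doc  # overwritten if another file includes it later
            traverse(child, docs, scope, root, active + (doc.path,))
    return scope


docs = {
    "main.jl": Document("main.jl", [("def", "Main"), ("include", "a.jl"), ("include", "b.jl")]),
    "a.jl": Document("a.jl", [("def", "a"), ("include", "shared.jl")]),
    "b.jl": Document("b.jl", [("def", "b"), ("include", "shared.jl")]),
    "shared.jl": Document("shared.jl", [("def", "helper")]),
}
scope = traverse(docs["main.jl"], docs, {})
```

Because `shared.jl` is reached through `a.jl` first and `b.jl` second, its recorded parent ends up being `b.jl`, which mirrors the limitation described earlier in the thread.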

Is semantic data (all symbols in current module, docstrings, etc.) saved separately from the CST, or does the server walk the CST again each time a request is made?

Everything is attached to the CST. We walk across the tree to find which expression is being pointed at, but that's an insignificant cost. Once we have the expression, all the information we need is right there.
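As an illustration of why that walk is cheap, here is a small Python sketch that finds the innermost node covering a cursor offset and then reads what is attached to it. `ToyExpr` and `expr_at` are hypothetical; the sketch assumes each node stores its width rather than an absolute position, so the walk accumulates offsets on the way down, and the `meta` dict stands in for whatever is stored on a real EXPR.

```python
from dataclasses import dataclass, field


@dataclass
class ToyExpr:
    span: int                                  # width of this node in characters
    args: list = field(default_factory=list)   # child nodes, in source order
    meta: dict = field(default_factory=dict)   # semantic info attached to the node


def expr_at(expr, offset, start=0):
    """Return the innermost node whose range [start, start + span) covers offset."""
    if not (start <= offset < start + expr.span):
        return None
    pos = start
    for child in expr.args:
        found = expr_at(child, offset, pos)
        if found is not None:
            return found
        pos += child.span
    return expr  # no child covers the offset, so this node is the answer


# Toy tree for the four characters of "f(x)".
call = ToyExpr(span=4, args=[
    ToyExpr(span=1, meta={"binding": "f"}),
    ToyExpr(span=1),                       # "("
    ToyExpr(span=1, meta={"binding": "x"}),
    ToyExpr(span=1),                       # ")"
])

hovered = expr_at(call, 2)  # cursor on "x"
```

Once `hovered` is found, a hover or go-to-definition request would only need to read `hovered.meta`, with no second analysis pass.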

non-Jedi commented 4 years ago

Thanks for responding in detail @ZacLN. I'll try to consolidate this into a PR to https://github.com/julia-vscode/docs