Make SymbolServer.jl more library-like

thomasjm commented 1 year ago

This is an effort to make SymbolServer.jl more library-like, in the sense that significant logic is not hidden in script files like indexpackage.jl and server.jl. This logic is now exposed in two exported functions: SymbolServer.index_package and SymbolServer.index_packages, to make it easier for third party packages to use.

There are also some miscellaneous spelling and grammar fixes, as well as a few @time calls to help with profiling the indexing process. It's probably easiest to review this commit-by-commit.

For every PR, please check the following:

[X] End-user documentation check. If this PR requires end-user documentation in the Julia VS Code extension docs, please add that at https://github.com/julia-vscode/docs.
[X] Changelog mention. If this PR should be mentioned in the CHANGELOG for the Julia VS Code extension, please open a PR against https://github.com/julia-vscode/julia-vscode/blob/master/CHANGELOG.md with those changes.

mkitti commented 1 year ago

What is with the disc -> disk thing? Isn't this merely British versus American spelling?

thomasjm commented 1 year ago

No, I put a citation in the commit message:

Disk is indisputably the correct spelling for computer storage drives

https://grammarist.com/spelling/disc-disk/

This is also the case in Britain; the article provides a link to the Telegraph with an example.

davidanthoff commented 1 year ago

This will most likely not work, the reason for the current design is that the indexing process runs in the user environment where it does not have access to the SymbolServer package. So, moving all this code into the regular package will essentially make it unavailable to both the local indexing process as started by LS and the cloud indexing service that runs inside the docker container.

If we wanted to rearrange things here then I think we could move the various "application like" scripts into a separate folder just to make it clearer that these are not part of the module defined by the package, but rather are code that is meant to run in a special child process. And I guess we could expose another function from the regular package that launches a child process and indexes a given package.

But the core idea that the indexing should run in a separate process that does not have regular SymbolServer loaded seems really sound to me.

davidanthoff commented 1 year ago

Oh, and just as a side note: it is much easier to separate things like spelling changes out into separate PRs. PRs are much easier to handle for us if they each do one thing only. PRs that fix a whole variety of things that are unrelated are more difficult to review, and most likely will take longer to process.

thomasjm commented 1 year ago

I wouldn't normally open a whole PR just to be pedantic about spelling :P. I'm happy to split those commits off if you prefer. They would produce a merge conflict with the current changes though.

So, moving all this code into the regular package will essentially make it unavailable to both the local indexing process as started by LS and the cloud indexing service that runs inside the docker container.

Hmm, I'll have to think about this a bit more. Why is it important that SymbolServer not be available in the indexing process? I notice that server.jl currently imports almost all of the same code anyway. If it could be made available, users of SymbolServer wouldn't have to do weird stuff like this.

Maybe we could just move the indexing logic into separate .jl files, which can then be included by both the main SymbolServer module and the application scripts?

On the general idea of launching things in separate processes: I don't love it, because due to Julia's precompilation, every additional process tends to lead to significant additional latency and memory usage. Also, the fact the SymbolServer has to communicate with the main LanguageServer over a socket using an ad-hoc protocol adds complexity and a bit of a barrier to parallelizing the indexing. I understand the desire to re-nice the server process as done here, but I'm not actually sure the separate process is a net win, especially if SymbolServer could be optimized generally (e.g. see #269).

davidanthoff commented 1 year ago

Why is it important that SymbolServer not be available in the indexing process?

We want to run the indexing in exactly the same environment that the user has set up. And that environment almost certainly won't have SymbolServer in it, so we can't use standard package loading mechanisms to access it.

Maybe we could just move the indexing logic into separate .jl files, which can then be included by both the main SymbolServer module and the application scripts?

Yes, I think that could work, but really the design of the package SymbolServer.jl is that it is not loaded inside the process that is doing the actual indexing. The design is that SymbolServer.jl is the package that runs in a separate process that a) triggers indexing (in a child process) and b) gives access to the cached symbol information.

On the general idea of launching things in separate processes

The general idea here is this: we never want to load or run any user-code inside the language server process. Otherwise, the LS would crash every time there is a bug in a package that a user loads. That is not an option for us, the LS provides too many capabilities that work even if a user tries to load a package with an error to completely give up in these situations. The primary reason for the two-process model is to isolate and harden the language server process against errors in packages that users use.

mkitti commented 1 year ago

I wouldn't normally open a whole PR just to be pedantic about spelling :P. I'm happy to split those commits off if you prefer. They would produce a merge conflict with the current changes though.

PRs with just spelling or grammar changes are much easier to merge than ones mixed in with technical changes. There should not be a merge conflict with the changes are exactly the same.

thomasjm commented 1 year ago

Thanks for explaining @davidanthoff! It seems this PR isn't mergeable for now so I'll close it. I opened #271 with the spelling stuff.

The general idea here is this: we never want to load or run any user-code inside the language server process. Otherwise, the LS would crash every time there is a bug in a package that a user loads.

Possibly naive question, but aren't there tricks to work around this? I'm no Julia expert but I came across this method for loading a package using metaprogramming. (It presumably could be combined with a try/catch block, and even eval-ed in an isolated LoadingBay-style module of your choice.)

davidanthoff commented 1 year ago

Possibly naive question, but aren't there tricks to work around this?

I don't think so. Most basic situation: what if user code runs exit()? But in general Julia really has zero mechanisms to run a specific piece of code in some kind of sandbox within a given Julia process, so a separate process really is the only option. We certainly could catch some errors, but I don't see any way to make it robust against all errors (which the current solution is).

thomasjm commented 1 year ago

Hmm, a library calling exit() at import-time seems pretty out there. If the vast majority of errors could be caught by a try/catch block, maybe there's a still tradeoff to be made here.

However--has anyone ever thought about doing this analysis statically, without loading code at all? I was looking at JuliaSyntax.jl and it seems like a pretty impressive project, and advertises that one of its goals (besides speed) is to support tooling.

Static analysis is how most LSP implementations work after all. As a point of comparison: Python is similar to Julia in many ways, as an interpreted language which has the ability to run side-effectful code at import time. All the Python LSP servers I'm aware of rely on static analysis using a library such as jedi. The now-deprecated Microsoft Python language server wasn't even written in Python. Could there be a future where Julia's language server works the same way?

davidanthoff commented 1 year ago

Hmm, a library calling exit() at import-time seems pretty out there. If the vast majority of errors could be caught by a try/catch block, maybe there's a still tradeoff to be made here.

Yeah, that is probably not the best example. But for example, any binary shared library that any package loads that crashes would take down the entire language server. And when that crashes, a lot of things in the extension stop working, so that is just not an option.

However--has anyone ever thought about doing this analysis statically, without loading code at all?

Yes, at the moment we are doing a hybrid: we look at the current project, and for anything that is deved we use static analysis, and for all other packages we extract all information via the SymbolServer.jl (either from a cloud cache or by loading it locally once and then caching locally). The problem with static analysis is that Julia is a very dynamic language and we essentially stand no chance of really ever fully getting it right. Things like @eval just evade any attempt at static analysis. In my mind, the current hybrid story is the right compromise: for things that don't change quickly (i.e. at each key press) we extract all the information that there is by loading the package at some stage, and for things that change very quickly (i.e. files are edited in real time) we use static analysis, that won't get everything right, but is fast.

julia-vscode / SymbolServer.jl

Make SymbolServer.jl more library-like #270