[Feature Request]: Persistent caches

montekki commented 4 years ago

One of the latest posts states:

One of the current peculiarities of rust-analyzer is that it doesn’t persist caches to disk. Opening project in rust-analyzer means waiting a dozen seconds while we process standard library and dependencies.

And while that may be true when working with IDEs like VS Code that are launched once and used for a long period of time, other workflows, such as using editors like vim and closing/opening lots of windows that don't share rust-analyzer's state with each other, are actually quite painful. Each time a new window is opened, one has to sit and wait, and for large repos that is quite a long time.

P.S. Sorry if it's a duplicate

bjorn3 commented 4 years ago

As far as I understand @matklad doesn't want to do this yet as it would reduce the necessity of optimizing the initial analysis, thus reducing the likelihood that people will work on reducing the initial analysis time. There has been some discussion about the specific use case of closing and re-opening vim often, but so far nothing has changed.

tjkirch commented 4 years ago

It could be almost as good if rust-analyzer could be left running and shared between editing sessions.

bjorn3 commented 4 years ago

I think that would be something the client needs to do. There have also already been enough complaints about rust-analyzer keeping running after the client exits because of bugs.

tjkirch commented 4 years ago

I think that would be something the client needs to do.

Perhaps not necessarily. The current lifetime of the rust-analyzer process is tied to an editing session. Instead, I could imagine the analysis being split off and done in a longer-lived process that the session process communicates with. The longer-lived process would need to handle concurrent access and perhaps purge data after a time, but it would remove the startup cost for later editing sessions and reduce the memory usage for multiple editors.

bjorn3 commented 4 years ago

Who would manage that longer-lived process if the client doesn't? If nobody does, it will keep running forever, which is bad. If the language server started by the client does, it would exit as soon as all language servers exit, which makes it useless for closing and re-opening vim.

flodiebold commented 4 years ago

One could imagine a scheme where rust-analyzer checks for a running server and forks and daemonizes if one isn't running yet, maybe shutting itself down automatically if there aren't any clients after a while. flow does something similar, for example. But I feel that's far too much complexity to fix a problem that basically only exists for vim users used to a certain workflow, to be honest.
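For illustration only, the "shut down if there aren't any clients after a while" part of such a scheme could be tracked roughly as below. This is a minimal sketch, not how rust-analyzer or flow does it: `IdleTracker` and its methods are hypothetical names, and the timeout value is up to the caller.

```rust
use std::time::{Duration, Instant};

/// Hypothetical bookkeeping for a shared, daemonized server.
/// None of these names exist in rust-analyzer.
pub struct IdleTracker {
    clients: usize,
    /// Set when the last client disconnects, cleared when a client connects.
    idle_since: Option<Instant>,
    timeout: Duration,
}

impl IdleTracker {
    /// A freshly started daemon has no clients, so the idle clock starts at `now`.
    pub fn new(now: Instant, timeout: Duration) -> Self {
        Self { clients: 0, idle_since: Some(now), timeout }
    }

    pub fn client_connected(&mut self) {
        self.clients += 1;
        self.idle_since = None;
    }

    pub fn client_disconnected(&mut self, now: Instant) {
        self.clients = self.clients.saturating_sub(1);
        if self.clients == 0 {
            self.idle_since = Some(now);
        }
    }

    /// True once the daemon has had no clients for at least `timeout`.
    pub fn should_shut_down(&self, now: Instant) -> bool {
        match self.idle_since {
            Some(since) => now.duration_since(since) >= self.timeout,
            None => false,
        }
    }
}
```

A daemon loop would call `should_shut_down` periodically and exit when it returns true; checking for an already running server and forking are left out of the sketch.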

matklad commented 4 years ago

We definitely won't implement a persistent process within rust-analyzer itself -- it is indeed a job for the editor. However, I think for editors like vim someone could write a separate rust-binary, rust-analyzer-supervisor, which would cache the connection to the ra.

rwols commented 3 years ago

Instead of a daemon process, wouldn't it be simpler to cache index results on disk? I know clangd does this.

bjorn3 commented 3 years ago

Persistent caches will require changes to salsa to be able to serialize its cache. In addition, it has the disadvantage that it makes optimizing the initial analysis less important, which may over time result in regressions not just of the initial analysis time, but also of the time needed to process a change.

Salsa's issue for serializable caches: https://github.com/salsa-rs/salsa/issues/10

Also an observation from https://github.com/rust-lang/rfcs/pull/1317#issuecomment-150965895:

My tips:

[...]

Don't store anything to disk. It's likely the oracle can be fast enough without doing this; and unnecessary complexity creates bugs. "Have you tried deleting the .ncb file?" (I remember having to do this a couple times per day when using VS, ca. 2005)

[...]

Note: I am not saying that persistent caches shouldn't ever be implemented. I just think that it shouldn't be implemented yet.

lnicola commented 3 years ago

I remember having to do this a couple times per day when using VS, ca. 2005

To be fair, I think they got their stuff together when they moved to a database in... some previous decade.

matklad commented 3 years ago

Thinking about this, there seem to be the following options here:

implement on-disk persistence as a memory usage optimization: spill large data to disk
implement on-disk persistence as a startup-time optimization: save the salsa query graph to disk when exiting, and reuse it upon restart (ideally, validation would be equivalent to calling set and figuring out that nothing has changed. That is, we could use the old crate graph even if we haven't completely validated it)
implement on-disk persistence as a build-system-aware way to use precompiled libraries. That is, in the distributed build scenario, rust-analyzer would ask the build-system to provide precompiled artifacts for dependencies.

The last case is the hardest, and the most interesting one.

It is hard because it makes persistence a public API: on disk data is no-longer a private impl detail, but a shared state between rust-analyzer and the build-system. It is another input, like file text or procedural macros.

It is the most interesting, because it makes rust-analyzer scale: it becomes possible to distribute the computation of such pre-analyzer libraries across several machines and to put the results into a distributed cache, re-used by many instances of rust-analyzer.

It seems we want the following litmus test for implementing persistence: the on-disk cache can be computed by a different machine (which runs a different OS) and be used locally.

Implementation wise, it's pretty clear that the cache should be computed on per-crate granularity. Some less-obvious questions:

should rust-analyzer use the cache as is (mmap it basically), or should it parse it into salsa's internal data structures?
should the cache be a separate flavor of input, or a way to cache existing inputs? Would we have code paths like if has_cache { from_cache } else { from_source } or would that be a unified code path?
can we just store everything in cache? We can store, eg, original source files, which makes the same code-path logic work.
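To make the "separate flavor of input" option concrete, here is a toy sketch of what it could look like. Everything in it is assumed for illustration: `CrateInput`, `CrateIndex`, `analyze` and `index_of` are hypothetical names, and the "analysis" is just a count of lines starting with `fn `, nothing like what salsa or rust-analyzer really compute.

```rust
/// Hypothetical per-crate analysis result (a stand-in for real index data).
#[derive(Debug, Clone, PartialEq)]
pub struct CrateIndex {
    pub fn_count: usize,
}

/// Two flavors of input for one crate: raw source text, or a precomputed index.
pub enum CrateInput {
    Source(String),
    Cached(CrateIndex),
}

/// Toy "analysis": counts lines that start with `fn ` after leading whitespace.
pub fn analyze(source: &str) -> CrateIndex {
    let fn_count = source
        .lines()
        .filter(|line| line.trim_start().starts_with("fn "))
        .count();
    CrateIndex { fn_count }
}

/// The `if has_cache { from_cache } else { from_source }` branch, written as a
/// match over the input flavor.
pub fn index_of(input: &CrateInput) -> CrateIndex {
    match input {
        CrateInput::Source(text) => analyze(text),
        CrateInput::Cached(index) => index.clone(),
    }
}
```

With this shape, callers only see `index_of`, but the two flavors still take different paths internally. Storing the original sources in the cache instead, as the last question suggests, would leave a single `analyze` path.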

flodiebold commented 3 years ago

I'd note that the file format for case 3 doesn't need to be the same as our internal cache format for cases 1 and 2 -- we could have e.g. rlibs as a possible input while caching the salsa database in a different format.

flodiebold commented 3 years ago

should rust-analyzer use the cache as is (mmap it basically), or should it parse it into salsa's internal data structures?

I think it'd be super interesting to use rkyv and mmap it, but maybe it's overengineering :sweat_smile:

lnicola commented 3 years ago

Some additional questions:

how nested is the data we're now storing in salsa? Table/relational data is easier to work with, but e.g. syntax trees will pose a problem.
how many salsa queries are we doing during a request, as an order of magnitude? Tens of thousands might require some smart caching.
would it be feasible (long-term) to hook into the salsa storage mechanism?
can we replace it completely, or do we store the same data both in salsa and on disk?

deontologician commented 3 years ago

It is hard because it makes persistence a public API: on disk data is no-longer a private impl detail, but a shared state between rust-analyzer and the build-system. It is another input, like file text or procedural macros.

There are lots of different compatibility contracts, such as "cache inputs are best effort. If rust-analyzer can't use it, it will recompute everything from scratch". That would also heavily imply not just doing mmap and trusting it to be correct, but validating the cache and falling back to the cold-cache path if it isn't compatible.
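A rough sketch of that "best effort" contract, under assumptions of my own: the cache carries a format version and a hash of the source it was built from, and anything that doesn't match is ignored. `CacheFile`, `FORMAT_VERSION`, `hash_source` and `load_or_recompute` are hypothetical names, and the hash is plain FNV-1a chosen only because it is easy to write down and gives the same value on every machine.

```rust
/// Hypothetical cache format version; bumped whenever the layout changes.
pub const FORMAT_VERSION: u32 = 1;

/// Hypothetical cache entry for one crate.
pub struct CacheFile {
    pub format_version: u32,
    pub source_hash: u64,
    pub summary: String,
}

/// 64-bit FNV-1a over the source bytes.
pub fn hash_source(source: &str) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
    for byte in source.bytes() {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
    }
    hash
}

/// Stand-in for the expensive cold-cache analysis.
fn analyze(source: &str) -> String {
    format!("{} lines", source.lines().count())
}

/// Returns the summary and whether it came from the cache. Any mismatch
/// silently falls back to recomputing from scratch.
pub fn load_or_recompute(cache: Option<&CacheFile>, source: &str) -> (String, bool) {
    if let Some(file) = cache {
        let compatible = file.format_version == FORMAT_VERSION
            && file.source_hash == hash_source(source);
        if compatible {
            return (file.summary.clone(), true);
        }
    }
    (analyze(source), false)
}
```

The point of the sketch is only that a bad or outdated cache costs time, never correctness.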

So concretely, in the build-system scenario, the inputs to each crate would be like:

rust-analyzer cache crate_b/src --existing-cache caches/crate_b_cache --is-lib=true > caches/crate_b_cache

ram19890 commented 3 years ago

Can we have an option to toggle sync ON/OFF, so that "fetching & caching" is turned off when an editor is opened and the cached data is used from memory or disk once it has been cached for the first time? If the user enables the toggle, let them set a limit on the number of opened .rs files (three, or any other number); once that limit is exceeded, the sync would run automatically. (That is, a configurable limit on the number of open files that triggers the sync. Default: "nolimit".)

There might be no use of daemon to run all the time!

Or something similar to "Android Project Treble" (for stability and consistency).

lnicola commented 3 years ago

@ram19890 there's no persistent caching at all right now. If and when it is implemented, it's going to be possible to delete the persistent cache, but it's too early to tell if the cache is going to be optional or not.

There's also no daemon at all. That was a suggestion for Vim users who keep closing their editor and don't want to change their workflow. A daemon like that could probably be implemented outside of RA, but it's not the real solution.

matklad commented 3 years ago

Couple of thoughts here:

one unusual use-case here is that some people use .rlib as a way to distribute proprietary code. Such use-cases currently can't benefit from rust-analyzer (no source code available), but they could in theory use our own index format (if we actually erase method bodies)
there's a certain charm in just using rlibs -- that makes plugging rust-analyzer into an existing build system easier. It's also true that rlibs are an end-game here -- it would be silly if compiler and IDE needed two separate "compiled library" formats. But using rlibs makes a deliberately unstable part of rust somewhat more stable, and there will be extra uncertainty as to who should be the emitter of rlibs -- compiler or rustc. We'll also want to put extra things in rlibs (parameter names), so :shrug:

bjorn3 commented 3 years ago

one unusual use-case here is that some people use .rlib as a way to distribute proprietary code.

rlibs leak filenames, doc comments for private items, the position of every item in the source file, the name of every function and type even if private and much more. I wouldn't be surprised if you could decompile them to something reasonably resembling the original source without a terribly huge amount of effort.

bjorn3 commented 3 years ago

parameter names

-Zalways-encode-mir and the MIR local debuginfo got you covered.

lnicola commented 3 years ago

I think that people who want to distribute closed-source libraries would be better served by going through a C API. It's more work and it's boring, but you get interop with every other language under the Sun.

pr2502 commented 2 years ago

We definitely won't implement a persistent process within rust-analyzer itself -- it is indeed a job for the editor. However, I think for editors like vim someone could write a separate rust-binary, rust-analyzer-supervisor, which would cache the connection to the ra.

I've written something like this, it's a binary that replaces rust-analyzer in your editor and pipes the input/output through a local TCP socket to a server. The server persists one rust-analyzer instance per workspace and works around LSP limitations to keep the important functionality while supporting multiple clients (vim editor instances) on a single rust-analyzer instance. It also keeps the rust-analyzer process alive for a while after all clients are closed, until a timeout runs out.
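The "one instance per workspace, shared by many clients" idea can be sketched as a small registry. This is not ra-multiplex's code: `Registry`, `Instance` and `attach` are hypothetical names, and a real server would hold a child process handle where this sketch holds only an id.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Stand-in for a running rust-analyzer child process (hypothetical).
#[derive(Debug)]
pub struct Instance {
    pub id: u32,
    pub clients: usize,
}

/// Maps each workspace root to the single instance serving it.
#[derive(Default)]
pub struct Registry {
    instances: HashMap<PathBuf, Instance>,
    next_id: u32,
}

impl Registry {
    /// Attaches a client to the instance for `workspace`, creating the
    /// instance on first use. Returns the instance id.
    pub fn attach(&mut self, workspace: &Path) -> u32 {
        let next_id = &mut self.next_id;
        let instance = self
            .instances
            .entry(workspace.to_path_buf())
            .or_insert_with(|| {
                let id = *next_id;
                *next_id += 1;
                Instance { id, clients: 0 }
            });
        instance.clients += 1;
        instance.id
    }

    /// Number of clients currently attached to `workspace`.
    pub fn client_count(&self, workspace: &Path) -> usize {
        self.instances.get(workspace).map_or(0, |i| i.clients)
    }
}
```

Two editors opened on the same workspace get the same id back from `attach`, which is what lets the second one skip the initial analysis.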

Repo is here https://github.com/pr2502/ra-multiplex, it's still work in progress but it is usable for me with neovim and coc-rust-analyzer.

Pinging users who asked for a feature like this, sorry for spam if you don't need it anymore @montekki @tjkirch @flodiebold

jackos commented 2 years ago

@pr2502 Can't tell you how much I appreciate https://github.com/pr2502/ra-multiplex, it works great, and having a dedicated terminal with rust-analyzer debug messages is an added bonus.

Just to confirm for anyone using helix that stumbles on this thread, you can put this in your ~/.config/helix/languages.toml:

[[language]]
name = "rust"
...
language-server = { command = "ra-multiplex" }

This is a really straightforward and fantastic feature, worth considering adding as a subcommand to rust-analyzer imo.

melMass commented 1 year ago

I've written something like this, it's a binary that replaces rust-analyzer in your editor and pipes the input/output through a local TCP socket

Thanks a lot, it seems to also work well with the VS Code extension:

{
  "rust-analyzer.server.path": "/Users/user/.cargo/bin/ra-multiplex"
}

akurniawan commented 10 months ago

Hi, are we still looking into this, or are we using ra-multiplex to wrap RA now?

davidbarsky commented 9 months ago

I think persistent caches for rust-analyzer are still a nice-to-have that requires a lot of design work before it's implemented. In the meantime, I recommend using ra-multiplex.

pandres95 commented 9 months ago

@pr2502, do you think ra-multiplex would help me keep the indexing cache of large repositories that hold thousands of crates (i.e. projects that use polkadot-sdk) on VS Code?

I know it's a weird question, but I've been looking for a solution for weeks as the number of deps in the project just keeps increasing. It's hard enough not having a good way to keep the cache alive longer, especially when I sometimes have multiple editor windows open at the same time and multiple files open in the same editor window.

pr2502 commented 9 months ago

It does work with VS Code, but it'll also make your life harder in other ways: the file watch events from clients (editors) are not propagated (yet), which means you have to manually reload the workspace (ra-multiplex reload) when adding/removing files or when Cargo.lock changes. You might end up with even more load if you change your project structure often.

rust-lang / rust-analyzer

[Feature Request]: Persistent caches #4712