rust-lang / rust-analyzer

A Rust compiler front-end for IDEs
https://rust-analyzer.github.io/
Apache License 2.0
14.15k stars 1.58k forks source link

Add an Optional `CargoTargetSpec`-like Data to rust-project.json #15892

Open davidbarsky opened 10 months ago

davidbarsky commented 10 months ago

Motivations

Hi! I'm writing this issue not entirely convinced that it's the right approach, but I figured I'd at least open an issue for discussion/feedback. Anyways: We've added two fields to the rust-project.json format in the rust-analyzer/buck integration, which have been really useful. They are:

We've added these fields because determining the ManifestPath and the owning targets on the fly by querying buck adds some non-trivial, perceptible latencies (200-300 milliseconds at the average, with some substantially higher latencies at the tail) in what users would otherwise expect to be a near-instant response. Similar operations in rust-analyzer are in the low single-digit milliseconds, to its immense credit!

I decided to open this issue to propose upstreaming these changes when I was working on adding generic runnable support[^1] and realized that a decent chunk of the information needed to successfully create a runnable (specifically, the manifest file) could be very easily added to a CargoTargetSpec via a rust-project.json, enabling a set of long-tail features and functionality that I've wanted to tackle in a Buck context but was blocked by either too-high latencies or rust-analyzer's lack of visibility into this information. These features include:

Approach

I'd like to add an optional TargetSpec (renamed from CargoTargetSpec) to project_model/src/project_json/Crate, where any build system that opts into providing a TargetSpec would be required to provide all information that TargetSpec expects. My goal for providing this constraint is to minimize maintenance costs for rust-analyzer.

Alternatives/Caveats.

I worry about proposing this feature because I worry:

[^1]: Note that the current state of runnables can largely work if this line is commented out and the rust-buck-companion extension queries the owner of the file in question, but that's not an approach that I'd like to take because latencies of querying a build system are too high.

matklad commented 10 months ago

@davidbarsky could you expand a little bit on what these extra fields actually do? This is out of my mind's L1 cache, so it's not immediately obvious to me how they could be used.

We've added these fields because determining the ManifestPath and the owning targets on the fly by querying buck adds some non-trivial, perceptible latencies

I am not entirely sure what this talks about (again, this is because at this point I remeber nothing :) ), but if this is about "what crate does foo.rs belong too" functionality (which is the first step of any analysis), that should be handled by source roots. See ProjectWorkspace::to_roots.

that a decent chunk of the information needed to successfully create a runnable

For this one in particular, I think we could do something super general, like in projec.json you could specify runnable_info: any for any target, and rust-analyzer then will just transfer this info as is over the LSP protocol for the frontent to deal with. If we want to re-use existing handling in the VS Code extension, we could be specific, and add somethign like

runnables: {
   "check": ["buck", "check", "my-crate-name"],
   "run": ["buck", "run", "my-crate-name"],
   "test": ["buck", "test", "my-crate-name", "--"],
}

where keys are a fixed set of commands with standrad semantics, and values are command-lines. note that I still believe that integrating that into flycheck would be fundamentally wrong, but if it's easy to do transparently, then we could do it (like, flycheck itself is fundamentally wrong, but we are still having it, cause its useful).

Exposing a dependency explorer

:+1: I think adding extra "display" information would be useful. Though, the danger here is mixing up things which are display only (and can be approximate fuzzy things, because human would sort them out) with things which have specific semantics (and where any fuzziness would confuse computers). We actually already have display_name in project.json for this purpose, and I think it would be fine to add display package or some such. Or, indeed, a path to manifest file, as long as it is DisplayPath(String), and not an actual path, as, I think, rust-analyzer shouldn't be interpreting that in any way.

davidbarsky commented 10 months ago

@matklad Thanks so much for taking the time!

could you expand a little bit on what these extra fields actually do? This is out of my mind's L1 cache, so it's not immediately obvious to me how they could be used.

We've added these fields because determining the ManifestPath and the owning targets on the fly by querying buck adds some non-trivial, perceptible latencies

I am not entirely sure what this talks about (again, this is because at this point I remeber nothing :) ), but if this is about "what crate does foo.rs belong too" functionality (which is the first step of any analysis), that should be handled by source roots. See ProjectWorkspace::to_roots.

Sure! To back up, neither of those two fields—BuckExtensions::build_file or BuckExtensions::label—are passed to rust-analyzer as part of rust-analyzer.linkedProjects (or, more specifically, the field as it exists in InitializeParams), but they have pretty useful in our little companion server/daemon (entrypoint here) that does target discovery/reloading of Rust crates in Buck. Put differently, it allows people to open an arbitrary Rust file in a Buck repo and get a functioning rust-analyzer pretty quickly without needing to manually generate a rust-project.json ahead of time.

I'll also quickly explain how we implemented the near-Cargo-like experience with Buck, since I think it provides some useful background context/demonstrate the value that this has for us, but feel to skip over this:

Notably, rust-analyzer did not not need to be aware of any new fields in order for us to implement this functionality.

For this one in particular, I think we could do something super general, like in projec.json you could specify runnable_info: any for any target, and rust-analyzer then will just transfer this info as is over the LSP protocol for the frontent to deal with. If we want to re-use existing handling in the VS Code extension, we could be specific, and add somethign like

runnables: {
   "check": ["buck", "check", "my-crate-name"],
   "run": ["buck", "run", "my-crate-name"],
   "test": ["buck", "test", "my-crate-name", "--"],
}

where keys are a fixed set of commands with standrad semantics, and values are command-lines.

I hacked something together by replacing the existing Cargo-based runnable functionality in rust-analyzer with Buck-based runnables. I didn't push it anywhere, but it works weirdly well. I didn't consider adding the runnables to the Crate in rust-project.json—if I'm parsing you correctly—but that's a pretty clever solution, honestly.

note that I still believe that integrating that into flycheck would be fundamentally wrong, but if it's easy to do transparently, then we could do it (like, flycheck itself is fundamentally wrong, but we are still having it, cause its useful).

You know, I've absolutely come around on flycheck being a hack for a bunch of reasons, but being able to construct a flycheck command using the crate name and/or the target label is pretty useful. At least for Cargo-based users, this would allow for per-crate flychecks, which I've personally wanted in tracing.

👍 I think adding extra "display" information would be useful. Though, the danger here is mixing up things which are display only (and can be approximate fuzzy things, because human would sort them out) with things which have specific semantics (and where any fuzziness would confuse computers). We actually already have display_name in project.json for this purpose, and I think it would be fine to add display package or some such. Or, indeed, a path to manifest file, as long as it is DisplayPath(String), and not an actual path, as, I think, rust-analyzer shouldn't be interpreting that in any way.

Yeah, I've avoided putting anything too load bearing in the "display" information. In this case though, I think that I'd it be valuable to do both (or at the very least, derive display information from the semantically load-bearing information) because there are a few spots where manifest path is used around rust-analyzer in a meaningful way.

[^1]: Yes, this is a LSP server, and I know it's weird, but it works. We might change this to be just plain JSON. [^2]: rust-analyzer was way faster at reindexing a project after modifying a rust-project.json than I thought it would be. This exceeded already high expectations for rust-analyzer.

P1n3appl3 commented 8 months ago

On Fuchsia we took a similar approach to extending rust-project.json though we didn't go nearly as far with the integration as what you've described with the companion extension. Here's some extra context in case it's helpful:

We primarily use GN/Ninja for our build system, and GN generates a rust-project.json with an additional label field in crate entries. It also maintains a mapping of source files to build targets (currently separate from rust-project.json, but could easily be combined) so that given a set of source files it can run a check-build on all the "affected" targets.

We currently don't have any use for linking rust-project.json crates to their BUILD.gn "manifest" files, but it seems reasonable to provide and we could easily update GN to do so. Overall #16135 looks interesting and we'd get some value out of TargetSpec being stabilized.

A concern I have with TargetSpec over a simple $saved_file placeholder is that today we already have the logic to do that source -> target(s) mapping outside of rust-analyzer, but if flycheck_command were a field on each entry with no info about which specific file was saved then we'd run into issues where rust-analyzer doesn't know which rust-project.json entries to actually run a flycheck on. I could see this being an issue with any non-cargo build system in a large project where it's easy to write some_flycheck_wrapper.sh that accepts $changed_file and does the right thing, but quite hard to get GN/ninja/meson/cmake/etc to spit out a rust-project.json with TargetSpecs that make it all work, especially in cases where the project doesn't own their own build system.

We're also mildly concerned about adding too much to rust-project.json, ours are around 20MB today and we'd want to make sure we measure any changes to make sure they don't drastically impact startup time.

davidbarsky commented 8 months ago

Thanks for responding!

We currently don't have any use for linking rust-project.json crates to their BUILD.gn "manifest" files, but it seems reasonable to provide and we could easily update GN to do so. Overall https://github.com/rust-lang/rust-analyzer/pull/16135 looks interesting and we'd get some value out of TargetSpec being stabilized.

The primary utility of such a field (at least, for me) is to reload rust-analyzer whenever the corresponding manifest file changes, but I can understand that not being particularly useful for you.

A concern I have with TargetSpec over a simple $saved_file placeholder is that today we already have the logic to do that source -> target(s) mapping outside of rust-analyzer, but if flycheck_command were a field on each entry with no info about which specific file was saved then we'd run into issues where rust-analyzer doesn't know which rust-project.json entries to actually run a flycheck on.

Gotcha: at least under Buck, it's decently easy/reliable to figure out what crates owns what file, but I don't see any reason why $saved_file (or something similar) can't also be supported.

We're also mildly concerned about adding too much to rust-project.json, ours are around 20MB today and we'd want to make sure we measure any changes to make sure they don't drastically impact startup time.

For the purposes of the linked PR, I'd like to move the commands responsible for runnables and flycheck out of each individual crate and onto the same level as sysroot and friends, which should limit the impact this has on the size of your rust-project.json and make it easier to lint and report against typos in those command. One followup question: do you generate a single rust-project.json for all Rust code in Fuschia's monorepo? For context, with Buck, we generate a new rust-project.json on the fly with https://github.com/rust-lang/rust-analyzer/pull/15863 each time a new, unindexed Rust file is opened.

P1n3appl3 commented 8 months ago

Yeah we currently use a single rust-project.json containing thousands of crates (pretty much "every reachable rust target in the build"). With lazy indexing enabled it's not too bad in terms of startup time, flychecks are the main thing that drags.

Y'all's solution with separate workspaces/on-the-fly rust-project.json generation is quite interesting and I can see how that'd be a superior experience in the buck2/bazel world, but it's probably out of reach for us without major additions to GN which are unlikely to happen.

Anyways, as long as it still includes the ability for the flycheck command to get passed the modified file I think we're happy with the approach given that #12882 is our main concern. FWIW I feel pretty much the same way about flycheck as your comment: sure it could be handled outside of the r-a, but dang is it convenient that r-a knows how to read rustc/clippy json diagnostics and give you code actions for suggested replacements and such. I think that convenience is enough to warrant a bit of extra complexity to handle it in a build-system-agnostic way.