rust-analyzer reader-mode

matklad commented 3 years ago

TL;DR: https://github.com/rust-dev-tools/cargo-src, but on top of rust-analyzer

Add a command, rust-analyzer reader-mode path/to/Cargo.toml, which produces a directory with a bunch of html files, which contain source for the project, a-la rustdoc output, but also have goto definition, hover, etc.

Interface Constraints:

Static site
Assume modern desktop browser
No external CSS, or CSS pre-processors, minimal hand-written CSS to get the job done.
No or minimal JavaScript
No build-in HTTP serever (but it's OK to try to auto-discover python3 -m http.server and such on usres' machine)
(Maybe) an option to compress everything down to a single html file which you can just open in the browser.

Implementation Constraints:

Implementatin lives in ide-reader-mode and is exposed via ide crate.
There's a transparent intermediate representation. The result of reader mode is not directly an HTML string, but something else. That something else is a set of annotated files, struct ReaderView. This view is independent of the hir or ide types, and is just a large chunk of data. The data includes two things, a list of files and a list of defs. Each file is a text plus a number of auxilary maps that describe ranges in the file. There should be a map for syntax hightlighting, a map for references and definitons, maybe a map for folding ranges (fold method bodies would be useful feature to have). Spans in files refer to defs by ID. Defs are untyped, and store their defining spans and a list of refs. Not sure if we need a separate ref list or just spans would be enough.
ReaderView might want to live in a separate crate, just to make sure it doesn't depend on rust-analyzer types
ReaderView has fn to_html(&self) -> HashMap<PathBuf, String> method, which renders it as a directory with htmls and csses. This should be completely independent of the rust-analyzer.

edwin0cheng commented 3 years ago

There's a transparent intermediate representation ..

Should it store in LSIF ?

matklad commented 3 years ago

I think it’s better not to, it seems simpler to have a thing taylored specifically for rust and code browsing

DJMcNab commented 3 years ago

We probably should also have LSIF support, because my guess is the GitHub's code analysis engine which they use for providing support for go to definition in other languages uses.

matklad commented 3 years ago

I do think we should have LSIF, but I think that's orthogonal feature .

DJMcNab commented 3 years ago

Yeah, I think we're in agreement there. In theory, the LSIF dump would probably use a lot of similar infrastructure.

jonas-schievink commented 3 years ago

GitHub's code navigation uses https://github.com/github/semantic

flodiebold commented 3 years ago

Seems a bit of a weird choice for them to try to implement language analysis for a bunch of languages themselves :confused:

DJMcNab commented 3 years ago

That would at least explain why it can be so inconsistent though, I guess.

matklad commented 3 years ago

This drifting oftopic, but I actually see reasons for them to do their own thing. They have a boatload of code, and they can't build it (both in terms of resources, and in terms of each project requiring a specific environment to build). In this scenario, building a distributed version on IntelliJ architecture makes sense. It would not be precise, but it would be robust.

If you have a boatload of code, but you can build it (because, for example, you are Google, so there's unified Bazel build system), then language-specific Kythe approach makes sense.

flodiebold commented 3 years ago

Spans in files refer to defs by ID. Defs are untyped, and store their defining spans and a list of refs.

Why have def IDs at all, and not just refer to everything by span?

flodiebold commented 3 years ago

a list of files

I imagine this might include virtual "macro expansion" files so we can navigate into those :thinking:

matklad commented 3 years ago

Yeah, maaybe spans would be enough. I just thought that, as we are going to store lists of usages somewhere, storing them in defs would be cleaner. It might actually save some memory -- for span, you need to save file id and range, while id is u32. ids give you span interning in a sense, which is nice, because every def is used several times typically.

I guess specific details of intermediate repr are not importatn, as log as:

It's abstraction layer are text documents (not hir types)
it knows (almost) nothing about rust-analyzer (it's tempting to re-use minor things like hightlight tags)
it knows nothing about html

matklad commented 3 years ago

Has anyone wrote some code here? I feel like I might want to do this on the weekend, but I don't want to spoil the fun if someone is already working on it ) (not: I might totally drop the ball here myself)

benmkw commented 3 years ago

this seems very similar to https://github.com/rust-lang/rust/pull/84176 and also related to https://github.com/rust-analyzer/rust-analyzer/issues/6570

matklad commented 3 years ago

@benmkw thanks for that link, that's exciting! I think rustdoc gaining support for goto def changes motivation here a bit. Mainly, it will relieve the pressure to ship "anything" asap, so instead we can concentrate more on exploration and experimentation, unencumbered by maintenance and backward compatibility concerns. I think the end goal we want is two tools (which might be surfaced via the same top-level command), one of which renders the docs (so it rearranges the items to make docs more readable), while the other renders source code. On the doc front, rustdoc is perferct. On the source code rendereng, I think we need a lot more exploration before we are ready to ship something stable, and I think exploration on top of rust-analyzer would be the easiest.

benmkw commented 3 years ago

while the other renders source code

to me this sounds kind of like vscode/ an editor

concentrate more on exploration

as I tried to describe in #6570 I think it would be nice to have easier access to rust source code for e.g. creating graphs of the flow of execution (callgraph) or maybe custom lints (clippy plugins are deprecated https://blog.rust-lang.org/inside-rust/2019/11/04/Clippy-removes-plugin-interface.html , I'm not sure what would be the current best practice to have per project lints, thats something Jai wants to push where people enforce certain things in their project by looking at the AST) ... maybe such an API could generalise to this extend. I'm thinking of very project specific rules which are not possible to be upstreamed. Something like clang tidy/ libtooling https://www.youtube.com/watch?v=rbbCgVQrjWs

matklad commented 3 years ago

Ok, I didn't went farther than outlining the general design in my head, here are some points:

The struct RearedView and to_html should actually live in a separate repository (rm crate), which does not depend on rust-analyzer. This should allow folks to iterate on the front-end parts without suffering compile times.
rm crate shouldn't be rust centric -- I imagine someone would like to re-use the front-end bits to build similar tools for other languages.
rm crate shouldn't use LSIF from the start (to improve iteration speed), but using LSIF as a source of data should be kept in mind.
rust-analyzer should depend on rm crate

The niche we are targeting is "casual source code browsing for medium-sized projects". If you want to really deeply understand the source code, you need to open it in an IDE. If you have a monorepo with loads of code, you probably want to have more integrated tooling (source browser + vcs + ide + surfacing coverage results). If you are on your couch with a tablet, got bored with Tolstoy and want to read some burntSushi instead, that's where reader mode will be useful.

matklad commented 3 years ago

Regarging the more general point of providing the base API which makes building such tooling easy outside of rust-analyzer, that's a laudable goal, but I don't see it coming in the next few years. We are still figuring this out. Like, a week ago I realised that we probably can do significantly better then our current approach: https://github.com/rust-analyzer/rust-analyzer/issues/8713.

Our meta-strategy here is thus:

do not attempt to build any kind of public API
do ship "end-user experience / tools"
use the experience from shipping features to shape the in-progress API better.

benmkw commented 3 years ago

shouldn't use LSIF from the start (to improve iteration speed), but using LSIF as a source of data should be kept in mind

I briefly looked at this https://github.com/sourcegraph/sourcegraph and it seems pretty close to what you have in mind but I see you already interacted with them so probably no news here

matklad commented 3 years ago

Yeah, SG is similar, but different. They target "monorepo with loads of code", hence:

Git
Go (1.15 or later)
Docker
PostgreSQL (v11 or higher)
Node.js (version 8 or 10)
Redis
Yarn

For this tool, I imagine a singe static binary which you can invoke from CI pipilene to publish a set of static documents.

HKalbasi commented 3 years ago

I made Sourcetree which is basically the equivalent of rm trait mentioned here. It can show source code of itself. It is buggy and not perfect (just two days old), but it works and has hover, go to definition and finding references.

Theoretically, it should work out of the box with any LSIF generator, but it is only tested against lsif-tsc. So if RA can generate a LSIF dump, problem of this issue would be solved.

@matklad Are you still against starting with LSIF? LSIF actually helped me in prototyping and I now have a working demo without writing a single line of LSP related things. And Sourcegraph can take benefit from a rust LSIF generator in addition to Sourcetree/RA-reader-mode. And LSIF is pretty close to LSP so it should be easy to emit LSIF in RA. And it is easily extendable so LSIF won't limit us, I'm ok to support non-standard LSIF features from RA, though we should prioritize supported features over non-standard ones.

matklad commented 3 years ago

I'd love the end-experience to be rust-analyzer reader-mode /path/to/project/Cargo.toml, that is, I think there's a value in this working out of the box in rust-analyzer. If this is a single binary, I don't think we should have IPC format (LSIF) involved here.

Ultimately, we want to binaries: rust-analyzer that can produce html, and reader-mode-standalone, which can consume LSIF and produce html:

We can glue rust-analyzer and reader-mode-standalone via LSIF, but that adds an extra, unnecessary intermediate language.

We should have LSIF output from rust-analyzer some day, but that's for making it interoperable, not for rendering source code in a nice way.

That said, I won't be against going through ra -> lsif -> rm route, I just think that architectualy it's suboptimal.

matklad commented 3 years ago

To clarify -- I imagine rm to be written in Rust and usable as a library -- we should stay within a single language ecosystem.

HKalbasi commented 3 years ago

I see reader-mode project out of scope of rust analyzer. It has some signs of it:

rm crate shouldn't be rust centric
rm crate does not depend on rust-analyzer (for compile times, but anyway)

RA doesn't have it's own code editor, but expose LSP so existing code editors like vscode and vim can use it. Like code editors and LSP, there is a class of code consumers which I call code viewers, and rm crate is a member of this class, (other examples are Sourcegraph, Github, rustdoc, ...) and as RA doesn't need it's own code editor (thought it is possible and it may have some benefits), it also doesn't need it's own code viewer.

If we decouple reader mode project from rust analyzer, staying within a single language ecosystem would not be necessary. Vscode is in TS, vim is in C, Sourcegraph is in go, and there is no problem. That being said, it maybe make sense to RIIR. If I want to compare rust and TS for reader-mode/sourcetree:

benefits of rust
- It can be a library. TS version can also be a library in npm, but rust version can be a library for any language, even npm.
- It can be a single static binary. It may harm portability, but it has benefit for common targets.
benefits of TS
- This code isn't performance sensitive because generating LSIF dumps would be bottleneck, so with TS there is no need for fighting with borrow checker, Rc, and this kind of overhead.
- Some amount of JS/TS is necessary for such a project, by choosing TS, it would become a single language and simpler codebase
- It can make instant feedback, watching and reloading in browser, but rust needs some time for compiling.

If you agree to decouple reader-mode from RA, this kind of discussion can be in the sourcetree repository. We can open #3098 and close this one.

Anyway, If you want a rust analyzer specific code viewer or IDE, no one can prevent you. RM and RA should be two separate projects is just my opinion.

matklad commented 3 years ago

To clarify, I don’t think any coupling between ra and rm is needed in either setup, I’d say such coupling would be harmful. In “rm as a library”, it would be rust-analyzer who depends on rm, and not the other way around.

To me personally, single static binary, and minimal JS/CSS feel defining for the domain I had in mind: simple out-of-the box experience which is just good enough, but also just works.

Naturally, we can go the more general way of “emit LSIF and let the user plug that into whatever”. That’ll work for more use-cases, but will sacrifice the “just works” bit.

HKalbasi commented 3 years ago

That out of the box experience can be achieved using a third tool (closer to RM than RA) that isn't tied to rust. That tool can be a bash script / static binary that do:

Detect language of the codebase
Find or install a lsif generator for that language
Find or install RM
Run RM against the lsif dump

In addition to multiple languages, it can support multiple tools on the other end as well. For example --static for using RM and --source-graph <endpoint> for uploading to sourcegraph.

Aside from such a tool, there are other ways to improve user experience, for example sourcegraph provides some github actions that makes using it even easier. RM can provide a similar action to sourcegraph-upload-action and use existing ones from sourcegraph for language. Relying on this is github-centeric but there can be other tools at the same time. To summarize, tooling can be improved to infinity on demand.

And if I understand correctly, desktop users won't use RM/sourcetree normally because they can fire up an IDE. So bundling RM with RA would just harm majority of users (small harm, but anyway).

Milo123459 commented 3 years ago

Maybe this could work in the browser? With github.dev? Not sure

HKalbasi commented 3 years ago

I don't know about github.dev but github1s.com is based on sourcegraph so it can use lsif.

bjorn3 commented 3 years ago

For github.dev using rust-analyzer compiled to wasm makes more sense as github.dev allows you to edit the code, not just look at it.

eminence commented 3 years ago

How much of rust-analyzer's analysis could be done purely in the browser, and how much requires using cargo to download dependencies?

jyn514 commented 2 years ago

I think rustdoc gaining support for goto def changes motivation here a bit.

FYI this is not a stable feature and I would actually much rather see it in RA than in rustdoc. I've discussed this with @guillaumegomez and we're not planning to make further changes to the rustdoc side before there's an RFC for the feature.

GuillaumeGomez commented 2 years ago

To be noted that even if the RFC is accepted, it's quite limited. RA could provide locations where an item is used for example, which we don't plan to add.

HKalbasi commented 2 years ago

An option for changing rustdoc src links (for example to github repository) can be an alternative RFC. Among other benefits (like viewing git history) they have some code intelligence features (probably more powerful than rustdoc). Gitlab and sourcegraph consume LSIF, so they can use RA, and github has its own. And if there becomes a RA specific thing, it can be integrated with rustdoc with that configuration as well.

bjorn3 commented 2 years ago

An option for changing rustdoc src links (for example to github repository) can be an alternative RFC.

How would you make sure that the same commit is used? How will it handle crates that have been uploaded with changes? What if the original repo or the specific commit gets deleted?

HKalbasi commented 2 years ago

In terms of security? It allows the malicious crate owner to fake src links in docs. Crate owner already can do bad things and there is other ways to see the actual source of crate, but I understand if it is not ok.

In terms of usability, deleted commits and moved repositories can lead to dead links, so crate owners should be able to change them for past versions if they care, so such an option should be a crate-wide setting and/or cli parameters and not for example attributes in source code.

bjorn3 commented 2 years ago

In terms of security? It allows the malicious crate owner to fake src links in docs. Crate owner already can do bad things and there is other ways to see the actual source of crate, but I understand if it is not ok.

When I click on the src button I expect to be presented a rustdoc generated source page even if it doesn't contain the correct source code. I don't expect to end up on an attacker controlled website that may exploit a browser bug. If the original documentation page contains malicious code, it would be auditable on docs.rs and be removable by the people operating docs.rs. An attacker controlled website may only present the malicious code for some people and can't be taken down by docs.rs. Also while docs.rs currently allows documentation to contain arbitrary javascript, this may be restricted in the future. Because of this I think this makes this existing security issue worse.

In terms of usability, deleted commits and moved repositories can lead to dead links, so crate owners should be able to change them for past versions if they care, so such an option should be a crate-wide setting and/or cli parameters and not for example attributes in source code.

On docs.rs it wouldn't be possible to retroactively change this option, so broken links can't be fixed again.

jyn514 commented 2 years ago

If the original documentation page contains malicious code, it would be auditable on docs.rs and be removable by the people operating docs.rs. An attacker controlled website may only present the malicious code for some people and can't be taken down by docs.rs.

If documentation contains links to a malicious site, we can also take down the documentation, it's not limited to the source HTML. Admittedly there's a risk that links will linkrot and start pointing to a malicious site, but that's not specific to source files.

rust-lang / rust-analyzer

rust-analyzer reader-mode #8696