Open matklad opened 3 years ago
There's a transparent intermediate representation ..
Should it store in LSIF ?
I think it’s better not to, it seems simpler to have a thing taylored specifically for rust and code browsing
We probably should also have LSIF support, because my guess is the GitHub's code analysis engine which they use for providing support for go to definition in other languages uses.
I do think we should have LSIF, but I think that's orthogonal feature .
Yeah, I think we're in agreement there. In theory, the LSIF dump would probably use a lot of similar infrastructure.
GitHub's code navigation uses https://github.com/github/semantic
Seems a bit of a weird choice for them to try to implement language analysis for a bunch of languages themselves :confused:
That would at least explain why it can be so inconsistent though, I guess.
This drifting oftopic, but I actually see reasons for them to do their own thing. They have a boatload of code, and they can't build it (both in terms of resources, and in terms of each project requiring a specific environment to build). In this scenario, building a distributed version on IntelliJ architecture makes sense. It would not be precise, but it would be robust.
If you have a boatload of code, but you can build it (because, for example, you are Google, so there's unified Bazel build system), then language-specific Kythe approach makes sense.
Spans in files refer to defs by ID. Defs are untyped, and store their defining spans and a list of refs.
Why have def IDs at all, and not just refer to everything by span?
a list of files
I imagine this might include virtual "macro expansion" files so we can navigate into those :thinking:
Yeah, maaybe spans would be enough. I just thought that, as we are going to store lists of usages somewhere, storing them in defs would be cleaner. It might actually save some memory -- for span, you need to save file id and range, while id is u32. ids give you span interning in a sense, which is nice, because every def is used several times typically.
I guess specific details of intermediate repr are not importatn, as log as:
Has anyone wrote some code here? I feel like I might want to do this on the weekend, but I don't want to spoil the fun if someone is already working on it ) (not: I might totally drop the ball here myself)
this seems very similar to https://github.com/rust-lang/rust/pull/84176 and also related to https://github.com/rust-analyzer/rust-analyzer/issues/6570
@benmkw thanks for that link, that's exciting! I think rustdoc gaining support for goto def changes motivation here a bit. Mainly, it will relieve the pressure to ship "anything" asap, so instead we can concentrate more on exploration and experimentation, unencumbered by maintenance and backward compatibility concerns. I think the end goal we want is two tools (which might be surfaced via the same top-level command), one of which renders the docs (so it rearranges the items to make docs more readable), while the other renders source code. On the doc front, rustdoc is perferct. On the source code rendereng, I think we need a lot more exploration before we are ready to ship something stable, and I think exploration on top of rust-analyzer would be the easiest.
while the other renders source code
to me this sounds kind of like vscode/ an editor
concentrate more on exploration
as I tried to describe in #6570 I think it would be nice to have easier access to rust source code for e.g. creating graphs of the flow of execution (callgraph) or maybe custom lints (clippy plugins are deprecated https://blog.rust-lang.org/inside-rust/2019/11/04/Clippy-removes-plugin-interface.html , I'm not sure what would be the current best practice to have per project lints, thats something Jai wants to push where people enforce certain things in their project by looking at the AST) ... maybe such an API could generalise to this extend. I'm thinking of very project specific rules which are not possible to be upstreamed. Something like clang tidy/ libtooling https://www.youtube.com/watch?v=rbbCgVQrjWs
Ok, I didn't went farther than outlining the general design in my head, here are some points:
struct RearedView
and to_html
should actually live in a separate repository (rm
crate), which does not depend on rust-analyzer. This should allow folks to iterate on the front-end parts without suffering compile times. rm
crate shouldn't be rust centric -- I imagine someone would like to re-use the front-end bits to build similar tools for other languages.rm
crate shouldn't use LSIF from the start (to improve iteration speed), but using LSIF as a source of data should be kept in mind. rm
crateThe niche we are targeting is "casual source code browsing for medium-sized projects". If you want to really deeply understand the source code, you need to open it in an IDE. If you have a monorepo with loads of code, you probably want to have more integrated tooling (source browser + vcs + ide + surfacing coverage results). If you are on your couch with a tablet, got bored with Tolstoy and want to read some burntSushi instead, that's where reader mode will be useful.
Regarging the more general point of providing the base API which makes building such tooling easy outside of rust-analyzer, that's a laudable goal, but I don't see it coming in the next few years. We are still figuring this out. Like, a week ago I realised that we probably can do significantly better then our current approach: https://github.com/rust-analyzer/rust-analyzer/issues/8713.
Our meta-strategy here is thus:
shouldn't use LSIF from the start (to improve iteration speed), but using LSIF as a source of data should be kept in mind
I briefly looked at this https://github.com/sourcegraph/sourcegraph and it seems pretty close to what you have in mind but I see you already interacted with them so probably no news here
Yeah, SG is similar, but different. They target "monorepo with loads of code", hence:
For this tool, I imagine a singe static binary which you can invoke from CI pipilene to publish a set of static documents.
I made Sourcetree which is basically the equivalent of rm
trait mentioned here. It can show source code of itself. It is buggy and not perfect (just two days old), but it works and has hover, go to definition and finding references.
Theoretically, it should work out of the box with any LSIF generator, but it is only tested against lsif-tsc
. So if RA can generate a LSIF dump, problem of this issue would be solved.
@matklad Are you still against starting with LSIF? LSIF actually helped me in prototyping and I now have a working demo without writing a single line of LSP related things. And Sourcegraph can take benefit from a rust LSIF generator in addition to Sourcetree/RA-reader-mode. And LSIF is pretty close to LSP so it should be easy to emit LSIF in RA. And it is easily extendable so LSIF won't limit us, I'm ok to support non-standard LSIF features from RA, though we should prioritize supported features over non-standard ones.
I'd love the end-experience to be rust-analyzer reader-mode /path/to/project/Cargo.toml
, that is, I think there's a value in this working out of the box in rust-analyzer. If this is a single binary, I don't think we should have IPC format (LSIF) involved here.
Ultimately, we want to binaries: rust-analyzer that can produce html, and reader-mode-standalone
, which can consume LSIF and produce html:
We can glue rust-analyzer
and reader-mode-standalone
via LSIF, but that adds an extra, unnecessary intermediate language.
We should have LSIF output from rust-analyzer some day, but that's for making it interoperable, not for rendering source code in a nice way.
That said, I won't be against going through ra -> lsif -> rm
route, I just think that architectualy it's suboptimal.
To clarify -- I imagine rm
to be written in Rust and usable as a library -- we should stay within a single language ecosystem.
I see reader-mode
project out of scope of rust analyzer. It has some signs of it:
RA doesn't have it's own code editor, but expose LSP so existing code editors like vscode and vim can use it. Like code editors and LSP, there is a class of code consumers which I call code viewers, and rm crate is a member of this class, (other examples are Sourcegraph, Github, rustdoc, ...) and as RA doesn't need it's own code editor (thought it is possible and it may have some benefits), it also doesn't need it's own code viewer.
If we decouple reader mode project from rust analyzer, staying within a single language ecosystem would not be necessary. Vscode is in TS, vim is in C, Sourcegraph is in go, and there is no problem. That being said, it maybe make sense to RIIR. If I want to compare rust and TS for reader-mode/sourcetree:
Rc
, and this kind of overhead.If you agree to decouple reader-mode from RA, this kind of discussion can be in the sourcetree repository. We can open #3098 and close this one.
Anyway, If you want a rust analyzer specific code viewer or IDE, no one can prevent you. RM and RA should be two separate projects is just my opinion.
To clarify, I don’t think any coupling between ra and rm is needed in either setup, I’d say such coupling would be harmful. In “rm as a library”, it would be rust-analyzer who depends on rm, and not the other way around.
To me personally, single static binary, and minimal JS/CSS feel defining for the domain I had in mind: simple out-of-the box experience which is just good enough, but also just works.
Naturally, we can go the more general way of “emit LSIF and let the user plug that into whatever”. That’ll work for more use-cases, but will sacrifice the “just works” bit.
That out of the box experience can be achieved using a third tool (closer to RM than RA) that isn't tied to rust. That tool can be a bash script / static binary that do:
In addition to multiple languages, it can support multiple tools on the other end as well. For example --static
for using RM and --source-graph <endpoint>
for uploading to sourcegraph.
Aside from such a tool, there are other ways to improve user experience, for example sourcegraph provides some github actions that makes using it even easier. RM can provide a similar action to sourcegraph-upload-action and use existing ones from sourcegraph for language. Relying on this is github-centeric but there can be other tools at the same time. To summarize, tooling can be improved to infinity on demand.
And if I understand correctly, desktop users won't use RM/sourcetree normally because they can fire up an IDE. So bundling RM with RA would just harm majority of users (small harm, but anyway).
Maybe this could work in the browser? With github.dev? Not sure
I don't know about github.dev but github1s.com is based on sourcegraph so it can use lsif.
For github.dev using rust-analyzer compiled to wasm makes more sense as github.dev allows you to edit the code, not just look at it.
How much of rust-analyzer's analysis could be done purely in the browser, and how much requires using cargo to download dependencies?
I think rustdoc gaining support for goto def changes motivation here a bit.
FYI this is not a stable feature and I would actually much rather see it in RA than in rustdoc. I've discussed this with @guillaumegomez and we're not planning to make further changes to the rustdoc side before there's an RFC for the feature.
To be noted that even if the RFC is accepted, it's quite limited. RA could provide locations where an item is used for example, which we don't plan to add.
An option for changing rustdoc src links (for example to github repository) can be an alternative RFC. Among other benefits (like viewing git history) they have some code intelligence features (probably more powerful than rustdoc). Gitlab and sourcegraph consume LSIF, so they can use RA, and github has its own. And if there becomes a RA specific thing, it can be integrated with rustdoc with that configuration as well.
An option for changing rustdoc src links (for example to github repository) can be an alternative RFC.
How would you make sure that the same commit is used? How will it handle crates that have been uploaded with changes? What if the original repo or the specific commit gets deleted?
In terms of security? It allows the malicious crate owner to fake src links in docs. Crate owner already can do bad things and there is other ways to see the actual source of crate, but I understand if it is not ok.
In terms of usability, deleted commits and moved repositories can lead to dead links, so crate owners should be able to change them for past versions if they care, so such an option should be a crate-wide setting and/or cli parameters and not for example attributes in source code.
In terms of security? It allows the malicious crate owner to fake src links in docs. Crate owner already can do bad things and there is other ways to see the actual source of crate, but I understand if it is not ok.
When I click on the src button I expect to be presented a rustdoc generated source page even if it doesn't contain the correct source code. I don't expect to end up on an attacker controlled website that may exploit a browser bug. If the original documentation page contains malicious code, it would be auditable on docs.rs and be removable by the people operating docs.rs. An attacker controlled website may only present the malicious code for some people and can't be taken down by docs.rs. Also while docs.rs currently allows documentation to contain arbitrary javascript, this may be restricted in the future. Because of this I think this makes this existing security issue worse.
In terms of usability, deleted commits and moved repositories can lead to dead links, so crate owners should be able to change them for past versions if they care, so such an option should be a crate-wide setting and/or cli parameters and not for example attributes in source code.
On docs.rs it wouldn't be possible to retroactively change this option, so broken links can't be fixed again.
If the original documentation page contains malicious code, it would be auditable on docs.rs and be removable by the people operating docs.rs. An attacker controlled website may only present the malicious code for some people and can't be taken down by docs.rs.
If documentation contains links to a malicious site, we can also take down the documentation, it's not limited to the source HTML. Admittedly there's a risk that links will linkrot and start pointing to a malicious site, but that's not specific to source files.
TL;DR: https://github.com/rust-dev-tools/cargo-src, but on top of rust-analyzer
Add a command,
rust-analyzer reader-mode path/to/Cargo.toml
, which produces a directory with a bunch of html files, which contain source for the project, a-la rustdoc output, but also have goto definition, hover, etc.Interface Constraints:
python3 -m http.server
and such on usres' machine)Implementation Constraints:
ide-reader-mode
and is exposed viaide
crate.struct ReaderView
. This view is independent of the hir or ide types, and is just a large chunk of data. The data includes two things, a list of files and a list of defs. Each file is a text plus a number of auxilary maps that describe ranges in the file. There should be a map for syntax hightlighting, a map for references and definitons, maybe a map for folding ranges (fold method bodies would be useful feature to have). Spans in files refer to defs by ID. Defs are untyped, and store their defining spans and a list of refs. Not sure if we need a separate ref list or just spans would be enough.fn to_html(&self) -> HashMap<PathBuf, String>
method, which renders it as a directory with htmls and csses. This should be completely independent of the rust-analyzer.