rust-lang / compiler-team

A home for compiler team planning documents, meeting minutes, and other such things.
https://rust-lang.github.io/compiler-team/
Apache License 2.0
385 stars 67 forks source link

Add a new `--orchestrator-id` flag to rustc #635

Open LukeMathWalker opened 1 year ago

LukeMathWalker commented 1 year ago

Proposal (Updated on 2024-09-06)

Context

The JSON documentation for a crate often refers to items (e.g. functions, traits, types, etc.) defined in one or more its dependencies, either direct or transitive.
This can happen in a variety of scenarios:

Problem

The JSON documentation for a crate doesn't provide many details when it comes to foreign items. If you want to dig deeper, you must generate the JSON documentation for the crate where they are defined and then combine the information from those two JSON documents to get a complete picture[^intentional].
In other words, you need to walk the dependency graph to perform analyses where foreign items are in scope.

To walk the graph, you need to go from "I have this foreign item" to "here's the JSON documentation for crate X, where that item was defined".
This is the difficult part: the JSON documentation for a crate contains very little information about third party crates. In particular, it only reports their names and the root URLs to their docs. In the simplest scenario, that's not a problem: the crate name is enough to generate the JSON documentation if there is only one instance of that crate in the dependency graph.
It becomes an issue when multiple versions of the same crate appear together in the same JSON document. There will be multiple entries in external_crates with the same name, one for each version of that duplicated dependency.
There is no bullet-proof mechanism to disambuigate between those entries and map them back to specific package entries in the dependency tree of the "root" crate[^zulip]. That's the problem this MCP tries to solve.

The proposed solution

For rustdoc

rustdoc will provide an additional field in ExternalCrate, named orchestrator_id.
orchestrator_id will be of type Option<String>. orchestrator_id will be (optionally) provided by the tool orchestrating the overall build. This will be cargo in the most common case, but it may as well be an alternative orchestrator (bazel, buck2, etc.). If an orchestrator decides to provide an id for each build unit, it must guarantee that those ids uniquely identify that unit within the dependency graph.

For cargo

cargo will populate the orchestrator_id field.
The id will be a valid string for the --package flag in cargo (e.g. cargo rustdoc -p <build-id> should always work). This will allow tool builders to walk the graph without ambiguity when working with JSON documentation.

Implementation strategy

How would rustdoc receive this orchestrator_id, in practice?
rustdoc looks at the rmeta file to gather information about third party crates. We want to include orchestrator_id in that file to get it from cargo to rustdoc. This requires rustc to cooperate.

Adding a new (unstable) --orchestrator-id option to rustc seems to be the simplest way forward. This could then be stabilised as is or, if preferred, moved to a different "channel" (e.g. as part of -Cmetadata, its own -C flag, etc.).

Mentors or Reviewers

A mentor would definitely be appreciated, I never touched any of the code involved here.

Process

The main points of the [Major Change Process][MCP] are as follows:

Comments

This issue is not meant to be used for technical discussion. There is a Zulip stream for that. Use this issue to leave procedural comments, such as volunteering to review, indicating that you second the proposal (or third, etc), or raising a concern that you would like to be addressed.

[^intentional]: This is an intentional design choice, that's been discussed at length in the rustdoc team and that's been endorsed by all JSON-based tool builders that interface with the team. [^zulip]: See this Zulip thread for more details.

Notes

This proposal supercedes https://github.com/rust-lang/compiler-team/issues/622, incorporating the feedback and ideas that were surfaced in the associated Zulip stream.

Previous proposal, before first round of feedback ## The problem `rustdoc`'s JSON output for a crate must often refer to items (e.g. functions, traits, types, etc.) defined in one or more its dependencies. Very little information is captured about those dependencies: [their name and the root URL to their docs](https://docs.rs/rustdoc-types/0.21.0/rustdoc_types/struct.ExternalCrate.html). This causes issues when multiple versions of the same crate appear as direct dependencies (e.g. via renames) of the crate we are documenting—e.g. [it becomes impossible to find out where items are coming from](https://rust-lang.zulipchat.com/#narrow/stream/266220-rustdoc/topic/Identifying.20external.20crates.20in.20Rustdoc.20JSON/near/352551996). ## The proposed solution Add a `--build-id` flag to `rustc`. The value would be provided by `cargo` and then captured as part of the .rlib metadata, which would in turn make it available to `rustdoc`, as well as other parts of the compiler that might benefit from it (e.g. diagnostics, such as [version mismatches](https://github.com/rust-lang/rust/issues/110055) which currently limit themselves to `perhaps two different version of CRATE_NAME are being used?`). The provided `build-id` must: - Uniquely identify the crate it points to within the current workspace; - Be a valid string for the `--package` flag in `cargo` (e.g. `cargo rustdoc -p ` should always work). This allows toolmakers to fetch more information on-demand when working with `rustdoc`'s JSON output; - Be as human-friendly as possible (e.g. use just the crate name if there is no ambiguity) in order to be used in diagnostics.
rustbot commented 1 year ago

This issue is not meant to be used for technical discussion. There is a Zulip stream for that. Use this issue to leave procedural comments, such as volunteering to review, indicating that you second the proposal (or third, etc), or raising a concern that you would like to be addressed.

cc @rust-lang/compiler @rust-lang/compiler-contributors

estebank commented 11 months ago

Given that this is a user visible flag (otherwise I'd just second):

@rfcbot fcp merge

rfcbot commented 11 months ago

Team member @estebank has proposed to merge this. The next step is review by the rest of the tagged team members:

Concerns:

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

petrochenkov commented 11 months ago

@rfcbot concern other-existing-options

The zulip thread lists some alternatives like -Cmetadata, the proposal needs to describe why the existing options are not enough and why a new separate option is necessary for this.

One alternative, for example, is making -Cmetadata more extensible so it could include e.g. json with different pieces of auxiliary information like this.

pnkfelix commented 11 months ago

(I'm checking my box under the assumption that the implementation of this MCP will explore leveraging an extension of --metadata rather than adding a whole new --build-id flag.)

@rfcbot reviewed

wesleywiser commented 11 months ago

@rfcbot concern option-name

When reading the title, I assumed this was an flagthat would allow the user to choose what --build-id value is passed to the linker. I would prefer we choose a different name for this flag if we can't reuse or extend -Cmetadata as @petrochenkov suggests.

compiler-errors commented 8 months ago

@apiraino:

I don't think it makes sense to block this FCP on finding a mentor, unless you think the change to the compiler is so large that its acceptance should be conditional on finding someone to implement it (which seems out of the ordinary) -- IMO, the FCP is just for approval of the technical change. Also, the syntax is @rfcbot to file a concern on the FCP.

apiraino commented 8 months ago

@compiler-errors fair point, sorry for the noise. Just wanted to be sure that that point was't forgotten.

LukeMathWalker commented 4 weeks ago

I've updated the proposal to reflect the feedback above (and in Zulip). In particular:

petrochenkov commented 3 weeks ago

I still don't understand what is the point of this, rustc is already able to identify crates uniquely. Do you plan to somehow make these unique identifiers human-readable using higher-level tools?

I'll remove my concern to unblock the experimentation, but the scenario that I'd prefer to see in the future is

@rfcbot resolve other-existing-options

wesleywiser commented 3 weeks ago

@rfcbot resolve option-name

rfcbot commented 3 weeks ago

:bell: This is now entering its final comment period, as per the review above. :bell:

psst @estebank, I wasn't able to add the final-comment-period label, please do so.

rfcbot commented 1 week ago

The final comment period, with a disposition to merge, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

This will be merged soon.

psst @estebank, I wasn't able to add the finished-final-comment-period label, please do so.