rustdoc: Stop cloning the resolver

jyn514 commented 3 years ago

Rustdoc currently creates a copy of the resolver to use for intra-doc links: https://github.com/rust-lang/rust/blob/d474075a8f28ae9a410e95d849d009006db4b176/src/librustdoc/core.rs#L350-L354 This is a Terrible, Horrible, No Good, Very Bad Idea. In particular, it causes rustdoc's copy of the resolver and the TyCtxt to disagree about what crates exist:

It's also distorting rustc_resolve, since not all of the outputs make sense to clone in the first place: https://github.com/rust-lang/rust/pull/65625#discussion_r336780545

We should refactor rustdoc somehow to allow getting rid of Resolver::clone_outputs. @petrochenkov suggests moving anything that needs to touch the resolver before HIR lowering: https://github.com/rust-lang/rust/issues/68427#issuecomment-578493425.

@petrochenkov what do you think about @eddyb's comment in https://github.com/rust-lang/rust/pull/65625#discussion_r336781320 ?

Why would cloning be needed? Is this that support for resolving things after rustc_resolve finishes? In that case I'm not sure why we need to clone the outputs - is it to be able to keep the original resolver around?

Would it be possible for lower_to_hir to stop stealing the resolver? https://github.com/rust-lang/rust/blob/d474075a8f28ae9a410e95d849d009006db4b176/compiler/rustc_interface/src/queries.rs#L227

Then rustdoc wouldn't need to clone it in the first place, it could just call queries.expansion().peek().1 whenever it needs access to the resolver.

See https://github.com/rust-lang/rust/issues/68427 for previous discussion.

Implementation History

[x] Initial "early crate loader": https://github.com/rust-lang/rust/pull/83738
[x] Don't require clean::Items before resolving intra-doc links: https://github.com/rust-lang/rust/pull/83833
[x] Separate type-relative resolution from path resolution (at least somewhat): https://github.com/rust-lang/rust/pull/83849
[x] Move early loader into collect_intra_doc_links: https://github.com/rust-lang/rust/pull/84101
[ ] Move all path resolution from late to early module and add new field on DocContext for partially resolved links
- [ ] Change preprocess_link to also take into account the hacks for Self:: and crate: https://github.com/rust-lang/rust/blob/ef524710669a003cf2b24361dc6b04b566e560fa/src/librustdoc/passes/collect_intra_doc_links.rs#L880, https://github.com/rust-lang/rust/blob/ef524710669a003cf2b24361dc6b04b566e560fa/src/librustdoc/passes/collect_intra_doc_links.rs#L1124-L1153

petrochenkov commented 3 years ago

I'm not sure what do you mean by freezing the resolver outputs, but I meant the regular rustc's mode of operation

resolver is unique and is not kept under Rc
therefor to_resolver_outputs always results in calling Resolver::into_outputs rather than Resolver::clone_outputs
Resolver::into_outputs consumes the resolver with ResolverOutputs being the only thing left from it, i.e. it's a "frozen" summary of the resolver's work at a certain point

So we need to achieve "freezing" the resolver outputs in rustdoc rather than to avoid it.

Regarding eddyb's comment. Yes, rustdoc resolves things after creating tcx (which needs ResolverOutputs to be created), so we have to keep the resolver. This needs to be avoided by moving all name resolution activity in rustdoc to an earlier point.

Would it be possible for lower_to_hir to stop stealing the resolver? Then rustdoc wouldn't need to clone it in the first place, it could just call queries.expansion().peek().1 whenever it needs access to the resolver.

Not sure about the details of rustc_interface's work, but yes, resolver should be unique and should not be cloned.

jyn514 commented 3 years ago

So we need to achieve "freezing" the resolver outputs in rustdoc rather than to avoid it.

Ok, I see, I updated the title.

Yes, rustdoc resolves things after creating tcx (which needs ResolverOutputs to be created), so we have to keep the resolver. This needs to be avoided by moving all name resolution activity in rustdoc to an earlier point.

I'm not quite sure I follow - is there a reason the tcx has to freeze the resolver into ResolverOutputs? Or could the tcx store the resolver itself? If it stored the resolver rustdoc could stop cloning it without major changes.

To be clear, I don't yet know how much work it would be for rustdoc to move all resolution before creating a TyCtxt, but the model where TyCtxt stores a resolver makes more sense in my head; or at least, it's more flexible.

Possible scheme for rustdoc to move resolution before TyCtxt

1. Remove [`clean::Attributes::links`](https://doc.rust-lang.org/nightly/nightly-rustc/rustdoc/clean/types/struct.Attributes.html#structfield.links) 2. Define a new type: ```rust enum EarlyIntraDocLink { Resolved(clean::ItemLink), AssocItem { base_ty: Res, item_name: Symbol, ns: Namespace, /* possibly more fields are necessary */ }, Variant { parent_def: Res, variant: Symbol }, // this might be able to be resolved early since the resolver implements DefIdTree, see https://github.com/rust-lang/rust/blob/d1065e6cefa41fe6c55c9819552cdd61529096fc/src/librustdoc/passes/collect_intra_doc_links.rs#L2121 for the current logic // This can't be emitted early because it's a HIR lint, it needs a TyCtxt available. Error { kind: ResolutionFailure, diag: DiagnosticInfo, path_str: String, disambiguator: Disambiguator }, } ``` 3. Add a new field to `DocContext`: `intra_doc_links: FxHashMap>` 4. Expand `IntraDocCrateLoader` (from #83738) to fill out that field by handling more things that are currently in `collect_intra_doc_links`. I don't think it will be able to handle anything more than `resolve_path`: https://github.com/rust-lang/rust/blob/d1065e6cefa41fe6c55c9819552cdd61529096fc/src/librustdoc/passes/collect_intra_doc_links.rs#L469 5. Change `collect_intra_doc_links` to remove all uses of the resolver. For context, these are the current uses of the resolver in collect_intra_doc_links: - https://github.com/rust-lang/rust/blob/d1065e6cefa41fe6c55c9819552cdd61529096fc/src/librustdoc/passes/collect_intra_doc_links.rs#L762-L772 (shouldn't need access to tcx) - https://github.com/rust-lang/rust/blob/d1065e6cefa41fe6c55c9819552cdd61529096fc/src/librustdoc/passes/collect_intra_doc_links.rs#L425-L431 (doesn't need access to tcx) - https://github.com/rust-lang/rust/blob/d1065e6cefa41fe6c55c9819552cdd61529096fc/src/librustdoc/passes/collect_intra_doc_links.rs#L319-L323 (only needs tcx after using resolver) - https://github.com/rust-lang/rust/blob/d1065e6cefa41fe6c55c9819552cdd61529096fc/src/librustdoc/passes/collect_intra_doc_links.rs#L469-L472 (doesn't need tcx)

petrochenkov commented 3 years ago

Or could the tcx store the resolver itself?

It may be possible. Resolver contains a lot of early compilation stuff like NodeIds that is out of place in tcx though. Also it may be preferable to finish all the resolution activities as early as possible (at least those that may result in loading new crates). I think we may end up with some scheme working similarly to this if incremental compilation and queries advance so much that we'll start skipping resolution of paths if they are not actually used, but that would probably be a very different resolver.

For now, I'd prefer to normalize to what rustc does and have a single scheme with resolver working early instead. Paths in doc comments are not fundamentally different from paths in e.g. expressions, and if we visit and resolve paths early in general, then paths in doc comments can be visited and resolved early as well. (If we migrate to something like "resolver in tcx" in the future, then rustc and rustdoc will also migrate to it together.)

jyn514 commented 3 years ago

Remove clean::Attributes::links

Done in https://github.com/rust-lang/rust/pull/83833 (although I need to clean it up before it can be merged).

jyn514 commented 3 years ago

Expand IntraDocCrateLoader (from #83738) to fill out that field by handling more things that are currently in collect_intra_doc_links. I don't think it will be able to handle anything more than resolve_path:

This isn't currently possible because resolving Self goes through the TyCtxt to get the name instead of the resolver: https://github.com/rust-lang/rust/blob/2616ab1c57e2d69f989307389b27ee996ed82575/src/librustdoc/passes/collect_intra_doc_links.rs#L868-L885

Here is a fix (it uses a tcx currently, but the same API is available on CStore): https://github.com/rust-lang/rust/commit/5d5be028b318ed7e542bf238ebf2fc009a830a5a

Original ramblings before I realized this was easy

There are two possible ways forward: 1. Fix the FIXME at the top and use the DefId directly instead of stringifying. That requires finding a way to call `.def_kind` with only a Resolver - looks like I can do that with [`CStore::def_kind`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_metadata/creader/struct.CStore.html#method.def_kind), and I can get a CStore from [`Resolver::cstore`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/struct.Resolver.html#method.cstore). 2. Find a way to get the stringified name of `Self` with the resolver directly (just an equivalent of `opt_item_name` should be enough). (This is https://github.com/rust-lang/rust/commit/5d5be028b318ed7e542bf238ebf2fc009a830a5a)

jyn514 commented 3 years ago

Oh wait no I'm dumb, https://github.com/rust-lang/rust/commit/5d5be028b318ed7e542bf238ebf2fc009a830a5a only helps if the item isn't an impl block. @petrochenkov Is there a way to resolve the Self type of an impl block? I tried resolver.resolve_str_path_error(DUMMY_SP, "Self", TypeNS, self_id), but that panics because it's not a module.

If not it's not a giant deal, I'll just have to figure out how to fix the FIXME on line 868.

jyn514 commented 3 years ago

Ugh, I have to figure out how to replicate this bit somehow: https://github.com/rust-lang/rust/blob/2616ab1c57e2d69f989307389b27ee996ed82575/src/librustdoc/passes/collect_intra_doc_links.rs#L1824-L1833 Maybe the early pass could do that? I think the late pass for type-relative paths will need to have a partial resolution anyway, so it shouldn't be too much more work.

petrochenkov commented 3 years ago

Is there a way to resolve the Self type of an impl block?

rustc keeps the "current impl" (or other item providing Self) in the late resolution visitor walking AST (see fn with_self_rib_ns). The analogous visitor in rustdoc is IntraLinkCrateLoader right now.

I'm not sure how rustdoc resolves Self now, but it cannot do it through Resolver because resolver doesn't provide any methods able to resolve it. So it's probably fine for rustdoc to just continue doing what it's doing now.

jyn514 commented 3 years ago

rustc keeps the "current impl" (or other item providing Self) in the late resolution visitor walking AST (see fn with_self_rib_ns).

Thanks, this is helpful!

I'm not sure how rustdoc resolves Self now, but it cannot do it through Resolver because resolver doesn't provide any methods able to resolve it. So it's probably fine for rustdoc to just continue doing what it's doing now.

Hmm, ok. It can't do precisely the same thing because it uses the tyctxt to look up the type - maybe it could look up the DefKind while visiting the parent item? I have a few different ideas that could work (but I will probably end up just going with "use the DefId directly instead of stringifying" since I think I'll need it anyway).

jyn514 commented 2 years ago

Would it be possible for lower_to_hir to stop stealing the resolver?

Note that in practice this requires resolution to be incremental, since otherwise the tyctxt will still be using the frozen ResolverOutputs which has the same issues. See also https://github.com/rust-lang/compiler-team/issues/437.

petrochenkov commented 2 years ago

https://github.com/rust-lang/rust/pull/88679 now collects traits in scope for doc links, which rustdoc is later using in its attempts to emulate type-relative path resolution.

The only remaining thing now is to use the early doc link resolution (src/librustdoc/passes/collect_intra_doc_links/early.rs) to build a map (HashMap<(/*path*/ String, Namespace, /*parent module*/DefId), Res> or similar) that will be used by all later rustdoc passes to retrieve resolutions instead of calling into resolver using resolve_str_path_error.

petrochenkov commented 2 years ago

The branch in progress fixing this issue is https://github.com/petrochenkov/rust/tree/doclink2.

The functional part is mostly done, one major missing part is emulating the Self replacement in paths that is done by rustdoc, some other remaining special cases like this may also remain.

I'll start benchmarking after the functional changes are ready.

petrochenkov commented 2 years ago

Self is a difficult case due to the way in which rustdoc deals with it currently. Another branch in progress (https://github.com/petrochenkov/rust/tree/doclinkself) attempts to avoid the textual replacement of Self:: that is currently done by rustdoc with resolving Self to a type at type-checking time, without involving Resolver at all. So it's a pre-requisite for the doclink2 branch.

rust-lang / rust

rustdoc: Stop cloning the resolver #83761

Implementation History