Open obi1kenobi opened 1 year ago
@rustbot label +T-rustdoc +A-rustdoc-json
Having done an in-depth exploration of alternatives, requiring rustdoc consumers to do name resolution is extremely not practical. I firmly believe that making glob reexports include a list of shadowed names for each namespace (specifically, "these names look like they should be imported but they aren't usable" which includes imported-but-ambiguous names) is the right design option here.
Properly resolving shadowing is a job for rustc. Otherwise one has to reimplement a substantial chunk of rustc to get it right:
Here's an example that requires all of the above: equivalent playground example
mod first {
mod a {
pub(crate) struct Foo(
// Change `pub(super) bool` to `pub(crate) bool`
// to cause a breaking change.
//
// Just swap which line is commented out and which is included:
pub(super) bool // this is fine
// pub(crate) bool // this is a breaking change
);
}
mod b {
pub(crate) struct Foo{}
}
pub(crate) use a::*;
pub(crate) use b::*;
}
mod second {
pub struct Foo();
}
use first::*; // *** NOT A `pub use` ***
pub use second::*;
This crate exports Foo
in the values namespace, as the implicit constructor function of second::Foo
— and that's it, all other names are unusable due to ambiguity.
If the field at first::a::Foo
becomes pub(crate)
, then:
first::a::Foo
's implicit constructor becomes pub(crate)
visible toouse first::*
imports itFoo
" ambiguous for this crate, effectively making the "function Foo
" name not usable in this module or as a re-export.Even if rustdoc included info about private imports (necessary to determine shadowing), the logic is obscenely difficult to get right. cargo-semver-checks
has made me very comfortable with the rustdoc format; even so, I've spent two weeks trying and failing to get a working implementation that nails all the edge cases here.
In my proposed solution, the rustdoc entry for the pub use second::*;
item would include a list field that effectively says "type name Foo
is not usable here" when the implicit constructor is imported and usable. When the implicit constructor is ambiguous and unusable, the field would list both "type name Foo
" and "value name Foo
" as unusable. I'm agnostic to the specific naming or structure of these records, so I didn't propose any so we don't accidentally start bikeshedding.
IIUC there are actually two separate problems:
Ctor
).So a fix in rustdoc-json actually requires not just a list of the private imports ("unusable names"), but also the namespace they're in — and for that to be useful, you also need to know the namespace of the public imports.
I have several thoughts:
kind
for private items), but it should make at least the public side easier.Thanks for helping think through the options here, I appreciate it!
re: 2., I'm worried it could be difficult to do — though that makes it all the more valuable if it can be done! The mapping isn't as straightforward as "struct => type ns". For example: unit structs' names go both into the type and the value ns, and tuple structs always go into the type ns but if their Ctor is visible they also go into the value ns. And whether the constructor is visible requires knowing about private fields, etc. If there's a reasonable way to help consumers not have to implement this logic, I'm all for it!
re: 3, a key concern is the size of the rustdoc JSON file. There are crates that when scanned by cargo-semver-checks
produce ~220MB of JSON by themselves, without dependencies. Glob exports and unusable names are a way to keep file size down, at the expense of more complex logic for tracking what's exported.
In my proposal, unusable names wouldn't contain names of private items unless they shadow (or make ambiguous) a pub glob's imports. That means each unusable name appears at most once per local definition (either import or new item), and at most once per module where a glob import that causes ambiguity. This is O(n)
total and I believe it is the minimum possible.
Perhaps I've misunderstood the details of listing unambiguous exports is, so please excuse me if I'm off base here. As I understand it, I think listing unambiguous exports is O(n^2)
. Consider a program like:
pub mod a {
pub mod b {
pub mod c {
pub struct Foo;
pub enum Bar {
First,
Second,
}
pub use Bar::*;
}
pub use c::*;
}
pub use b::*;
}
pub use a::*;
and see how many times Foo
or First
would appear here.
Also consider cyclic imports that create infinite importable paths, like here. To avoid infinite loops in such cases, the unambiguous path listing could use a rule like "not listing paths that visit the same module more than once" (as cargo-semver-checks
does), but then it'd be strange that Rust is able to successfully resolve paths that the unambiguous path listing doesn't include.
While my two examples may seem contrived, both cyclic imports and fully public multi-layered glob chains like that are used in popular Rust crates today.
re: 3, a key concern is the size of the rustdoc JSON file. There are crates that when scanned by cargo-semver-checks produce ~220MB of JSON by themselves, without dependencies. Glob exports and unusable names are a way to keep file size down, at the expense of more complex logic for tracking what's exported.
hmm, this is a very good point. I want to look into exactly why it's so large at some point (I suspect a lot of it will go away when we get rid of paths
), but it's true that getting rid of globs means the output is fundamentally larger. I think if we only refer to items by their ID, without inlining, it might not be much larger in practice? I don't have data to back that up, though.
For example: unit structs' names go both into the type and the value ns, and tuple structs always go into the type ns but if their Ctor is visible they also go into the value ns
Yes, agreed this is a tricky case. I'm not sure what our output looks like for that currently; do we only emit the struct once no matter how many namespaces it's in? If so we probably do need at least to emit directly in the JSON whether the Ctor is private for the helper function to work. That seems doable though, and in the spirit of the existing data we expose, I don't have any reservations about adding it.
I don't think there's any inlining left anymore. Foreign traits implemented by local types are no longer inlined, and crate items are only emitted once even if they are exported by multiple namespaces, leaving the name resolution portion to the user. This is obviously a bit tricky on the user side, but IMHO is the right choice given the constraints.
I don't think paths
contributes significantly to file size, to be honest. If optimizing for size, I'd start by leaning more heavily into omitting default representation like None
and []
and {}
and assuming that something like serde will fill in the blanks. The 220MB JSON output is just because that crate (aws-sdk-ec2
) is gigantic — 240k autogenerated items, ~17x more than syn
with the full
feature.
I think exporting unusable names would make emitting whether the Ctor is public unnecessary (though I'm not opposed to it!). Ctor visibility resolution is only necessary to resolve possible glob import shadowing or ambiguity, because there the flavors of restricted visibility matter as in my example two posts ago. For merely public / non-public levels, it's enough to look at the type's field visibilities which doesn't actually require private items: if fields were stripped, the Ctor is private too.
I think inlining is the wrong word - I was imagining a single JSON object for all imports in the same module, with kind: use
and a list of the IDs that were imported. That shouldn't be too much larger than the glob imports I think.
I agree we should either do my ctor visibility idea or your unusable imports idea but not both.
Oh I see, interesting idea. I'll need to think about it a bit more.
On Sat, May 13, 2023, 2:40 PM jyn @.***> wrote:
I think inlining is the wrong word - I was imagining a single JSON object for all imports in the same module, with kind: use and a list of the IDs that were imported. That shouldn't be too much larger than the glob imports I think.
I agree we should either do my ctor visibility idea or your unusable imports idea but not both.
— Reply to this email directly, view it on GitHub https://github.com/rust-lang/rust/issues/111338#issuecomment-1546727986, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAR5MSXC2AZ3WAI2FZ3EEN3XF7IRPANCNFSM6AAAAAAXZNSXEY . You are receiving this because you authored the thread.Message ID: @.***>
A few points worth chatting about that I came up with:
O(n^2)
in the case of nested pub globs, which may still be a concern.Ctor
isn't the only edge case — unit structs also put their singleton value in the values ns, and that can be shadowed / made ambiguous separately from the type so we still need to know if it's visible or not.
I tried this code:
I expected to see this happen: when either line
(1)
or(2)
is added, rustdoc JSON files should offer some way to detect that the nameFoo
is no longer part of the crate's public API.Instead, this happened:
(1)
or(2)
is included.--document-private-items
and--document-hidden-items
, rustdoc generates identical JSON whether or not(2)
is included.Just like using a private type in
(1)
, the same problem can be caused by using a#[doc(hidden)]
public type, which will also not be emitted in rustdoc JSON unless--document-hidden-items
is set.This means that shadowing glob re-exported items is in general currently impossible to reliably detect via rustdoc JSON, even with
--document-private-items
and--document-hidden-items
. As a result,cargo-semver-checks
cannot detect breaking changes caused by such shadowing, which have happened in real life: around a year ago, the opencl3 crate seems to have suffered a regression when a large number of its items were accidentally shadowed.Relatedly, in #111336 I'm arguing that such shadowing should trigger a lint since it's essentially never what one wants: anything it can productively accomplish can be better accomplished in another way.
I plan to publish a blog post with more details about this in a few hours, it will be available at the following link: https://predr.ag/blog/breaking-semver-in-rust-by-adding-private-type-or-import/
Meta
rustc --version --verbose
:That version of rustc emits rustdoc JSON format version v24.
Possible solution
One option I thought of would be to tweak the rustdoc format to include a list of shadowed names in glob re-exports. This would allow proper name resolution in all cases, and without requiring
--document-private-items
and--document-hidden-items
to be set.Another option would be to include non-
pub
imports when--document-private-items
is set. This only allows proper name resolution in rustdoc JSON generated with both--document-private-items
and--document-hidden-items
set.Perhaps there are other options as well.
Personally, I'd prefer the former option over the latter, since it seems reasonable to expect that rustdoc JSON shouldn't require flags that document explicitly non-public-API components in order to fully describe the public API.