rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
98.45k stars 12.73k forks source link

Documentation size can grow significantly due to documentation of impls on type aliases #115718

Closed steffahn closed 1 year ago

steffahn commented 1 year ago

In the latest nightly (I’ve tested 62ebe3a2b 2023-09-08), from #115201 we get better/more documentation on type aliases, including inherent and trait impls of the aliased type.

For some crates, this can have a severe effect on the size of the generated documentation (and the time it takes to generate it).

I’ve pointed this out on the PR after merging, too ( https://github.com/rust-lang/rust/pull/115201#issuecomment-1712576460 ), but it’s better tracked as a separate issue, since the change is overall really nice, and even in some of these cases, the larger documentation is a tradeoff against arguably better documentation.

The crates I’ve tested are nalgebra and typenum. The nalgebra crate features a single and very impl-rich Matrix type together with a very large number of type aliases for Matrix. These aliases at the moment contain notices like

Because this is an alias, not all its methods are listed here. See the Matrix type too.

which indicates, a nice inlined listing of (an appropriately filtered list of) relevant impls is a good addition. However, I’m also noticing the generated docs grow significantly in size. Using default features, I see between 1.72 and the abovementioned nightly, the size grows from 27MB to 515MB.

At the very least, it’s probably quite desired that the significant number of deprecated aliases don’t create extra HTML bloat.

Looking at typenum, it features a handful of types like in particular UInt, NInt, and PInt for type-level unsigned, and signed (positive and negative) integers, each of which come with a significant amount of trait impls, though by far not as large as nalgebra’s Matrix’s impls list. However, the number of type aliases involving these is absolutely huge, and the end result for my test was a size increase from 44MB to 523MB for the generated documentation. For the case of typenum, all this extra HTML content seems – at least to me – pretty much unnecessary.

How to address this?

It seems to me that only a handful of crates like this are really negatively affected. While for the case of deprecated items, we could decide from the side of rustdoc to just not include the impls automatically, for a case like typenum it seems to me like an opt-out might be appropriate.

Right now, a way typenum could avoid the issue would be to use #[doc(hidden)] to entirely hide the alias. (Then, it could still describe in prose which numbers are supported by aliases.) But that seems like it has unnecessary downsides in reader experience. One opt-in solution for hiding impls on an alias could thus be an alternative attribute, similar to existing #[doc(hidden)] and #[doc(inline)] attributes, e.g. it could be named #[doc(no_impls)] or #[doc(hidden_impls)] (and [at least initially] only be supported only for type aliases).

Further evaluation

I should probably also look into the effects on ndarray. And I might be unaware – off the top of my head – of other crates that could be significantly affected. Feel free to point out any crates that have similarly significant additional “bloated” documentation from this change, so we can evaluate whether the discussed or to-be-discussed approaches help in those cases as well.

steffahn commented 1 year ago

For completeness, let me include the answer ( https://github.com/rust-lang/rust/pull/115201#issuecomment-1712581277 ) from @GuillaumeGomez on the PR thread where I first pointed this out:

I think the concern is valid, and it might be possible to add a mechanism to allow to disable this behaviour, but it needs to be discussed first about:

  • Do we want to allow people to opt out of this feature?
  • How do we disable this feature?
    • A global attribute?
    • A command line option?
    • An attribute to be used on each type alias?
    • An attribute to be used on module?

Of course, any of the item above can be combined with others. Lot of possibilities. ;)

steffahn commented 1 year ago

@rustbot label regression-from-stable-to-nightly, -regression-untriaged, T-rustdoc, A-rustoc-ui

edit why did this silently do nothing?

GuillaumeGomez commented 1 year ago

I added it to rustdoc team agenda to be discussed in next meeting.

the8472 commented 1 year ago

Would the type-definitions in those cases be mergeable or do the type aliases add constraints that make the impl docs unique?

steffahn commented 1 year ago

@the8472 What would you mean by “mergeable”? Like including html from multiple files on the same page? Each impl block gets its own html file and is included into all places where it shall appear? Or do you mean something entirely different?

Is this even possible with HTML? (I’m quite the HTML noob myself.) Or would we like an approach to only include impls on type alias pages if Javascript is enabled?

the8472 commented 1 year ago

What would you mean by “mergeable”? Like including html from multiple files on the same page?

No, just generating one page for a set of ~similar type aliases. E.g. if a type-alias sets a generic to a specific value and that makes some impls non-applicable then those impls shouldn't be listed on that page. But that only works most of the type-aliases set the same value.

I haven't actually seen an example what these changes look like (the PR didn't have an example)... so maybe my suggestion doesn't make sense.

Is this even possible with HTML?

Kinda, kinda not. iframes could be used but they have lots of limitations. There also are some legacy features around XML/XSLT that did including but that's not relevant to HTML. Anyway, not what I was suggesting.

steffahn commented 1 year ago

No, just generating one page for a set of ~similar type aliases. E.g. if a type-alias sets a generic to a specific value and that makes some impls non-applicable then those impls shouldn't be listed on that page. But that only works most of the type-aliases set the same value.

Thanks for the clarification. I think that might be hard to do nicely. I do believe that for most use-cases the current (new) approach of listing relevant impls on the type alias page itself, is a very good and useful approach … my only concern really is only with the memory and time overhead for the more extreme cases of many (typically macro-generated) aliases of a complex (many impls) type with various parameters.

I haven't actually seen an example what these changes look like (the PR didn't have an example)... so maybe my suggestion doesn't make sense.

To look at the documentation in question here, all that’s necessary is:

cargo new test_new_aliases_doc
cd test_new_aliases_doc
cargo add nalgebra
cargo doc --open

and then navigate to …/test_new_aliases_doc/target/doc/nalgebra/base/index.html#types (i.e. module base in crate nalgebra) and choose any of the type synonyms to look at.

A few type aliases can also be found in std, e.g. std::io::Result. You could also look into recently published crates on docs.rs e.g. this one is one I could find that has a few type aliases where you can look at what the docs look like.

the8472 commented 1 year ago

Then maybe a general heuristic would then make sense to generate a N-similar-item page instead of 1-item alias page once the number of similar aliases in a module exceeds some threshold.

GuillaumeGomez commented 1 year ago

That seems "too clever" to sound like a good idea to be honest.

the8472 commented 1 year ago

Nono, it's quite obvious. Multiple Items, Single Documentation (MISD, SIMD's strange cousin) 😉

GuillaumeGomez commented 1 year ago

Yes, but I don't see how the documentation rendering, especially if there is added documentation on the type alias, will be done nicely and won't be like super confusing.

the8472 commented 1 year ago

Type aliases for Foo<A=CommonValue, B>

Bar = Foo<A=CommonValue, B=OtherValue> [+]

Description

Baz<T> = Foo<A=CommonValue, T> [+]

Description

Implementations

Note: All the above aliases have the same set of implementations

Impl

...

GuillaumeGomez commented 1 year ago

It would also mean displaying the "Aliased type" for each too. For small ones, it's fine, for big ones, not so much.

the8472 commented 1 year ago

Well you can move that down to the collapsible block and just list the name in the headline. I'm gesturing at the general area in the solution space here, not claiming to have solved every issue.

GuillaumeGomez commented 1 year ago

Sorry if my comments seemed a bit harsh. It's just that UI/UX problems are actually often very complicated and I tried for each of your suggestions to show a potential problem. :sweat_smile:

the8472 commented 1 year ago

Yeah I'll acknowledge that such a batch-page wouldn't be ideal and that an actual implementation attempt may run into more serious problems. It is a suggestion to solve the footprint issue and work around the limitation of HTML at potentially non-zero loss of readability but without completely removing the pages which might be a bigger loss.

GuillaumeGomez commented 1 year ago

The only way I can see for now would be to use JS, but I'm not a big fan of anything JS-related as it increases the code quantity quite a lot (and also doesn't work if you disable JS).

the8472 commented 1 year ago

If you mean including via JS then we could have a fallback by showing a link to the shared impls page when JS is off.

An even simpler heuristic is to stop rendering the impl blocks above some threshold of ~similar type aliases and link to the original type instead

apiraino commented 1 year ago

@rustbot label regression-from-stable-to-nightly, -regression-untriaged, T-rustdoc, A-rustoc-ui

edit why did this silently do nothing?

without commas (I think) and fixed one typo

@rustbot label +regression-from-stable-to-nightly -regression-untriaged +T-rustdoc +A-rustdoc-ui

camelid commented 1 year ago

We discussed this in the t-rustdoc meeting today, and we felt that the feature should be kept but we need to change the implementation to reduce the page sizes. Most people were on board with using JS to share the HTML across type aliases, though there's some concern about JS maintainability.