Open jyn514 opened 4 years ago
The same could be done for doc.rust-lang.org.
You can open an issue on https://github.com/rust-lang/www.rust-lang.org for that site, it's managed by a different team. I imagine they would be very receptive since it's a completely static site.
Another alternative is a browser extension to redirect online version -> offline version, similar to what the IPFS Companion extension does. For example: https://doc.rust-lang.org/std/sync/struct.RwLock.html -> file:///home/teohhanhui/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/share/doc/rust/html/std/sync/struct.RwLock.html
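The rewrite such an extension would perform can be sketched roughly as below. This is only an illustration: `to_offline_url` and `LOCAL_DOC_ROOT` are hypothetical names, the root path is simply the one from the example above, and it assumes the online stable docs map one-to-one onto the files installed by rustup.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical local docs root, taken from the example above; the real
# location depends on the user's home directory and installed toolchain.
LOCAL_DOC_ROOT = (
    "/home/teohhanhui/.rustup/toolchains/"
    "stable-x86_64-unknown-linux-gnu/share/doc/rust/html"
)


def to_offline_url(online_url: str, doc_root: str = LOCAL_DOC_ROOT) -> Optional[str]:
    """Map a doc.rust-lang.org URL to a file:// URL, or None for other sites."""
    parts = urlsplit(online_url)
    if parts.scheme != "https" or parts.netloc != "doc.rust-lang.org":
        return None
    url = "file://" + doc_root + parts.path
    if parts.fragment:
        # Keep anchors such as #method.read so deep links still work.
        url += "#" + parts.fragment
    return url


offline = to_offline_url("https://doc.rust-lang.org/std/sync/struct.RwLock.html")
```

Channel-prefixed paths such as /nightly/ would need extra handling that this sketch does not attempt.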
Hmm, this is an interesting idea. I don't think it would work with relative links though: `..` would send you to `doc/rust/html/std/sync/index.html`, which might not yet exist. Also, I'm not sure that this would work correctly if we used trailing slashes on any page instead of `/index.html`.
That can be achieved with cargo doc to build local crates and rustup doc for the book, std, and everything else on doc.rust-lang.org.
The whole point of docs.rs is that you don't have to build the docs yourself, so while that's a useful tip, I don't think it should replace being able to use docs.rs offline.
I don't know very much about PWAs. If we set pages to be cached for a longer time, would that meet this use case? That way you could visit the cached page even when you lost internet.
In regards to relative links, I wouldn't be sad if they went away, as they're not really a great thing in the first place. Replacing relative links would probably help simplify a good portion of code while also being less finicky/more difficult to mess up
In regards to relative links, I wouldn't be sad if they went away, as they're not really a great thing in the first place. Replacing relative links would probably help simplify a good portion of code while also being less finicky/more difficult to mess up
I strongly disagree. Without relative links we'd have to hardcode `https://docs.rs` at the start of every URL, which would break this anyway.
Also, rustdoc heavily uses relative links for documentation, I don't see a good way to change that since it doesn't know the absolute URL it will be used with.
Oh, I thought you meant relative links in reference to how we do our own source browser, with `..` being "up", but it seems we already use a canonical link for that.
I don't know very much about PWAs. If we set pages to be cached for a longer time, would that meet this use case? That way you could visit the cached page even when you lost internet.
Changing the cache expiry would help, however that requires the user to manually toggle offline mode in their browser (which is a very hidden thing nowadays, if not impossible altogether...)
Changing the cache expiry would help, however that requires the user to manually toggle offline mode in their browser
That seems to defeat the point of caching :(
Glancing through the page you linked, it seems like the main idea is to have some JavaScript that checks if the page is cached before making a network request. I agree that should be the behavior, but I'm not comfortable enough with JavaScript to implement it / don't have the time. If someone is interested in working on this I'd be happy to mentor though :) Almost all of the site can be cached except the home page, /releases, and redirects.
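A minimal sketch of that caching rule, under stated assumptions: `is_cacheable` is a hypothetical helper, not anything in docs.rs, and the caller is assumed to already know whether the response was a redirect.

```python
from urllib.parse import urlsplit


def is_cacheable(url_path: str, is_redirect: bool = False) -> bool:
    """Return True if a page may be stored for offline use.

    Encodes the rule from the comment above: everything is cacheable
    except the home page, /releases, and redirects.
    """
    if is_redirect:
        return False
    path = urlsplit(url_path).path  # drop any query string or fragment
    if path in ("", "/"):
        return False  # home page
    if path == "/releases" or path.startswith("/releases/"):
        return False  # release listings change constantly
    return True
```

For example, `is_cacheable("/rustls/0.19.0/rustls/trait.Session.html")` is true, while `is_cacheable("/releases?page=2")` is false.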
Angular apps have service workers built into them implicitly, so if you guys are willing to upgrade this from a Python Jinja-like Tera front-end (https://crates.io/crates/tera) to an Angular front-end then you can get the Service Worker caching for free.
Here's some more info: https://angular.io/guide/service-worker-intro
As for the rust-lang website itself, it has a Handlebars front-end (https://github.com/rust-lang/www.rust-lang.org/blob/master/templates/index.hbs), which could also be replaced with an Angular front-end.
However, I think it'd probably be more on-brand for these Rust websites to have a Rust-based front-end that compiles to WebAssembly rather than be Javascript-based. The only such Crate I'm aware of that might do this is Yew, but it doesn't have Service Workers built into it as far as I know. It's not "production-ready", but since these websites are just static pages I don't think that that's a concern.
Angular could potentially be overkill since these sites are just static pages, but just because it has a bunch of bells and whistles doesn't mean you have to use them.
I'd strongly prefer for docs.rs to remain a static site first and foremost, and especially remain usable with JavaScript disabled. I'm fine with JS adding features on top, but the JS shouldn't be necessary just to use the site.
That said I don't know much about frontend, so maybe Angular can do that?
Service workers themselves are implemented on the front-end via Javascript, so I'm not sure that we can have our cake and eat it, too, in this situation.
With that design constraint, I'm not sure we can make this website offline-first. All we could do is just ask users to use their browser's "make available offline" feature if they want to use the site while offline.
Edit: Even WebAssembly requires Javascript to be enabled, so I'm not sure that any Rust-based WASM solution would work either.
Let me approach this from a different angle (I really like the framing in https://internals.rust-lang.org/t/pre-rfc-user-namespaces-on-crates-io/12851/96 to discuss things as problems to solve and not solutions to implement).
docs.rs currently is a dynamic site which serves static HTML. It does not have caching for rustdoc pages, which means the site is not available when you're offline. The goal of this issue is to be able to use docs.rs offline if you've already visited the relevant pages at least once.
If I'd never heard of PWAs, the way I'd imagine implementing this is something like the following:
What this gets docs.rs is three things:
Regardless of the technologies or frameworks used, does that basic idea sound feasible?
Won't this be an issue for pages like `doc.rust-lang.org/nightly/std/whatever.html`? I don't think we have an equivalent on docs.rs except when arriving on the crate page (but then it makes a redirection to the last version).
@GuillaumeGomez are you saying that this breaks once `latest` no longer redirects to another page (https://github.com/rust-lang/docs.rs/pull/1527)? I think we can avoid that by just having a much shorter cache expiration date on those pages.
Yes it's what I meant.
I think this is probably feasible. Some questions to figure out: should all of docs.rs be one big PWA, which manages a cache of all the various docs you've visited? Or should each crate's docs be a separate PWA? Ideally we'd like the same behavior on doc.rust-lang.org, which means the functionality should be in rustdoc, which argues for a PWA per crate.
Also, it looks like Service Workers allow us to actually prefetch resources that the user hasn't visited yet. So for instance if you visit one page of a crate's docs, it could download all the pages of that crate's docs. The storage could add up fast, though, so we'd need heuristics about when or if to do that.
I have a local prototype of this that's kinda neat, and plan to work on it some more and will share results when they're good enough. I had high hopes of precaching a whole crate / the whole stdlib, but fetching that many files individually (30,847 for the stdlib) was prohibitively slow. And users probably wouldn't thank us for using that much data without a more explicit opt-in anyhow.
Here's my current thinking:
Note that in this scenario, nothing changes for users without JS; they never load the Service Worker.
Alternately, we could prefer freshness:
The first approach is quite similar to the Cache-Control stale-while-revalidate directive. As a simpler approach, we could try changing the headers on HTML pages. Right now they have no Cache-Control header. We could add `max-age=0, stale-while-revalidate=5260000`. I think that would make the page available offline for up to 2 months, and if there is a newer version available it would get fetched in the background and be ready on the user's next page load. I need to do some testing on this - none of the docs for Cache-Control stale-while-revalidate explicitly mention offline.
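As a quick check of the "2 months" figure, the header value can be parsed and the window converted to days. `parse_cache_control` is a hypothetical helper written for this sketch and only handles the simple `name=integer` form used here.

```python
def parse_cache_control(value: str) -> dict:
    """Parse a Cache-Control value into {directive: int or None}."""
    directives = {}
    for part in value.split(","):
        name, _, arg = part.strip().partition("=")
        if name:
            directives[name.lower()] = int(arg) if arg.isdigit() else None
    return directives


SECONDS_PER_DAY = 24 * 60 * 60

directives = parse_cache_control("max-age=0, stale-while-revalidate=5260000")
# max-age=0 means the cached copy is stale immediately, so every hit
# inside the window triggers a background revalidation.
stale_window_days = directives["stale-while-revalidate"] / SECONDS_PER_DAY
```

5260000 seconds works out to just under 61 days, which matches the two-month estimate.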
Advantage for the Cache-Control approach: much easier to deploy and reason about.
Advantages of the Service Worker approach:
One of the exciting things about both approaches is they have the potential to dramatically speed up repeat visits even when online.
By the way, to be able to readily experiment with this without the possibility of breaking docs.rs, it should be possible to run some totally third party site that has a Service Worker and fetches / serves pages from docs.rs as if those pages were on its own origin. But that would require setting Access-Control-Allow-Origin on all/most docs.rs pages. Is that reasonable to do?
I had high hopes of precaching a whole crate / the whole stdlib, but fetching that many files individually (30,847 for the stdlib) was prohibitively slow.
This should be possible once we finally implement downloadable docs :) that serves the docs as one big zipfile for the whole crate.
By the way, to be able to readily experiment with this without the possibility of breaking docs.rs, it should be possible to run some totally third party site that has a Service Worker and fetches / serves pages from docs.rs as if those pages were on its own origin. But that would require setting Access-Control-Allow-Origin on all/most docs.rs pages. Is that reasonable to do?
I would be worried about doing this on docs.rs in prod, but it shouldn't be terribly difficult to run a fork of docs.rs somewhere and add `Access-Control-Allow-Origin` there.
Hmm, I guess that doesn't let you test how it interacts with cloudfront though.
Advantage for the Cache-Control approach: much easier to deploy and reason about.
This is very tempting :laughing: it sounds like you're volunteering to do much of the work, which I really appreciate :heart: but simpler to write also means simpler to review.
How hard would it be to switch between the two ideas at a later time? It sounds like a lot of the work is hooking the service worker up to the Cache API and actually changing the page, which is the same between both, right?
Switching at any point would be the same work as doing either change from scratch. If we use the `Cache-Control: max-age=0, stale-while-revalidate=N` approach, it's a one-liner. We don't touch Service Worker or Cache API at all. If we do the Service Worker approach, it's a decent amount of work - and as you say, involves at least one other person learning enough about Service Worker to adequately review. :-)
The thing I worry about with stale-while-revalidate is this:
Of course, now that I write these out I see these are also a problem for the /latest/ change in general. For instance, you could have /latest/ (version 1.0) loaded in your browser when 2.0 is released, and click a link to one of the now-renamed structs.
The problem also exists for versioned URLs. For instance, visit https://docs.rs/rustls/0.19.0/rustls/trait.Session.html and click "Go to latest version" (Session was renamed to Connection in 0.20). I see somebody has already thought of the problem, and that link takes you to a search page across 0.20. That's pretty neat! Maybe that's adequate?
The other problem with stale-while-revalidate is: say you load the root page, see it's outdated, and reload. Then you click to another page you've visited before. That's also outdated. You have to reload that too. It would get frustrating pretty fast.
I see somebody has already thought of the problem, and that link takes you to a search page across 0.20. That's pretty neat! Maybe that's adequate?
Haha, yeah I spent a while on that :)
Of course, now that I write these out I see these are also a problem for the /latest/ change in general. For instance, you could have /latest/ (version 1.0) loaded in your browser when 2.0 is released, and click a link to one of the now-renamed structs.
Hmm, this should only be a problem if you have the page open for a long time, right? Because (with caching as current, but with #1527) the second you reload the page you'll get the newer version. I think the combination of open for a long time + an intervening release + the struct was renamed is rare enough that just having search is fine.
You load /regex/latest/regex on Nov 26. You know the crate was updated to 2.0 yesterday, renaming a bunch of structs. Because of stale-while-revalidate, your browser shows you version 1.0. That's confusing! Of course, if you reload, you'll get 2.0.
Yeah, that seems confusing. I'm not sure that "if you reload you'll get 2.0" is true though - don't you need to do a hard refresh to ignore the cache directive? I don't think we should do that for the /latest/ page. It seems ok for pages other than /latest/ though, they should only change if a bug in rustdoc itself was fixed and the crate was rebuilt.
That said, I'm fairly familiar with service workers from working at Cloudflare so if that sounds fun I say go for it :grin:
I'm not sure that "if you reload you'll get 2.0" is true though - don't you need to do a hard refresh to ignore the cache directive?
With `max-age=0, stale-while-revalidate`, I think it's true. The first load will serve from cache. During the ~dozen seconds you spend looking at the page, the browser will refresh the cache from origin, so by the time you reload there should be a fresh copy in cache.
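The sequence described above can be modeled with a toy cache for a single URL. This is a deliberately simplified sketch, not how any browser is implemented: `StaleWhileRevalidateCache` is a hypothetical class, and the background revalidation is modeled as completing immediately after the stale copy is handed back.

```python
class StaleWhileRevalidateCache:
    """Toy model of one URL under max-age=0, stale-while-revalidate=N."""

    def __init__(self, fetch_from_origin, window_seconds):
        self.fetch = fetch_from_origin  # callable returning the current body
        self.window = window_seconds
        self.body = None
        self.stored_at = None

    def load(self, now):
        """Return what the user sees when loading the page at time `now`."""
        if self.body is not None and now - self.stored_at <= self.window:
            stale = self.body
            # Serve the stale copy right away, then refresh the cache
            # (modeled here as finishing before the next load).
            self.body, self.stored_at = self.fetch(), now
            return stale
        # Nothing cached, or the window has expired: go to the origin.
        self.body, self.stored_at = self.fetch(), now
        return self.body


origin = {"body": "regex 1.0 docs"}
cache = StaleWhileRevalidateCache(lambda: origin["body"], window_seconds=5260000)

first_visit = cache.load(now=0)          # fetched from origin
origin["body"] = "regex 2.0 docs"        # a new release is published
after_release = cache.load(now=86400)    # stale copy shown, refresh happens
after_reload = cache.load(now=86412)     # reload a dozen seconds later
```

In this model `after_release` is still the 1.0 page, and `after_reload` is the 2.0 page, which is the behavior being described: one stale view, then fresh content on a plain reload.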
it shouldn't be terribly difficult to run a fork of docs.rs somewhere and add Access-Control-Allow-Origin there.
Wouldn't it require a lot of CPU and storage to store all the crates? I'm thinking of something that would exist for a period of months, where we'd invite testers to try using it as their daily driver version of docs.rs, to see what weird cases would come out of real-life browsing patterns.
During the ~dozen seconds you spend looking at the page, the browser will refresh the cache from origin, so by the time you reload there should be a fresh copy in cache.
Ahh, that makes sense, I didn't realize that's what the directive did.
Wouldn't it require a lot of CPU and storage to store all the crates? I'm thinking of something that would exist for a period of months, where we'd invite testers to try using it as their daily driver version of docs.rs, to see what weird cases would come out of real-life browsing patterns.
I don't see a realistic way to do this. Either we experiment with it in prod (maybe with a feature flag?) or we can write more tests; it's just not feasible to replicate docs.rs at scale.
I admit I never worked with this kind of frontend caching, but I'm excited to see it if it works.
Since caching is hard, it feels like there might be edge cases with confusing mixtures of cached and uncached pages (and assets), so IMHO having an (even user-visible) feature flag / testing phase would be a great idea.
Or building a second setup. I mean, having a staging platform is not a terrible idea :)
Yes, I definitely want to set up a staging server at some point where people can try things out interactively. I just want to set reasonable expectations for it; it's going to end up like staging.crates.io where maybe 5 people a week visit, it won't let us see problems that only appear at scale.
I just tested stale-while-revalidate, and it does make the page nicely available when the network is offline, at least in Chrome.
Proposal: Let's add `Cache-Control: max-age=0, stale-while-revalidate=N` for all versioned URLs, but not yet for `/latest/` URLs (#1527), since things are a little trickier there. I propose N = 2 months to start.
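The proposed rule could look something like the sketch below. It is an illustration only: `cache_control_for` is a hypothetical function, not docs.rs code, and it assumes the path has already been routed as a rustdoc page of the form /crate/version/..., so other routes are out of scope.

```python
from typing import Optional

# N = 2 months, using the same value floated earlier in the thread.
TWO_MONTHS_SECONDS = 5_260_000


def cache_control_for(rustdoc_path: str) -> Optional[str]:
    """Return the Cache-Control value for a rustdoc page, or None for no header.

    Assumes `rustdoc_path` looks like /<crate>/<version>/<rest>.
    """
    segments = [s for s in rustdoc_path.split("/") if s]
    if len(segments) < 2:
        return None  # no version segment, so not a versioned docs page
    version = segments[1]
    if version == "latest":
        return None  # /latest/ is excluded for now (see #1527)
    return f"max-age=0, stale-while-revalidate={TWO_MONTHS_SECONDS}"
```

For example, /rustls/0.19.0/rustls/trait.Session.html would get the header, while /regex/latest/regex would not.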
Sounds like a plan! :)
A little hiccup: Iron doesn't seem to support stale-while-revalidate, and doesn't allow setting custom strings for the cache-control header: https://docs.rs/iron/0.6.1/iron/headers/enum.CacheDirective.html
@jsha does `Extension` not support custom headers?
Anyway, iron hasn't had a publish in 3 years, I wouldn't get your hopes too high. @syphar has been working on and off on switching to Axum.
Note that the Axum migration has been done for some time.
It'd be great to turn docs.rs into an offline-first PWA (Progressive Web App). So the user would still be able to browse the docs they have already visited before even when offline, without having to use a separate website or app.
The same could be done for doc.rust-lang.org.
Originally posted by @teohhanhui in https://github.com/rust-lang/docs.rs/issues/174#issuecomment-647121395