rust-lang / promote-release

Tooling to publish Rust releases.
Apache License 2.0

Handling redirects in published HTML pages #16

Open ScottAbbey opened 7 years ago

ScottAbbey commented 7 years ago

Summary

I propose that we create a file (or files) listing redirects, then use the --website-redirect option of the AWS CLI tool within the promote-release tool to publish these as 301 Moved Permanently redirects to the appropriate locations.

Background

There are now a significant number of links on the web that point to what are essentially "This page has moved" pages on https://doc.rust-lang.org.

Multiple issues have been raised regarding these pages, including https://github.com/rust-lang/rust/issues/42632. Refer to that issue for specifics on some of the ways these redirect pages are creating problems. In short, both search engines and users clicking links are finding themselves on pages that don't need to exist.

Proposal

For pages that amount to no more than a "go here instead" message, we should consider serving 301 Moved Permanently responses when there is a definitive target.

For example, https://doc.rust-lang.org/tutorial.html should just redirect to the link it highlights.

A large collection of pages from the first edition of the book, like https://doc.rust-lang.org/book/enums.html, should just redirect to the new URL for the first edition, like https://doc.rust-lang.org/book/first-edition/enums.html. (Those pages also highlight the existence of a second edition, which can alternatively be highlighted from the actual result pages.)
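Because the first-edition pages share one URL pattern, their redirect entries could be generated rather than written by hand. A minimal Rust sketch, assuming a hypothetical listing format of one "old new" path pair per line. The chapter names passed in are illustrative, and first_edition_redirects and render_listing are hypothetical helpers, not part of promote-release:

```rust
/// Build one redirect entry per first-edition chapter, mapping
/// `/book/<page>` to `/book/first-edition/<page>`.
/// `chapters` is a hypothetical list of chapter file names.
fn first_edition_redirects(chapters: &[&str]) -> Vec<(String, String)> {
    chapters
        .iter()
        // Only plain `.html` file names are mapped; anything containing a
        // subdirectory is skipped so nested paths are not rewritten.
        .filter(|page| page.ends_with(".html") && !page.contains('/'))
        .map(|page| {
            (
                format!("/book/{}", page),
                format!("/book/first-edition/{}", page),
            )
        })
        .collect()
}

/// Render entries in the hypothetical `old new` line format.
fn render_listing(entries: &[(String, String)]) -> String {
    entries
        .iter()
        .map(|(old, new)| format!("{} {}\n", old, new))
        .collect()
}
```

Generating the entries keeps the listing in sync with the chapter list, at the cost of needing an exclusion rule (here, "skip anything nested") that a maintainer would have to confirm against the real book layout.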

Anything previously under a /stable/ link that has now moved would be a good candidate as well.

I assume there are other opportunities for redirects that I have not listed here.

Plan

From what I can tell, the publish_docs function in promote-release is what sends all these pages to https://doc.rust-lang.org. I assume there is also a CDN layer in front of that.

Several of the AWS CLI commands (such as aws s3 cp) have a --website-redirect option that attaches metadata to an object indicating that requests for it should be answered with a 301 Moved Permanently response.

In order to call this command, we will need to have a list of individual pages that need to be redirected, and the new location for each.
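A sketch of how such a list could be turned into the cp invocations described here, assuming the same hypothetical "old new" line format. The local root directory and bucket name are placeholders, and Redirect, parse_listing and aws_cp_args are hypothetical names; only the aws s3 cp command and its --website-redirect option come from the AWS CLI itself:

```rust
/// One entry from the (hypothetical) redirect listing.
#[derive(Debug, PartialEq)]
struct Redirect {
    from: String, // path relative to the site root, e.g. "/book/enums.html"
    to: String,   // target path or absolute URL
}

/// Parse `old new` pairs, ignoring blank lines and `#` comments.
fn parse_listing(text: &str) -> Result<Vec<Redirect>, String> {
    let mut out = Vec::new();
    for (i, raw) in text.lines().enumerate() {
        let line = raw.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        let fields: Vec<&str> = line.split_whitespace().collect();
        if fields.len() != 2 {
            return Err(format!("line {}: expected `old new`, got {:?}", i + 1, line));
        }
        let (from, to) = (fields[0], fields[1]);
        // S3 only accepts website-redirect targets starting with
        // `/`, `http://` or `https://`.
        let valid_target =
            to.starts_with('/') || to.starts_with("http://") || to.starts_with("https://");
        if !from.starts_with('/') || !valid_target {
            return Err(format!("line {}: bad path or redirect target", i + 1));
        }
        out.push(Redirect { from: from.to_string(), to: to.to_string() });
    }
    Ok(out)
}

/// Arguments for `aws s3 cp <local> s3://<bucket>/<key> --website-redirect <to>`.
/// `local_root` and `bucket` are placeholders, not promote-release's real values.
fn aws_cp_args(r: &Redirect, local_root: &str, bucket: &str) -> Vec<String> {
    let key = r.from.trim_start_matches('/');
    vec![
        "s3".to_string(),
        "cp".to_string(),
        format!("{}/{}", local_root.trim_end_matches('/'), key),
        format!("s3://{}/{}", bucket, key),
        "--website-redirect".to_string(),
        r.to.clone(),
    ]
}
```

Each argument vector could then be run with std::process::Command::new("aws").args(...). Validating the listing first means a malformed entry fails the whole run before anything is uploaded, rather than halfway through.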

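Questions

Is using 301 redirects on these pages an appropriate solution?

Will a 301 redirect in S3 work properly with the CDN that is presumably in front of it?

Should this redirect listing reside in rust-lang/rust-central-station, or should there be one in each of the various projects that end up in rust-docs\share\doc\rust\html?

What is a good format to use for a file listing redirects?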

Ref: https://github.com/rust-lang/book/issues/760

alexcrichton commented 7 years ago

Thanks for opening the issue, sounds like a great idea to me!

> Is using 301 redirects on these pages an appropriate solution?

Sure!

> Will a 301 redirect in S3 work properly with the CDN that is presumably in front of it?

AFAIK yeah, but I don't think we've tested it out before. May be worth googling around and seeing if CloudFront is compatible with S3 website redirects.

> Should this redirect listing reside in rust-lang/rust-central-station, or should there be one in each of the various projects that end up in rust-docs\share\doc\rust\html?

The listing should reside in rust-lang/rust, not rust-central-station. Basically the listing will have to make its way to the published tarballs.

> What is a good format to use for a file listing redirects?

Up to you!

RalfJung commented 7 years ago

I'd be very careful about emitting "301 Moved Permanently". There is no upper limit to how long browsers and search engines remember such a redirect, which effectively means such a redirect, once in place, can never ever be changed again. You cannot make it point somewhere else, and you cannot turn it back to a normal (code 200) website. 302 or 307 would IMHO be a better status code to use.

ScottAbbey commented 7 years ago

> There is no upper limit to how long browsers and search engines remember such a redirect, which effectively means such a redirect, once in place, can never ever be changed again.

A 3xx response could be sent with Cache-Control: max-age=x headers that clients should respect. If these 301 redirects are only ever sent with a reasonable value in a header like this, it would remain possible to begin serving either old or new content from the same URL in the future.
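For illustration, this is roughly what a redirect with a bounded cache lifetime looks like on the wire. A minimal sketch with a hypothetical helper, redirect_head, that renders the response head a server or proxy could send; the max-age value is whatever the caller picks, and clients that ignore Cache-Control are not protected by it:

```rust
/// Render the head of an HTTP/1.1 redirect response that carries an
/// explicit Cache-Control lifetime, so clients honoring the header stop
/// reusing the redirect after `max_age_secs` seconds.
fn redirect_head(status: u16, location: &str, max_age_secs: u32) -> Result<String, String> {
    let reason = match status {
        301 => "Moved Permanently",
        302 => "Found",
        307 => "Temporary Redirect",
        308 => "Permanent Redirect",
        other => return Err(format!("{} is not a supported redirect status", other)),
    };
    // Refuse CR/LF so a bad listing entry cannot inject extra headers.
    if location.contains('\r') || location.contains('\n') {
        return Err("location must not contain line breaks".to_string());
    }
    Ok(format!(
        "HTTP/1.1 {} {}\r\nLocation: {}\r\nCache-Control: max-age={}\r\nContent-Length: 0\r\n\r\n",
        status, reason, location, max_age_secs
    ))
}
```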

After testing this out with an S3 bucket, it appears that setting the object metadata for a redirect does send a 301 response, but it does not include any other headers, such as Cache-Control, that have been set on the same object. That's unfortunate.


> 302 or 307 would IMHO be a better status code to use.

Hmm.

Whichever 3xx code is most appropriate, I originally suggested 301 because it is the only status S3 sends for a per-object redirect set this way. If we wanted to use a different type of redirect, it would have to come from a layer in front of S3.

The two motivations for adding redirects are search engine optimization and end-user experience. As far as I understand, most search engines today treat both temporary and permanent redirects favorably, and end users won't notice any difference unless they get stuck in a 301 trap.

Semantically, the content has probably moved permanently, considering it has been in a new location for almost 3 years now.


After some further investigation, it appears that the entire doc.rust-lang.org site is actually proxied through some other machine running nginx in front of the S3 bucket, rather than CloudFront like www.rust-lang.org is. So any redirects would also need to pass properly through that proxy.

Considering this along with the issue I noted above about S3's 301 redirects not including Cache-Control headers, it seems like getting redirects working would require many changes beyond just those in the publish_docs function I noted above.
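Because S3 object metadata only yields 301, and doc.rust-lang.org already sits behind nginx, one alternative would be to generate nginx rules from the redirect listing, leaving the status code as a choice (301, or the 302/307 RalfJung suggested). This is only a sketch: it assumes the proxy's configuration could include a generated file, which this thread does not establish, and Status and nginx_locations are hypothetical names. The location =, add_header ... always and return directives are standard nginx:

```rust
/// Redirect status codes discussed in this thread.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Status {
    Moved301,
    Found302,
    Temporary307,
}

impl Status {
    fn code(self) -> u16 {
        match self {
            Status::Moved301 => 301,
            Status::Found302 => 302,
            Status::Temporary307 => 307,
        }
    }
}

/// Emit one exact-match nginx `location` block per (from, to) pair.
/// Assumes paths contain no whitespace, quotes or braces.
fn nginx_locations(redirects: &[(&str, &str)], status: Status, max_age_secs: u32) -> String {
    let mut out = String::new();
    for (from, to) in redirects {
        // `location =` matches only this exact URI; `return` answers with
        // the chosen status and `to` as the Location header.
        out.push_str(&format!(
            "location = {} {{\n    add_header Cache-Control \"max-age={}\" always;\n    return {} {};\n}}\n",
            from,
            max_age_secs,
            status.code(),
            to
        ));
    }
    out
}
```

Without always, add_header already applies to 301, 302 and 307 responses; it is included so the header keeps being sent if a different status code is chosen later.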

The search engine optimization goal can probably be handled just as well by adding Link: ...; rel=canonical HTTP headers or <link rel="canonical" href="..."> HTML elements. Adding a JavaScript redirect, as suggested in some linked threads, would handle most end-user issues.
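Both canonical forms are short to generate. A minimal sketch with hypothetical helper names; the header value follows the Link header syntax of RFC 8288, and search engines generally recommend an absolute URL as the canonical target:

```rust
/// Escape a value for use inside a double-quoted HTML attribute.
fn escape_attr(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '"' => out.push_str("&quot;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            _ => out.push(c),
        }
    }
    out
}

/// Value for an HTTP `Link` header naming the canonical URL.
fn canonical_link_header(url: &str) -> String {
    format!("<{}>; rel=\"canonical\"", url)
}

/// The equivalent `<link>` element for the page's `<head>`.
fn canonical_link_element(url: &str) -> String {
    format!("<link rel=\"canonical\" href=\"{}\">", escape_attr(url))
}
```

The header form needs the server or proxy to cooperate, while the element form only needs the generated HTML to change, which may matter given the S3 header limitations described in this thread.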

RalfJung commented 7 years ago

A <meta http-equiv="refresh" ...> element could also be used to redirect without needing JavaScript, and would probably be understood by search engines.
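A sketch of a stub page that redirects with meta refresh instead of JavaScript, also names the target as canonical, and keeps a visible link for clients that ignore the refresh. refresh_stub is a hypothetical helper, and the markup it emits is an illustration, not what rustdoc or the book tooling currently generates:

```rust
/// Escape a value for use inside HTML attributes and text.
fn escape_attr(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '"' => out.push_str("&quot;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            _ => out.push(c),
        }
    }
    out
}

/// Build a minimal "this page has moved" stub: an immediate meta refresh,
/// a canonical link for search engines, and a plain fallback link.
fn refresh_stub(target: &str) -> String {
    let t = escape_attr(target);
    format!(
        "<!DOCTYPE html>\n<html>\n<head>\n<meta charset=\"utf-8\">\n\
         <meta http-equiv=\"refresh\" content=\"0; url={t}\">\n\
         <link rel=\"canonical\" href=\"{t}\">\n\
         <title>Redirecting</title>\n</head>\n<body>\n\
         <p>This page has moved to <a href=\"{t}\">{t}</a>.</p>\n</body>\n</html>\n",
        t = t
    )
}
```

A delay of 0 makes the refresh behave like an immediate redirect in browsers that honor it, while the page itself still returns 200, so nothing about it is cached beyond the page's normal caching rules.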

tomprince commented 7 years ago

Even if the plan outlined above won't work, we should still address the issue of handling redirects better.