rstudio / bookdown

Authoring Books and Technical Documents with R Markdown
https://pkgs.rstudio.com/bookdown/
GNU General Public License v3.0

[FR] Managing redirect #1071

Open ColinFay opened 3 years ago

ColinFay commented 3 years ago

Context

I started a book a couple of months ago as a series of Rmd files, say chapter-lorem.Rmd and chapter-ipsum.Rmd.

This generated a book, hosted on GitHub Pages, at myuberbook.org/chapter-lorem.html and myuberbook.org/chapter-ipsum.html.

My book has been read and shared on the internet, and people have potentially been sharing links to myuberbook.org/chapter-lorem.html, but now I want to rename that page to myuberbook.org/lorem.html.

Feature Request

It would be nice to have a native mechanism for these redirects, in other words a way to change the chapter URLs while keeping the old URLs active through a redirect.

A native redirect would allow to:

My current approach is to write, after the book compilation, a series of HTML files containing:

<head>
  <meta http-equiv="refresh" content="0; URL=lorem.html" />
</head>

using

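# Write a stub page at _book/<from>.html that redirects to <to>.html
make_redirect <- function(from, to){
  # Meta refresh with a 0 second delay sends the browser to the new page
  html <- sprintf(
    '<head><meta http-equiv="refresh" content="0; URL=%s.html" /></head>', 
    to
  )
  # The stub takes the place of the old chapter file in the output folder
  dest <- fs::path("_book", from, ext = "html")
  fs::file_create(dest)
  write(html, dest)
}
make_redirect("chapter-lorem", "lorem")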


apreshill commented 3 years ago

Chiming in here that Hugo has a way to do this, which might provide an opportunity for some inspiration if per-chapter YAMLs in bookdown were a possibility?

See: https://gohugo.io/content-management/urls/#aliases

For example:

---
aliases:
    - /posts/my-original-url/
    - /2010/01/01/even-earlier-url.html
---

cderv commented 3 years ago

Thanks Alison ! This is very interesting. I was thinking at first of a site-wide configuration (like in _bookdown.yml) with a list of old <-> new names, a bit like _redirects in netlify.

The idea of having it per-document could be interesting. But there are limitations with bookdown:

The site-wide approach would have the advantage of not being tied to how the HTML is split. A list of changed filenames, each leading to a changed URL, would cover all cases.

With Hugo, a post is a document with a URL. So it makes complete sense. I wonder if we could consider for bookdown the same - should a url point to a chapter ? a section ? a sub-chapter ? I am not sure there is one answer. But the choice of 1 Rmd = 1 chapter = 1 html file simplifies things for sure.

Also, I like the name aliases used in Hugo.
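
To make the difference between the two shapes concrete: Hugo-style aliases list the old URLs under each new page, while a site-wide list maps each old name to its new name. Below is a minimal sketch of turning the first into the second in base R. The `aliases` list and its page names are hypothetical; bookdown has no such field.

```r
# Hypothetical per-chapter aliases: new page name -> old page names
aliases <- list(
  lorem = c("chapter-lorem", "intro-lorem"),
  ipsum = "chapter-ipsum"
)

# Flatten into a named vector: names are the old pages, values the new ones
redirects <- stats::setNames(
  rep(names(aliases), lengths(aliases)),
  unlist(aliases, use.names = FALSE)
)

print(redirects)
```

Either shape carries the same information, so the choice is mostly about where the author finds it easier to maintain.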

ColinFay commented 3 years ago

Hey,

This was also my first thought, that it would be nice to have it in the YAML, something like:

redirects:
  chapter-lorem: lorem
  chapter-ipsum: ipsum

And then, after the book compilation, something like:

# Read the old -> new mapping from the YAML file
redirects <- yaml::read_yaml("_output.yml")$redirects

# Write a stub page at _book/<from>.html that redirects to <to>.html
make_redirect <- function(from, to){
  html <- sprintf(
    '<head><meta http-equiv="refresh" content="0; URL=%s.html" /></head>', 
    to
  )
  dest <- fs::path("_book", from, ext = "html")
  fs::file_create(dest)
  write(html, dest)
}

# One stub per entry: names are the old pages, values the new ones
mapply(make_redirect, from = names(redirects), to = redirects)

Let me know what you think of this, happy to make a PR :)

ColinFay commented 3 years ago

Hey y'all,

If ever you want to see a "real life example", I've used it for the Engineering Shiny book.

Step 1: Specify all the redirects here

https://github.com/ThinkR-open/engineering-shiny-book/blob/master/_output.yml#L37

Step 2: Rename the chapters so that they get their new URLs

Step 3: Add a redirect script that is launched by CI

https://github.com/ThinkR-open/engineering-shiny-book/blob/master/.github/workflows/deploy_bookdown.yml#L63

https://github.com/ThinkR-open/engineering-shiny-book/blob/master/redirect.R

Result:

https://github.com/ThinkR-open/engineering-shiny-book/blob/gh-pages/deploy-golem.html

https://engineering-shiny.org/deploy.html

https://engineering-shiny.org/deploy-golem.html

cderv commented 3 years ago

Thanks for sharing !

That is the general approach I would take too. :)

But maybe produce a full HTML document, like in the Hugo example.

<!DOCTYPE html>
<html>
  <head>
    <link rel="canonical" href="https://example.com/posts/my-intended-url"/>
    <meta name="robots" content="noindex">
    <meta http-equiv="content-type" content="text/html; charset=utf-8"/>
    <meta http-equiv="refresh" content="0; url=https://example.com/new-url"/>
  </head>
</html>

noindex is useful to keep this page from being indexed (https://developers.google.com/search/reference/robots_meta_tag)

link rel="canonical" also seems useful to avoid duplicate URLs, but it does not seem to add value on top of noindex. Maybe Hugo does both to be sure.

Having <!DOCTYPE html> and <html> gives us a full webpage. (I know that this works without them too)

Also, I think it would be better to have this in _bookdown.yml, not _output.yml, because it would be a feature of bookdown, and because _output.yml should contain only output format definitions (it is parsed that way by rmarkdown). This is not the place for other types of fields.
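
As a rough illustration of the site-wide variant, here is a sketch that writes the stub pages from a mapping and, optionally, reads that mapping from _bookdown.yml. The `redirects` field is an assumption for illustration only; bookdown does not read it. `output_dir` is an existing _bookdown.yml field, used here to locate the rendered book, and the helper names are made up.

```r
# Write one stub page per entry; names are old pages, values are new pages.
# `redirects` is a hypothetical mapping, not something bookdown provides.
write_redirects <- function(redirects, output_dir = "_book") {
  for (from in names(redirects)) {
    html <- sprintf(
      '<head><meta http-equiv="refresh" content="0; URL=%s.html" /></head>',
      redirects[[from]]
    )
    writeLines(html, file.path(output_dir, paste0(from, ".html")))
  }
  invisible(names(redirects))
}

# Read the hypothetical `redirects` field from _bookdown.yml
# (requires the yaml package) and fall back to "_book" when
# `output_dir` is not set.
write_redirects_from_config <- function(config_file = "_bookdown.yml") {
  config <- yaml::read_yaml(config_file)
  output_dir <- if (is.null(config$output_dir)) "_book" else config$output_dir
  write_redirects(config$redirects, output_dir)
}
```

After rendering, calling `write_redirects_from_config()` from the book's root folder would create the stubs next to the new pages.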

This could also be a feature of the HTML book formats only, as it makes no sense for other types of books, so it could live under bookdown::gitbook(). This would allow passing the list of renamed pages as a parameter to the gitbook() function.

Thoughts ?

ColinFay commented 3 years ago

Indeed, a full HTML page would be better.

Using rel = "canonical" would also be a good choice, but I think noindex should not be added, because we would still want Google to know that this old URL is available? I mean, instead of sending the message that a page no longer exists (i.e. it was indexed before but now it's no longer there), we keep this URL available.

Another read on this: https://developers.google.com/search/blog/2012/08/website-testing-google-search

We recommend using rel="canonical" rather than a noindex meta tag because it more closely matches your intent in this situation. Let's say you were testing variations of your homepage; you don't want search engines to not index your homepage, you just want them to understand that all the test URLs are close duplicates or variations on the original URL and should be grouped as such, with the original URL as the canonical. Using noindex rather than rel="canonical" in such a situation can sometimes have unexpected effects (e.g., if for some reason we choose one of the variant URLs as the canonical, the "original" URL might also get dropped from the index since it would get treated as a duplicate).

Regarding where to put this info, this would indeed make sense to have it somewhere else, but you might be best suited to know where it fits exactly :)

cderv commented 3 years ago

Oh thanks for sharing this resource. This may be why Hugo puts both of them.

It also seems that, regarding SEO, a client-side redirect with a meta tag and refresh 0 will be seen as a 301 redirect by search engines: https://www.contentkingapp.com/academy/redirects/#client-side-redirects. And a 301 redirect will have the same effect as a canonical URL, in that the search engine will go to the correct page: https://developers.google.com/search/docs/advanced/crawling/consolidate-duplicate-urls#redirects-method

However, I still think the 'noindex' could be useful, because you don't really want this old URL to still be considered as a potential way to access the page. It should no longer appear in search engine results because you want the new URL to be indexed. I see this as a 301 redirect for pages that no longer exist, to make the transition smooth, but it is better if no one uses the old URL. So I think I want to send the message to Google that this old URL no longer exists 😅

Maybe we should make all this configurable so that anyone can do as preferred.
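
One way this could look, as a sketch only: a base R helper that builds the full redirect page and lets the author switch the noindex and canonical tags on or off. The function name `redirect_page()` and its `noindex` and `canonical` arguments are hypothetical, not bookdown options. The canonical link is usually given as an absolute URL, so the example uses the example.com placeholder.

```r
# Build the lines of a full HTML redirect page pointing to `to`.
# `noindex` and `canonical` are hypothetical switches, not bookdown options.
redirect_page <- function(to, noindex = TRUE, canonical = TRUE) {
  head <- c(
    # An `if` without `else` returns NULL when FALSE, and c() drops NULL
    if (canonical) sprintf('<link rel="canonical" href="%s"/>', to),
    if (noindex) '<meta name="robots" content="noindex">',
    '<meta http-equiv="content-type" content="text/html; charset=utf-8"/>',
    sprintf('<meta http-equiv="refresh" content="0; url=%s"/>', to)
  )
  c(
    "<!DOCTYPE html>",
    "<html>",
    "  <head>",
    paste0("    ", head),
    "  </head>",
    "</html>"
  )
}

# Keep the canonical link but let search engines index the old URL
page <- redirect_page("https://example.com/lorem.html", noindex = FALSE)
cat(page, sep = "\n")
```

The result can be written to the old page's path with `writeLines(page, dest)`.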

cderv commented 3 years ago

Another resource for understanding the three approaches: https://overthinkgroup.com/301-redirect-noindex-rel-canonical/

cderv commented 3 years ago

Related feature request in pkgdown : https://github.com/r-lib/pkgdown/pull/1259

This will be added there first by @maelle: https://github.com/r-lib/pkgdown/pull/1639

We'll need to have a look there since sharing a common syntax in YAML would be better.

andybeet commented 8 months ago

Did this ever get implemented in Bookdown? If so are there examples of its implementation?

cderv commented 8 months ago

Hi @andybeet

It has not been implemented in bookdown yet. The only example I know of is the one shared here, which shows the idea: https://github.com/rstudio/bookdown/issues/1071#issuecomment-773903750

andybeet commented 8 months ago

Thanks @cderv. I'll implement this example for now. Or maybe I'll just make a custom 404 until a solution is implemented. The pkgdown implementation works great to handle this. I don't fully understand the bookdown/pkgdown differences or how complex it would be to implement, but it would be a great addition.

Is there a plan/timeline for adding this or has it been shelved?

cderv commented 8 months ago

Is there a plan/timeline for adding this or has it been shelved?

Currently there is no timeline for this. Our focus is not on new bookdown features for now. We are happy to review a PR, though, and help however we can.