Open ColinFay opened 3 years ago
Chiming in here that Hugo has a way to do this, which might provide an opportunity for some inspiration if per-chapter YAMLs in bookdown were a possibility?
See: https://gohugo.io/content-management/urls/#aliases
For example:
---
aliases:
- /posts/my-original-url/
- /2010/01/01/even-earlier-url.html
---
Thanks Alison! This is very interesting. I was thinking at first of a site-wide configuration (like in _bookdown.yml) with a list of old <-> new names, a bit like _redirects in Netlify.
The idea of having it per-document could be interesting. But there are limitations with bookdown:
split_by=
methods.
But this would work well with bs4_book(), which uses one Rmd = one chapter by design. The site-wide approach would have the advantage of not being tied to how the HTML is split. A list of changed filenames leading to changed URLs would cover all cases.
With Hugo, a post is a document with a URL. So it makes complete sense. I wonder if we could consider for bookdown the same - should a url point to a chapter ? a section ? a sub-chapter ? I am not sure there is one answer. But the choice of 1 Rmd = 1 chapter = 1 html file simplifies things for sure.
Also, I like the name aliases used in Hugo.
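To make the site-wide idea a bit more concrete, here is a minimal sketch in base R of turning a list of old <-> new names into Netlify-style `_redirects` rules. The `redirects` vector, the `netlify_rules()` helper and the `_book` output directory are assumptions for illustration only, not anything bookdown provides.

```r
# Hypothetical site-wide mapping: names are old page names, values are new ones.
redirects <- c("chapter-lorem" = "lorem", "chapter-ipsum" = "ipsum")

# Build one Netlify-style rule per renamed page:
# "<old path> <new path> <HTTP status>"
netlify_rules <- function(redirects, status = 301L) {
  sprintf("/%s.html /%s.html %d", names(redirects), unname(redirects), status)
}

rules <- netlify_rules(redirects)

# Writing the file is left commented out; "_book" is an assumed output directory.
# writeLines(rules, file.path("_book", "_redirects"))
```

Because the rules are derived from file names only, this would not depend on how the HTML output is split, though it only helps on hosts that read a `_redirects` file.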
Hey,
This was also my first thought, that it would be nice to have it in the YAML, something like:
redirects:
chapter-lorem: lorem
chapter-ipsum: ipsum
And then after the book compilation, something like:
# Read the old-name -> new-name mapping declared under `redirects:`
redirects <- yaml::read_yaml("_output.yml")$redirects
# Write a stub page at the old URL that immediately forwards to the new one
make_redirect <- function(from, to){
  html <- sprintf(
    '<head><meta http-equiv="refresh" content="0; URL=%s.html" /></head>',
    to
  )
  dest <- fs::path("_book", from, ext = "html")
  fs::file_create(dest)
  write(html, dest)
}
mapply(make_redirect, from = names(redirects), to = redirects)
Let me know what you think of this, happy to make a PR :)
Hey y'all,
If ever you want to see a "real life example", I've used it for the Engineering Shiny book.
Step 1: Specified all the redirects here
https://github.com/ThinkR-open/engineering-shiny-book/blob/master/_output.yml#L37
Step 2: Made the change in the chapters to get the new URLs
Step 3: Added a redirect script that is launched by CI
https://github.com/ThinkR-open/engineering-shiny-book/blob/master/redirect.R
Result:
https://github.com/ThinkR-open/engineering-shiny-book/blob/gh-pages/deploy-golem.html
https://engineering-shiny.org/deploy.html
Thanks for sharing !
I would also do it that way in the general idea. :)
But maybe produce a full HTML document, like the Hugo example.
<!DOCTYPE html>
<html>
<head>
<link rel="canonical" href="https://example.com/posts/my-intended-url"/>
<meta name="robots" content="noindex">
<meta http-equiv="content-type" content="text/html; charset=utf-8"/>
<meta http-equiv="refresh" content="0; url=https://example.com/new-url"/>
</head>
</html>
noindex is useful to avoid indexing of this page (https://developers.google.com/search/reference/robots_meta_tag)
link rel="canonical" also seems useful to avoid duplicate URLs, but it seems that it does not add value on top of the previous one. Hugo does both, maybe to be sure.
Having <!DOCTYPE html> and <html> so that we have a full webpage. (I know that this works without them too.)
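As a rough sketch of what generating such a full page could look like in base R: `redirect_page()`, `write_redirect()` and their `canonical`, `noindex` and `base_url` arguments are hypothetical names chosen for this example, not existing bookdown functions or options.

```r
# Build a complete HTML redirect page pointing at `url`.
# `canonical` and `noindex` are hypothetical switches, not bookdown options.
redirect_page <- function(url, canonical = TRUE, noindex = TRUE) {
  head <- c(
    if (canonical) sprintf('<link rel="canonical" href="%s"/>', url),
    if (noindex) '<meta name="robots" content="noindex">',
    '<meta http-equiv="content-type" content="text/html; charset=utf-8"/>',
    sprintf('<meta http-equiv="refresh" content="0; url=%s"/>', url)
  )
  c("<!DOCTYPE html>", "<html>", "<head>", head, "</head>", "</html>")
}

# Write the redirect page for one renamed chapter into the output directory.
# `base_url` is empty by default, which gives a relative target; an absolute
# URL is probably preferable for the canonical link.
write_redirect <- function(from, to, output_dir = "_book", base_url = "") {
  dest <- file.path(output_dir, paste0(from, ".html"))
  writeLines(redirect_page(paste0(base_url, to, ".html")), dest)
  invisible(dest)
}
```

For example, `write_redirect("chapter-lorem", "lorem")` would create `_book/chapter-lorem.html` forwarding to `lorem.html`, assuming the `_book` directory already exists.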
Also, I think this would be better to have in _bookdown.yml, not _output.yml, because it would be a feature of bookdown, and because _output.yml should contain only output format definitions (it is parsed that way by rmarkdown). This is not the place for other types of fields.
This could also be a feature of HTML book formats only, as it makes no sense for other types of books, so it could be under bookdown::gitbook(). This would allow passing a list of changed page names as a parameter to the gitbook() function.
Thoughts ?
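If the list of changed page names were passed as a parameter, it would probably need some checking before any file is written. Below is a minimal sketch in base R; `check_redirects()` is a hypothetical helper and the rules it enforces are assumptions about what a sensible mapping looks like, not anything bookdown currently does.

```r
# Hypothetical validator for a `redirects` argument (old name -> new name).
check_redirects <- function(redirects) {
  # Accept a named list (as a YAML parser would return) or a character vector.
  redirects <- unlist(redirects)
  old <- names(redirects)
  if (!is.character(redirects) || is.null(old) || any(old == "")) {
    stop("`redirects` must be a named character vector or list.")
  }
  if (anyDuplicated(old)) {
    stop("Each old page name can only be redirected once.")
  }
  # A target that is itself redirected would create a chain or a loop.
  chained <- intersect(old, redirects)
  if (length(chained) > 0) {
    stop("Redirect targets are also listed as old names: ",
         paste(chained, collapse = ", "))
  }
  redirects
}
```

Called as `check_redirects(list("chapter-lorem" = "lorem"))`, it returns the mapping as a named character vector, while a mapping such as `c(a = "b", b = "c")` is rejected because `b` is both a target and an old name.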
Indeed, that would be better with a full HTML.
Using rel = "canonical" would also be a good choice, but I think noindex should not be added, because we would still want Google to know that this old URL is still available? I mean, instead of sending the message of a page that no longer exists (i.e. it has been indexed before but now it's no longer there), we keep this URL available.
Another read on this: https://developers.google.com/search/blog/2012/08/website-testing-google-search
We recommend using rel="canonical" rather than a noindex meta tag because it more closely matches your intent in this situation. Let's say you were testing variations of your homepage; you don't want search engines to not index your homepage, you just want them to understand that all the test URLs are close duplicates or variations on the original URL and should be grouped as such, with the original URL as the canonical. Using noindex rather than rel="canonical" in such a situation can sometimes have unexpected effects (e.g., if for some reason we choose one of the variant URLs as the canonical, the "original" URL might also get dropped from the index since it would get treated as a duplicate).
Regarding where to put this info, this would indeed make sense to have it somewhere else, but you might be best suited to know where it fits exactly :)
Oh thanks for sharing this resource. This may be why Hugo puts both of them.
It also seems that, regarding SEO, a client-side redirect with a meta tag and refresh 0 will be seen as a 301 redirect by search engines (https://www.contentkingapp.com/academy/redirects/#client-side-redirects). And a 301 redirect will have the same effect as a canonical URL: the search engine will go to the correct page (https://developers.google.com/search/docs/advanced/crawling/consolidate-duplicate-urls#redirects-method).
However, I still think the 'noindex' could be useful, because you don't really want this old URL to still be considered as a potential way to access the page. It should no longer appear in search engine results because you want the new URL to be indexed. I see this as a case of 301 redirect for pages that no longer exist, to make the transition smooth, but it is better if no one uses the old URL. So I think I want to send the message to Google that this old URL no longer exists 😅
Maybe we should make all this configurable so that anyone can do as preferred.
Other resource for reference to understand the 3 ways: https://overthinkgroup.com/301-redirect-noindex-rel-canonical/
Related feature request in pkgdown : https://github.com/r-lib/pkgdown/pull/1259
This will be added there first by @maelle: https://github.com/r-lib/pkgdown/pull/1639
We'll need to have a look there since sharing a common syntax in YAML would be better.
Did this ever get implemented in Bookdown? If so are there examples of its implementation?
Hi @andybeet
It has not been implemented in bookdown yet. The only example I know of is the one shared here, showing the idea of it: https://github.com/rstudio/bookdown/issues/1071#issuecomment-773903750
Thanks @cderv. I'll implement this example for now. Or maybe I'll just make a custom 404 until a solution is implemented. The pkgdown implementation works great to handle this. I don't fully understand the bookdown/pkgdown differences or how complex it would be to implement, but it would be a great addition.
Is there a plan/timeline for adding this or has it been shelved?
Currently there is no timeline on this. Our focus is not on new bookdown features for now. We are happy to review PR though and help how we can.
Context
I started a book a couple of months ago, with a series of Rmd files, let's say named chapter-lorem.Rmd and chapter-ipsum.Rmd.
This generated a book, hosted on GitHub Pages, at myuberbook.org/chapter-lorem.html and myuberbook.org/chapter-ipsum.html.
My book has been read and shared on the internet, and potentially people have been sharing links to myuberbook.org/chapter-lorem.html, but now I want to rename it to be myuberbook.org/lorem.html.
Feature Request
It would be nice to have a native mechanism to do these redirects, in other words having a way to change the chapter URLs, while still keeping the old URL active with a redirect.
A native redirect would allow to:
My current approach is to, after the book compilation, write a series of HTML files with in it :
using