rstudio / blogdown

Create Blogs and Websites with R Markdown
https://pkgs.rstudio.com/blogdown/

Custom Output Format for Content #289

Open 1danjordan opened 6 years ago

1danjordan commented 6 years ago

Hi Yihui,

I want to use a custom output format when rendering posts in blogdown. In particular, I want to use the tufte output format tufte::tufte_html when rendering blog posts. I'm aware that blogdown renders .Rmd files with blogdown::html_page; I can see this in the render_page script:

out = rmarkdown::render(
    input, 'blogdown::html_page', envir = globalenv(), quiet = TRUE,
    encoding = 'UTF-8', run_pandoc = !to_md, clean = !to_md
)

I'm also aware that, to render arbitrary .Rmd files, we can specify a build.R file that renders them using the arguments in their YAML front matter; however, this only seems to work when building files under the static directory. When build.R calls blogdown::build_dir('content'), blog posts are not rendered using tufte::tufte_html as specified in their front matter.

I assume it is a design choice that posts are only rendered by blogdown::html_page. Is it possible to specify the output format of a post in content/post/ in blogdown currently?

Otherwise I think this feature could be implemented by simply passing the specified YAML arguments to rmarkdown::render. Could I help with this feature?

Thanks, Dan

1danjordan commented 6 years ago

Hi,

So after some more investigation I realise that I don't in fact want to use a different output format, as the html_page output format is designed to output an incomplete HTML page, which is then passed to Pandoc and finally into Hugo templates. What I really want is to take the post-processor from tufte_html and pass it into html_page via the post_processor argument, in order to alter the HTML so that it matches the tufte.css classes.

Going through the code, it looks like the post-processor would work for footnotes and captions, but not for plots or the {marginfigure} knitr chunk. I assume it would be possible to handle these in the post-processor as well, given that the post-processor is passed everything anyway. So my real question is: how should we define a post-processor function in order to pass it into html_page? Does it need to be put into a package, or could we define it in the /R/ folder of the root directory?

Thanks, Dan

yihui commented 6 years ago

Yes, your understanding was completely correct. Currently the only way for you to post-process the .html files is /R/build.R.

1danjordan commented 6 years ago

Hey Yihui,

I've put together a build.R file to build blogdown posts using an altered tufte output format. It's very hacky, but has helped me build my understanding of how things work. Nearly all features are working, apart from figure captions. I just need some time to fiddle with them.

You can see it in this commit https://github.com/dandermotj/hugo-envisioned/commit/267be34e6f88535fdc27269d59c77dba8e7ba5e0.

Once I iron this out, I'll see if I can add blogdown support to take an output format in the frontmatter rather than in the build.R step of the site build.

Thanks for your help! Dan

1danjordan commented 5 years ago

Hi Yihui,

I'm planning on making a pull request to allow blogdown users to specify an output format other than blogdown::html_page. This is motivated by my own use case, as we discussed above.

Firstly, I'm wondering whether this would be a useful pull request to make. Secondly, if you think it is, what would be the best implementation? I have two possibilities in mind. One option would be for blogdown to use the output format specified in _output.yml or the YAML front matter, making any output format available. This could potentially cause a lot of confusion/disruption for users who don't know what they are doing. The second option would be to add a tufte_html output format to format.R and only give users the output formats included in blogdown. The default would continue to be html_page unless users specified otherwise, either through options(blogdown.output_format = "blogdown::tufte_html") or through _output.yml. This might be the safer and more conservative option?

Thanks, Dan

asifm commented 5 years ago

@dandermotj Any update on this? I was looking for the exact same thing and was happy to finally find this issue. I don't see any related commit. So I'm guessing it's not yet implemented. Any suggestion about how I should go about this for the time being? And, thanks for solving this.

tcgriffith commented 5 years ago

The problem with using other Rmd output formats in content/ is that these posts will be affected by both their own self-contained CSS styles AND the CSS styles of the website, so the final appearance of the blog posts might not be what you want.

There are two ways to post a tufte-style blog on your blog site:

Hope this helps.

1danjordan commented 5 years ago

@tcgriffith you're correct that if you render an Rmd to HTML and just drop it into the content/ folder then it won't work for various reasons. The solution is to write a Tufte output format function and call that when rendering the Rmd. This is actually quite simple - you can see my own dirty hack for doing this here using a custom blogdown build script. The tufte_hugo_html function is essentially the tufte::tufte_html with some small changes.

blogdown has blogdown::html_page hardcoded as the only output format available. This is a sensible decision because it ensures that the rendered Rmd always plays nice with Hugo. My suggestion is exposing some argument that would allow users to specify a custom output format. It's just unclear what the best API would be for this. I have some thoughts but would like to know what @yihui thinks himself.

1danjordan commented 5 years ago

@asifm if you're looking for something that works but is a total hack, you can use the custom build script in my version of the Hugo Tufte theme - Hugo Envisioned.

asifm commented 5 years ago

@daijiang This is fantastic. I was able to make it work with nstrayer/tuftesque theme. Thanks very much.

b4D8 commented 4 years ago

Hi! I'm trying to use Tufte's sidenotes/marginnotes with blogdown.

I already use some homemade shortcodes which are working great for my blog posts but I'd like the layout (correctly populated label, input and span html elements) to be created for my code chunk's output.

I tried this custom build and it does work but wouldn't it be possible to just add support for fig.margin = TRUE and fig.fullwidth = TRUE in the blogdown::html_page output?

b4D8 commented 3 years ago

So I've been using this great build.R script of @1danjordan for a while, with blogdown.method = "html", as it seemed to be the best way for me to integrate Tufte CSS and blogdown with my limited knowledge of R.

Unfortunately it seems that the release of v1.0 broke the script, as I now get the following error:

Error in output_file(f, to_md <- is_rmarkdown(f)) : 
  unused argument (to_md <- is_rmarkdown(f))
Calls: build_rmds
Execution halted
Error in run_script("R/build.R", as.character(local)) : 
  Failed to run R/build.R

I know it's not really a blogdown issue but it looks like this script isn't maintained anymore, so maybe someone found a way to fix the script? Any help much appreciated.

Anyways, I hope that in a future release, blogdown will provide a more robust integration for Tufte which I find to be a really convenient way to structure code outputs.

cderv commented 3 years ago

This error comes from the fact that the script you linked to uses unexported functions from blogdown.

One of them is blogdown:::output_file, which now takes only a file argument and no longer a to_md argument.

diff --git a/R/utils.R b/R/utils.R
index 049e5eb..eaf4912 100644
--- a/R/utils.R
+++ b/R/utils.R
@@ -195,11 +195,13 @@ is_64bit = function() {
   length(grep('64', unlist(Sys.info()[c('machine', 'release')]))) > 0
 }

-is_rmarkdown = function(x) grepl('[.][Rr]markdown$', x)
-
-# build .Rmarkdown to .markdown, and .Rmd to .html
-output_file = function(file, md = is_rmarkdown(file)) {
-  with_ext(file, ifelse(md, 'markdown', 'html'))
+# build .Rmarkdown to .markdown, and .Rmd to .html unless the global option
+# blogdown.method = 'markdown'
+output_file = function(file) {
+  ext = if (build_method() == 'markdown') 'md' else 'html'
+  ext = rep(ext, length(file))
+  ext[grep('[.][Rr]markdown$', file)] = 'markdown'
+  with_ext(file, ext)
 }

You can now remove this argument from the output_file() call in the script; I believe it should then work as before.

It is always dangerous to use internal, unexported functions from a package: they can change without notice and break your code, as happened here.

b4D8 commented 3 years ago

I know this implementation is fragile, but I have no idea how to do it any better...

I removed the to_md argument and got the following error, because to_md is used later in the script:

Error in rmarkdown::render(f, tufte_html_page(), envir = globalenv(),  : 
  object 'to_md' not found
Calls: build_rmds -> <Anonymous>
Execution halted
Error in run_script("R/build.R", as.character(local)) : 
  Failed to run R/build.R

So I tried to recreate it like so:

is_rmarkdown = function(x) grepl('[.][Rr]markdown$', x)
to_md <- is_rmarkdown(f)

placed right before the output_file() call.

Now the files are being rendered well but then I have another error with bundle_index:

Error in bundle_index(output) : 
  missing argument "output" with no default value
Calls: build_rmds -> encode_paths -> bundle_index -> basename
Execution halted
Error in run_script("R/build.R", as.character(local)) : 
  Failed to run R/build.R

All my posts are indeed in page bundles...

Thanks a lot for your assistance!

cderv commented 3 years ago

This is another internal function that has changed:

@@ -217,9 +217,9 @@ process_markdown = function(x, res) {
 # are used extensively in a website)

 # example values of arguments: x = <html> code; deps = '2017-02-14-foo_files';
-# parent = 'content/post';
-encode_paths = function(x, deps, parent, base = '/', to_md = FALSE) {
-  if (basename(deps) == 'index_files' || !dir_exists(deps)) return(x)
+# parent = 'content/post'; output = 'content/post/hello.md'
+encode_paths = function(x, deps, parent, base = '/', to_md = FALSE, output) {
+  if (!dir_exists(deps)) return(x)  # no external dependencies such as images
   if (!grepl('/$', parent)) parent = paste0(parent, '/')
   deps = basename(deps)
   need_encode = !to_md
@@ -231,6 +231,16 @@ encode_paths = function(x, deps, parent, base = '/', to_md = FALSE) {
   # find the dependencies referenced in HTML
   r = paste0('(<img src|<script src|<link href)(=")(', deps, '/)')

Your script uses encode_paths, so you'll need to modify that call too: as the diff shows, encode_paths now also takes an output argument (the path of the output file, e.g. 'content/post/hello.md'), which it passes on to bundle_index.

I believe the script of @1danjordan is based on the internal function build_rmds, which it tries to reproduce. That function has changed slightly.

There is currently no simple way to replace the html_page() format, so for now you will need to adjust the script.

1danjordan commented 3 years ago

Hi @b4D8,

I haven't used blogdown in a while, so won't be updating that script until I hopefully get back to writing at some point. If you come up with a solution please let me know!

Thanks, Dan

b4D8 commented 3 years ago

Thanks @cderv for your answer. I tried to investigate but finally gave up and preferred to downgrade, as I actually find this rendering more convenient than the new features. I hope blogdown will provide a built-in integration for the Tufte handout in the future. Meanwhile, thanks again @1danjordan, your script is dope to me!

yihui commented 3 years ago

Hope blogdown will provide a built-in integration for the Tufte handout in the future.

We'll probably redesign the tufte package in the future. As the first step, we'll provide the hugo-prose theme: https://github.com/yihui/hugo-prose which will cover most features of the tufte CSS (notably, margin notes and full-width content).

b4D8 commented 3 years ago

Thanks @yihui! I'm going to give the theme a close look! As far as I understand it, the integration seems to be based on Pandoc fenced divs. I was thinking of a deeper level of integration through knitr or code chunk options like fig.margin = TRUE and fig.fullwidth = TRUE, because last time I tried I couldn't find a way to pass a class to my figure or table outputs without pandoc with class.output, class.source or out.extra. Another option for that purpose of course would be to even more convenient as I can take care of CSS from there :) Anyway, I love blogdown! Thanks for developing it!

yihui commented 3 years ago

We could definitely make chunk options work. The theme is still in an early stage of development. Using fenced Divs is a more general approach, since you can put arbitrary content (not limited to figures) in the margin or make it full-width. We can refine this approach in the future. Thanks for the suggestion!

mivalek commented 3 years ago

Hi all,


EDIT: Actually, this was easier than I thought. Here's the implementation for optional custom formats https://github.com/rstudio/blogdown/commit/a44ff2bd0bac506b7aa78faf6cdfa64929642234


I have a very closely related issue, so I don't want to open a new one, even though the approach I'd suggest is different. A tiny bit of background: I am a uni teacher, so hosting both slides and lecture handouts/worksheets in an integrated, easy-to-use, easy-to-maintain way is important when creating a course website. After quite a bit of hacking, I managed to create a website with two output formats (xaringan slides and distill-like pages). I then realised that enabling users to define custom formats while keeping blogdown::html_page as the default should be fairly simple. Or at least I think so.

Here is my suggestion: config.yaml should have an optional renders: parameter, such as:

renders:
  default: "blogdown::html_page"
  slides: "my_pkg::my_format"

blogdown::build_site() then reads renders from config.yaml and passes it down to blogdown:::build_one(), which looks for the type: parameter in a .Rmd file's YAML front matter. If there isn't one, the file gets rendered by the default method. In this scenario, files with type: slides in YAML get rendered with my_pkg::my_format. Using the type: param is handy because it allows custom Hugo templates for files of a given type.

If there is no renders: in config.yaml, everything gets rendered by blogdown::html_page so it would not be a breaking change.
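To make the proposed lookup concrete, here is a minimal sketch of the dispatch described above. choose_format() is a hypothetical helper, not a blogdown function, and it assumes the renders: mapping has already been read from config.yaml into a named list with an optional default entry.

```r
# Hypothetical helper (not part of blogdown): choose an R Markdown output
# format for a post, following the renders: proposal above.
#   front_matter: the post's YAML front matter as a named list
#   renders:      named list mapping Hugo `type` values to format names,
#                 optionally with a `default` entry
choose_format = function(front_matter, renders = NULL) {
  fallback = "blogdown::html_page"  # the format blogdown hard-codes today
  if (is.null(renders)) return(fallback)  # no renders: -> unchanged behaviour
  # use renders$default if given, otherwise keep blogdown's own format
  default = if (is.null(renders[["default"]])) fallback else renders[["default"]]
  type = front_matter[["type"]]
  # no `type:` in the post, or a type with no registered format
  if (is.null(type) || is.null(renders[[type]])) return(default)
  renders[[type]]
}

renders = list(default = "blogdown::html_page", slides = "my_pkg::my_format")
choose_format(list(title = "Week 1", type = "slides"), renders)
#> [1] "my_pkg::my_format"
choose_format(list(title = "Notes"), renders)
#> [1] "blogdown::html_page"
```

Falling back to blogdown::html_page whenever renders is absent is what keeps the proposal non-breaking. In a real build, the front matter could be read with rmarkdown::yaml_front_matter(input), and the returned string passed as the output format to rmarkdown::render(), which accepts a format name as a string (as in the render_page call quoted at the top of this issue).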

@yihui is this something you would consider adding? If so, I would be happy to work on it and create a pull request.

yihui commented 2 years ago

@mivalek Do you have an example repo and website to show me what the xaringan slides and distill pages look like? I want to understand your need better as well as what you have done in the past before discussing the possible technical implementation. Thanks!

To build an Rmd document to any type of page that is not a normal Hugo page, the recommended approach has always been to use the static/ folder: https://bookdown.org/yihui/blogdown/static-files.html But I'm open to suggestions.

mivalek commented 2 years ago

@yihui Here's one I just made, and here's the deployed site with slides and distill-like docs. It's a pared-down version of a website I am using for my module.

From my perspective, the main advantage of supporting multiple output formats is the ability to build them all using Hugo templating and have them treated as normal pages (listing pages, tags, etc...). Using static/ for a page like the one linked to above would be very cumbersome. I'm not even sure it would work.

If I'm correct and my above-proposed edits to blogdown are non-breaking, I think it's really a win-win.

yihui commented 2 years ago

Basically this whole thread is about a feature request to remove the hard-coded blogdown::html_page format here and allow users to specify their own output formats: https://github.com/rstudio/blogdown/blob/2bc20aad1aea19381d09d5f3fbc345fd9016b180/R/render.R#L216

I can definitely do that. Previously I forced the output format mainly out of consideration for users who are not familiar with Hugo: you can't really use an arbitrary output format (e.g., html_document or pdf_document) and expect it to work magically. If you understand Hugo templates well (@mivalek you certainly do), I'd be happy to remove this constraint.

Regarding your implementation a44ff2bd0bac506b7aa78faf6cdfa64929642234, you are correct that it's non-breaking. What I'm thinking of right now is a potentially breaking change. That is, build_one() just uses whatever output format is specified in the output field of the post. I can warn against a few output formats that are known not to work.

It will make the implementation slightly simpler, but you would need to specify both output: teachR::xaringan_slides (for blogdown) and type: slides (for Hugo) in YAML, which you might not like.

mivalek commented 2 years ago

Thanks for this @yihui, that's great! I completely get why you decided to force the output format. That's why I think my implementation works well: the renders: argument to config.yaml is optional. I also like it because it's easy to use. I understand, however, that you don't necessarily share that concern 🙂. I think the changes were quite easy to implement, so I'd be happy to open a PR for you (and also take care of config.toml). I would also be up for documenting the usage in the blogdown guide.

Out of curiosity, what is the reason you're leaning towards the breaking change you mentioned? (you're right, I'm not a huge fan of having to specify output in every document's YAML but... it's your package 😉)

yihui commented 2 years ago

I'm not sure if I should use Hugo's config.yaml for blogdown. Currently, pretty much all blogdown settings are in options() in .Rprofile. There are definitely other possibilities for storing the configuration, such as environment variables, a dedicated config file like _blogdown.yml, or Hugo's config.yaml. They all have their pros and cons. Since we have long invested in options(), I'd rather not change that.

The renders option can be set in options() in .Rprofile, too. I guess it's no big deal to you.

Actually, your implementation is mostly acceptable to me. I need to think more about it.

Back to your original feature request of being able to create xaringan slides under the content/ directory: I definitely agree with you on the advantages compared to using static/. I have long hoped to provide an example that uses Hugo's .RawContent variable to create slides with remark.js. Basically, that means you generate .md output (instead of .html) from .Rmd, which can be done using either options(blogdown.method = 'markdown') or the .Rmarkdown extension. Have you considered this approach?
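For reference, the global option mentioned above would go in the site's .Rprofile. This is a minimal sketch of the setting only; the Hugo template that feeds .RawContent to remark.js is not shown and would have to be written separately.

```r
# .Rprofile at the root of the blogdown site
# Render .Rmd posts to Markdown (.md) instead of HTML, so that Hugo receives
# Markdown and a layout could use .RawContent (e.g. to hand it to remark.js)
options(blogdown.method = 'markdown')
```

Renaming individual posts to .Rmarkdown achieves the same per file without changing the site-wide setting.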

mivalek commented 2 years ago

I understand your motivation to keep config.yaml free of blogdown options. Ultimately, any approach that allows me to (easily) set up something akin to what I have and make it user-friendly so that people don't need to be expert blogdown/Hugo users in order to use my theme works for me.

The same goes for my feature request to have slides in content/. I think my approach is elegant in that both (all) document formats are essentially the same: .Rmd files knitted to HTML. I'm not wedded to this in any way, so if .Rmarkdown -> .md works better for some reason, I'm happy to use it. Basically, I care about consistency of use: as long as all output formats can be generated using the same system, I don't particularly mind which of the two it is.

To answer your question, I remember considering using .Rmarkdown early on but decided against it. I am not sure I remember why, but I think it was mainly because I didn't like the Hugo Distill theme output as much as Distill for R Markdown, and I also had a lot of legacy code that worked better with the latter. My memory of it is a bit hazy though...

I'm really excited about this feature and if I can help with implementation, please let me know.

yihui commented 2 years ago

Okay. Then would you be willing to use options(blogdown.formats = list(slides = ..., distill = ...)) as opposed to config.yaml?

mivalek commented 2 years ago

Would options(blogdown.formats = list(slides = ..., default = ..., another_type = ..., ...)) be possible, so that a default format gets applied to any documents that don't have a type: (e.g., slides, another_type) parameter set in YAML? If so, that's absolutely fine!

yihui commented 2 years ago

Sure.

mivalek commented 1 year ago

Hi @yihui, I'm wondering whether you've abandoned this issue or if it is still being worked on?

yihui commented 1 year ago

@mivalek I have not abandoned this issue (it's still an open issue). If you can open a pull request, we can discuss from there.

mivalek commented 1 year ago

Sorry for the delay. I've opened the PR. Not sure I managed to link it to this issue correctly, so here's the link https://github.com/rstudio/blogdown/pull/745