notestar is a notebook system built on the targets package: notes with targets.
You can install notestar from GitHub with:
# install.packages("devtools")
devtools::install_github("tjmahr/notestar")
Here is an example project/notebook showing how notestar combines various .Rmd files into a single HTML file: https://github.com/tjmahr/notestar-demo.
First, we follow a targets-based workflow. We develop datasets and models and so on, writing functions in R/functions.R and describing a build pipeline in _targets.R. Some familiarity with the big ideas of the targets package is required.
We then work with our data-analysis products in RMarkdown .Rmd files in a notebook directory. We read in these targets using targets::tar_read(), and we might develop several notebook entries as we tackle different parts of our analysis problem.
In the _targets.R file, there are special notebook-related targets.
When we run targets::tar_make(), notestar does the following:
For each .Rmd file, it knits (knitr::knit()) the corresponding .md output file: running the code, printing the results, and saving and inserting figures.
These .md files are collated and assembled into a single-page bookdown document. It looks kind of like a data-analysis blog (a page with a sequence of entries in reverse-chronological order).
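The page order can be sketched in base R. This sketch assumes that entries are ordered by sorting their filenames in decreasing byte order. That assumption is not confirmed here. Under it, index sorts first because letters come after digits, dated entries follow from newest to oldest, and the all-zeros date in 0000-00-00-references sorts last. That would explain why the references page is a dummy entry with that name. order_entries() and the 2022-03-01 entry are hypothetical.

```r
# Hypothetical helper, not part of notestar: order entry files the way a
# reverse-chronological notebook page would show them, assuming a plain
# decreasing sort of filenames. method = "radix" compares strings byte by
# byte (C locale), so the result does not depend on the session's locale.
order_entries <- function(files) {
  sort(files, decreasing = TRUE, method = "radix")
}

entries <- c(
  "0000-00-00-references.md",
  "2022-02-22-hello-world.md",
  "index.md",
  "2022-03-01-model-fits.md"   # hypothetical second entry
)

# index.md first, then entries from newest to oldest, references last
order_entries(entries)
```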
Importantly, notestar only does these jobs when needed. For example, a notebook entry's .md file will only be created if it is outdated. That is, the .md file is rebuilt only when it does not exist yet, when its .Rmd source file has changed, or when a target that the entry reads (e.g., data <- targets::tar_read(data)) has changed.
notestar's role in all of this is to link the data analysis targets to the .Rmd files and then orchestrate the assembly of a notebook from these entries.
Let's highlight some packages that are indispensable for this scheme to work.
Below I show a worked example and describe things in great detail. But before that I want to note that as a user, I only really use 3–4 functions from this package.
- use_notestar() to set up a notestar project
- use_notestar_makefile() to set up a Makefile that runs targets::tar_make(). I then use RStudio's Build commands to build projects.
- use_notestar_references() to set up a .bib and .csl file for the notebook.
- notebook_create_page() to create a new notebook entry
- notebook_browse() to open the final notebook file in a browser.
notestar works best inside a data analysis project and, specifically, as part of an RStudio project. That is, we have some directory for our project. Everything we do or create will live in that directory, and that directory is the default working directory for all of our R code.
For demonstration, let's create a new directory inside of a temporary directory and make that the home base for our project.
# a fresh project directory inside the session's temporary directory
project_dir <- file.path(tempdir(), "my-project")
dir.create(project_dir)
setwd(project_dir)
Nothing here!
fs::dir_tree(all = TRUE)
#> .
use_notestar() will populate the project directory with the basic skeleton for the project. We set the theme to "water-dark" so that the screenshots below stick out better from the white background on GitHub.
library(notestar)
use_notestar(cleanrmd_theme = "water-dark")
fs::dir_tree(all = TRUE)
#> .
#> ├── .here
#> ├── config.yml
#> ├── notebook
#> │   ├── 0000-00-00-references.Rmd
#> │   ├── book
#> │   │   └── assets
#> │   ├── index.Rmd
#> │   └── knitr-helpers.R
#> ├── R
#> │   └── functions.R
#> └── _targets.R
The file config.yml is a configuration file for the config package. These configuration options were set when we called use_notestar(), so these are all the default configuration options (except for cleanrmd_theme). Each option is described by a comment field.
writeLines(readLines("config.yml"))
---
default:
  notestar:
    dir_notebook:
      comment: >
        directory location for user-edited notebook entries (as
        RMarkdown files)
      value: "notebook"
    dir_md:
      comment: >
        directory location for knitted/rendered notebook entries
        (as markdown files)
      value: "notebook/book"
    notebook_helper:
      comment: >
        path to an R script that is run before knitting each notebook
        entry
      value: "notebook/knitr-helpers.R"
    cleanrmd_theme:
      comment: >
        CSS theme to use for the notebook. Anything printed by
        cleanrmd::cleanrmd_themes() should work.
      value: "water-dark"
    notebook_filename:
      comment: >
        Name to use for the final html file. Defaults to "notebook"
        which produces "notebook.html"
      value: "notebook"
---
Two .Rmd files are automatically included: index.Rmd and 0000-00-00-references.Rmd. These are the first and last entries (top and bottom parts) of the notebook. index.Rmd houses metadata for the notebook:
writeLines(readLines("notebook/index.Rmd"))
---
title: "Notebook title"
author: "Author Name"
date: >
`r knitr::inline_expr('format(Sys.time(), "Updated on %A, %B %d, %Y %I:%M %p")')`
site: bookdown::bookdown_site
link-citations: true
---
The YAML metadata in index.Rmd is created automatically inside the _targets.R file. More on that later.
0000-00-00-references.Rmd is not meant to be edited. As it tells us, it provides a "References" heading. When the bibliography is appended to the end of the notebook, it will be printed under this heading.
writeLines(readLines("notebook/0000-00-00-references.Rmd"))
<!-- Chapters/posts are collated in reverse chronological order so -->
<!-- this last one is a dummy entry to tell bookdown where to drop -->
<!-- the references. -->
## References
The file _targets.R orchestrates the compilation of the notebook using the targets package. targets::tar_make() compiles the notebook by:
- knitting each .Rmd file in notebook/ (if necessary) to produce a corresponding .md file in notebook/book/.
- collating the .md files in notebook/book/ into a single-document bookdown book with bookdown/RMarkdown/pandoc (if necessary).
I say "if necessary" because targets only builds a target in the workflow if the target has not been built yet or if it is out of date. Thus, notestar doesn't waste time regenerating earlier entries if they or their dependencies have not changed.
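The "if necessary" rule can be pictured as a hash comparison. This sketch assumes that a target is stale when it has never been built, or when the current hash of its file differs from the hash recorded at the last build. targets keeps its own metadata and hashing. needs_rebuild() is a hypothetical stand-in that uses MD5 from the tools package, which ships with R.

```r
# Hypothetical sketch of "rebuild only if necessary"; not targets' real code.
# recorded_hash is the hash saved after the last successful build, or NA if
# the target has never been built.
needs_rebuild <- function(path, recorded_hash = NA_character_) {
  if (is.na(recorded_hash)) return(TRUE)   # never built
  if (!file.exists(path)) return(TRUE)     # built once, but now missing
  current <- unname(tools::md5sum(path))
  !identical(current, recorded_hash)       # contents changed since last build?
}

rmd <- tempfile(fileext = ".Rmd")
writeLines(c("## An entry", "Some text."), rmd)
recorded <- unname(tools::md5sum(rmd))    # pretend we just built from this file

needs_rebuild(rmd, recorded)              # FALSE: unchanged, so skip
cat("More text.\n", file = rmd, append = TRUE)
needs_rebuild(rmd, recorded)              # TRUE: the source changed
```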
Finally, .here is a sentinel file for the here package. It indicates where the project root is located. R/functions.R is an (as-yet empty) R script that is source()-ed at the start of _targets.R.
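A simplified sketch shows what a sentinel file is for. Start in some directory and walk up until you reach a directory that contains .here. The here package delegates this search to rprojroot and checks several root criteria. find_project_root() is a hypothetical illustration, not its implementation.

```r
# Hypothetical helper: walk up from `start` until a directory containing the
# sentinel file is found; that directory is treated as the project root.
find_project_root <- function(start = getwd(), sentinel = ".here") {
  dir <- normalizePath(start, winslash = "/", mustWork = TRUE)
  repeat {
    if (file.exists(file.path(dir, sentinel))) return(dir)
    parent <- dirname(dir)
    if (identical(parent, dir)) {
      stop("No '", sentinel, "' file found at or above ", start)
    }
    dir <- parent
  }
}

# A throwaway project with a .here file at its root and a nested folder
root <- file.path(tempdir(), "root-demo")
dir.create(file.path(root, "notebook", "book"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(root, ".here"))

# Starting deep inside the project still finds the root
find_project_root(file.path(root, "notebook", "book"))
```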
Here we build the notebook and see targets build each target.
targets::tar_make()
#> • start target notebook_output_yaml
#> • built target notebook_output_yaml
#> • start target notebook_deps_in_index_yml
#> • built target notebook_deps_in_index_yml
#> • start target entry_0000_00_00_references_rmd
#> • built target entry_0000_00_00_references_rmd
#> • start target notebook_index_yml
#> • built target notebook_index_yml
#> • start target notebook_helper_user
#> • built target notebook_helper_user
#> • start target spellcheck_exceptions
#> • built target spellcheck_exceptions
#> • start target notebook_index_rmd
#> • built target notebook_index_rmd
#> • start target notebook_helper
#> • built target notebook_helper
#> • start target notebook_rmds
#> • built target notebook_rmds
#> • start target entry_index_md
#> • built target entry_index_md
#> • start target entry_0000_00_00_references_md
#> • built target entry_0000_00_00_references_md
#> • start target spellcheck_notebook
#> • built target spellcheck_notebook
#> • start target notebook_mds
#> • built target notebook_mds
#> • start target spellcheck_report_results_change
#> • built target spellcheck_report_results_change
#> • start target notebook_bookdown_yaml
#> • built target notebook_bookdown_yaml
#> • start target spellcheck_report_results
#> No spelling errors found.
#> • built target spellcheck_report_results
#> • start target notebook
#>
#>
#> processing file: index.Rmd
#> | | | 0% | |......................................................................| 100%
#> inline R code fragments
#>
#>
#> output file: index.knit.md
#>
#> "C:/Program Files/RStudio/bin/pandoc/pandoc" +RTS -K512m -RTS notebook.md --to html5 --from markdown+autolink_bare_uris+tex_math_single_backslash --output notebook.html --lua-filter "C:\Users\trist\AppData\Local\R\win-library\4.2\bookdown\rmarkdown\lua\custom-environment.lua" --lua-filter "C:\Users\trist\AppData\Local\R\win-library\4.2\rmarkdown\rmarkdown\lua\pagebreak.lua" --lua-filter "C:\Users\trist\AppData\Local\R\win-library\4.2\rmarkdown\rmarkdown\lua\latex-div.lua" --metadata-file "C:\Users\trist\AppData\Local\Temp\RtmpOE7W2v\file58d87b347c18" --self-contained --variable disable-fontawesome --variable title-in-header --highlight-style pygments --table-of-contents --toc-depth 3 --mathjax --variable "mathjax-url:https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" --template "C:/Users/trist/AppData/Local/R/win-library/4.2/cleanrmd/template/cleanrmd.html" --include-in-header "C:\Users\trist\AppData\Local\Temp\RtmpOE7W2v\rmarkdown-str58d82cc775d5.html"
#>
#> Output created: docs/notebook.html
#> • built target notebook
#> • end pipeline
If we ask it to build the book again, it skips everything (none of the dependencies have changed) except a special spell-checking target that is set to always run.
targets::tar_make()
#> ✔ skip target notebook_output_yaml
#> ✔ skip target notebook_deps_in_index_yml
#> ✔ skip target entry_0000_00_00_references_rmd
#> ✔ skip target notebook_index_yml
#> ✔ skip target notebook_helper_user
#> ✔ skip target spellcheck_exceptions
#> ✔ skip target notebook_index_rmd
#> ✔ skip target notebook_helper
#> ✔ skip target notebook_rmds
#> ✔ skip target entry_index_md
#> ✔ skip target entry_0000_00_00_references_md
#> ✔ skip target spellcheck_notebook
#> ✔ skip target notebook_mds
#> • start target spellcheck_report_results_change
#> • built target spellcheck_report_results_change
#> ✔ skip target notebook_bookdown_yaml
#> ✔ skip target spellcheck_report_results
#> ✔ skip target notebook
#> • end pipeline
Right now, our compiled notebook (notebook/book/docs/notebook.html) is just the title page.
If we look at the project tree, we see some additions.
fs::dir_tree(all = TRUE)
#> .
#> ├── .here
#> ├── config.yml
#> ├── notebook
#> │   ├── 0000-00-00-references.Rmd
#> │   ├── book
#> │   │   ├── 0000-00-00-references.md
#> │   │   ├── assets
#> │   │   ├── docs
#> │   │   │   ├── 0000-00-00-references.md
#> │   │   │   ├── index.md
#> │   │   │   ├── notebook.html
#> │   │   │   └── reference-keys.txt
#> │   │   ├── index.Rmd
#> │   │   ├── knitr-helpers.R
#> │   │   ├── notebook.rds
#> │   │   ├── _bookdown.yml
#> │   │   └── _output.yml
#> │   ├── index.Rmd
#> │   └── knitr-helpers.R
#> ├── R
#> │   └── functions.R
#> ├── shot1.png
#> ├── _targets
#> │   ├── .gitignore
#> │   ├── meta
#> │   │   ├── meta
#> │   │   ├── process
#> │   │   └── progress
#> │   ├── objects
#> │   │   ├── notebook_deps_in_index_yml
#> │   │   ├── notebook_index_yml
#> │   │   ├── notebook_rmds
#> │   │   ├── spellcheck_exceptions
#> │   │   ├── spellcheck_notebook
#> │   │   ├── spellcheck_report_results
#> │   │   └── spellcheck_report_results_change
#> │   └── user
#> └── _targets.R
_targets/ is a new directory. It is the object and metadata storage for targets. We don't worry about it.
There are some .md files in notebook/book/ as well as some bookdown-related files (_bookdown.yml, _output.yml and notebook.rds). There is also the output of bookdown in notebook/book/docs. (notebook/book/docs/notebook.html is the file we screenshotted earlier.)
knitr-helpers.R was also copied to the notebook/book/ directory.
This copying reflects a design decision by the package. Namely, the contents of the notebook/book directory should not be edited by hand. Its contents should be reproducible, either by regenerating files (like the .md files) or by copying files (like knitr-helpers.R).
The user should only have to worry about editing files in the notebook/ directory or in _targets.R (or perhaps config.yml).
We can create a new entry from a template using notebook_create_page() and regenerate the notebook. (A slug is a few words we include in the filename to help remember what the entry is about.)
notebook_create_page(date = "2022-02-22", slug = "hello-world")
#> ✔ Setting active project to 'C:/Users/trist/AppData/Local/Temp/Rtmp0CF7CR/my-project'
#> ✔ Writing 'notebook/2022-02-22-hello-world.Rmd'
#> • Edit 'notebook/2022-02-22-hello-world.Rmd'
#> ✔ 'notebook/2022-02-22-hello-world.Rmd' created
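The new file's name is the entry date followed by the slug. Below is a hypothetical helper that reproduces that naming pattern. entry_filename() is made up for illustration, and notebook_create_page() may build the name differently.

```r
# Hypothetical helper, not notestar's API: "<YYYY-MM-DD>-<slug>.Rmd"
entry_filename <- function(date, slug) {
  stamp <- format(as.Date(date), "%Y-%m-%d")   # accepts strings or Date objects
  paste0(stamp, "-", slug, ".Rmd")
}

entry_filename("2022-02-22", "hello-world")
# "2022-02-22-hello-world.Rmd"
```

Putting a zero-padded ISO date first means that sorting the filenames also sorts the entries by calendar date.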
Now targets has to rebuild the notebook because there is a new entry that needs to be folded in. The network diagram shows that entry_2022_02_22_hello_world_rmd is outdated (blue), so everything downstream from it is also outdated.
targets::tar_visnetwork(targets_only = TRUE)
When we rebuild the notebook, that entry now appears in the HTML file.
targets::tar_make()
#> [output omitted]
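The target names in the diagram and in the tar_make() logs follow a visible pattern. Each name has an entry_ prefix, followed by the lowercased filename with dashes and dots turned into underscores. For example, 0000-00-00-references.Rmd becomes entry_0000_00_00_references_rmd. Here is a sketch of that mapping. It is inferred from those names, not taken from notestar's source.

```r
# Hypothetical helper inferred from the target names in the logs above;
# notestar's own naming code may handle edge cases differently.
entry_target_name <- function(filename) {
  name <- tolower(basename(filename))       # drop directories, lowercase ".Rmd"
  name <- gsub("[^a-z0-9]+", "_", name)     # dashes and dots become underscores
  paste0("entry_", name)
}

entry_target_name("notebook/2022-02-22-hello-world.Rmd")
# "entry_2022_02_22_hello_world_rmd"
entry_target_name("notebook/book/index.md")
# "entry_index_md"
```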
From here, we go with the flow. We use targets as we normally would, modifying R/functions.R and _targets.R to set up our data-processing pipeline. We can now use our notebook to do reporting and exploration as part of that pipeline. Things we make with targets can be read into our notebook entries with tar_read() and tracked as dependencies.
In this section, we will describe some behind-the-scenes details about notestar using the worked example.
Here is what a minimal .Rmd notebook entry looks like:
writeLines(readLines("notebook/2022-02-22-hello-world.Rmd"))
<!--- Timestamp to trigger book rebuilds: `r Sys.time()` --->
```{r setup, include = FALSE}
# library(tidyverse)
# fit <- targets::tar_read(fit)
# fit
```
## Feb. 22, 2022 (Demo entry)
<small>Source: <code>`r knitr::current_input()`</code></small>
```{r content}
```
That first <!--- comment ---> line on top is an HTML comment. It will not be displayed when we view the final HTML file, but when the .Rmd file is knitted to produce the corresponding .md file, the timestamp will be updated. Here is the first line of that .md file:
writeLines(readLines("notebook/book/2022-02-22-hello-world.md")[1])
<!--- Timestamp to trigger book rebuilds: 2022-03-01 13:35:54 --->
This timestamp allows us to mark a notebook entry as outdated even if none of the other text in the .md file has changed. Here is a motivating example. Let's append a code chunk to the bottom of the notebook entry. It will plot a histogram.
entry_v0 <- readLines("notebook/2022-02-22-hello-world.Rmd")[1:13]
writeLines(
c(
entry_v0,
"Would you look at all these 4's?",
"```{r old-faithful, fig.width = 4, fig.height = 4}",
"hist(faithful$eruptions)",
"```"
),
"notebook/2022-02-22-hello-world.Rmd"
)
And then we regenerate the notebook.
targets::tar_make()
#> [output omitted]
Let's store the lines of the current .md file so we can compare them to a later version.
entry_v1 <- readLines("notebook/book/2022-02-22-hello-world.md")
Now, suppose we wanted to change the size or resolution of the plot. In this case, we will change the fig.width and fig.height values to 6 and regenerate the notebook:
writeLines(
c(
entry_v0,
"Would you look at all these 4's?",
"```{r old-faithful, fig.width = 6, fig.height = 6}",
"hist(faithful$eruptions)",
"```"
),
"notebook/2022-02-22-hello-world.Rmd"
)
targets::tar_make()
#> [output omitted]
The figure image files have definitely changed: they are different sizes! The text in the plots in the two screenshots is also a different size. But the text of the .md files is the same, except for the timestamp.
entry_v2 <- readLines("notebook/book/2022-02-22-hello-world.md")
entry_v1 == entry_v2
#> [1] FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#> [13] TRUE TRUE
entry_v1[1]
#> [1] "<!--- Timestamp to trigger book rebuilds: 2022-03-01 13:36:00 --->"
entry_v2[1]
#> [1] "<!--- Timestamp to trigger book rebuilds: 2022-03-01 13:36:07 --->"
This phenomenon, where a change to an .Rmd file does not cause a change in the text of the .md file, is the reason for the timestamp at the top of each .Rmd file.
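A small sketch shows why one line is enough. It builds two knitted versions of an entry whose body text is identical, as after the fig.width/fig.height change. stamp_line() is a hypothetical helper that formats a time the way the knitted comment line above shows it. The body lines and times are illustrative. Only the first line differs, yet that changes the file's hash, and the hash is what a hash-based build tool checks.

```r
# Hypothetical helper mimicking the knitted timestamp comment shown above
stamp_line <- function(time) {
  paste0("<!--- Timestamp to trigger book rebuilds: ",
         format(time, "%Y-%m-%d %H:%M:%S"), " --->")
}

body <- c("## Feb. 22, 2022 (Demo entry)", "Would you look at all these 4's?")
t1 <- as.POSIXct("2022-03-01 13:36:00", tz = "UTC")
t2 <- as.POSIXct("2022-03-01 13:36:07", tz = "UTC")

md_v1 <- tempfile(fileext = ".md")
md_v2 <- tempfile(fileext = ".md")
writeLines(c(stamp_line(t1), body), md_v1)
writeLines(c(stamp_line(t2), body), md_v2)

identical(readLines(md_v1)[-1], readLines(md_v2)[-1])          # TRUE: same body
unname(tools::md5sum(md_v1)) == unname(tools::md5sum(md_v2))   # FALSE: new hash
```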
Our targets graph has a node called entry_2022_02_22_hello_world_md.
targets::tar_visnetwork(
targets_only = TRUE,
names = "entry_2022_02_22_hello_world_md"
)
That node does not represent just the file notebook/book/2022-02-22-hello-world.md. Its plot is also tracked as a byproduct of the entry:
targets::tar_read("entry_2022_02_22_hello_world_md")
#> [1] "notebook/book/2022-02-22-hello-world.md"
#> [2] "notebook/book/assets/figure/2022-02-22-hello-world/old-faithful-1.png"
Thus, if I remove the image, that notebook entry becomes outdated and needs to be reprocessed.
file.remove(targets::tar_read("entry_2022_02_22_hello_world_md")[2])
#> [1] TRUE
targets::tar_visnetwork(
targets_only = TRUE,
names = "entry_2022_02_22_hello_world_md"
)
(I forget the problem that motivated me to add this layer of tracking on top of the timestamping, but it's there.)
# undo the deletion before moving on
targets::tar_make()
#> [output omitted]
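This extra layer of tracking can be sketched as a check over every file that an entry produced. The sketch assumes that the entry target's value is the vector of paths shown by tar_read(), meaning the .md file plus its figures. The entry is then outdated if any of those files is missing or no longer matches its recorded hash. entry_outdated() and the stand-in files are hypothetical. targets performs the real check through its own metadata.

```r
# Hypothetical check, not targets' internals: an entry is outdated if any
# tracked file is gone or differs from the hash recorded at build time.
entry_outdated <- function(files, recorded_hashes) {
  if (!all(file.exists(files))) return(TRUE)
  current <- unname(tools::md5sum(files))
  !identical(current, unname(recorded_hashes))
}

# Stand-ins for the .md file and its figure (plain text instead of a real PNG)
entry_dir <- file.path(tempdir(), "entry-demo")
dir.create(entry_dir, showWarnings = FALSE)
files <- file.path(entry_dir, c("2022-02-22-hello-world.md", "old-faithful-1.png"))
writeLines("knitted markdown", files[1])
writeLines("placeholder image bytes", files[2])
recorded <- tools::md5sum(files)

entry_outdated(files, recorded)   # FALSE: everything is where we left it
file.remove(files[2])
entry_outdated(files, recorded)   # TRUE: the figure is gone
```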
Think about any other time you've used knitr or RMarkdown. When you remove the code that produces a figure in an .Rmd file, what happens to the plot's image file? Normally, it sticks around, and you eventually find yourself with all kinds of old, no-longer-used figures. Before knitting an .Rmd file, notestar removes all existing figures associated with that entry. As a result, only figures that were created during the most recent knitting are retained. This move is what allows image dependencies (described above) to be inferred: if an image file is created as a result of knitting an .Rmd file, we can associate it with the .md file.
Let's demonstrate this feature. Here are the current notebook assets:
fs::dir_tree("./notebook/book/assets")
#> ./notebook/book/assets
#> └── figure
#>     └── 2022-02-22-hello-world
#>         └── old-faithful-1.png
We will restore the original version of the entry so that the plot is no longer created.
writeLines(entry_v0, "notebook/2022-02-22-hello-world.Rmd")
targets::tar_make()
#> [output omitted]
What we have now is an empty directory.
fs::dir_tree("./notebook/book/assets")
#> ./notebook/book/assets
#> └── figure
#>     └── 2022-02-22-hello-world
This behavior is controlled in the knitr-helpers.R file, specifically the last line:
writeLines(readLines("notebook/knitr-helpers.R"))
# This script is run before knitting each chapter. It sets the knitting root
# directory so that it can see the `_targets` folder, and it sets the chunk
# default settings.
notestar::notebook_set_opts_knit()
notestar::notebook_set_opts_chunk()
notestar::notebook_set_markdown_hooks()
knitr::opts_knit$set(notestar_purge_figures = TRUE)
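The purge step described earlier can be sketched as "delete the entry's old figures, then knit." purge_entry_figures() is a hypothetical helper, and the second chunk label (new-plot) is made up. In notestar, the behavior is switched on by the notestar_purge_figures option shown above, and the actual implementation may differ.

```r
# Hypothetical helper, not notestar's code: remove every existing figure in an
# entry's figure folder so only figures from the next knit will remain.
purge_entry_figures <- function(fig_dir) {
  old <- list.files(fig_dir, full.names = TRUE)
  unlink(old)
  invisible(old)   # return what was deleted, in case the caller wants a log
}

fig_dir <- file.path(tempdir(), "purge-demo", "figure", "2022-02-22-hello-world")
dir.create(fig_dir, recursive = TRUE, showWarnings = FALSE)
file.create(file.path(fig_dir, "old-faithful-1.png"))   # left over from an old knit

purge_entry_figures(fig_dir)
file.create(file.path(fig_dir, "new-plot-1.png"))       # what the new knit "produces"

list.files(fig_dir)   # only "new-plot-1.png" remains
```

The helper deletes only the folder's contents, so the folder itself survives. This matches the empty directory in the assets tree above.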