rstudio / rmarkdown

Dynamic Documents for R
https://rmarkdown.rstudio.com
GNU General Public License v3.0
2.87k stars 971 forks source link

Should render() favour reproducibility by default? #1673

Open MilesMcBain opened 4 years ago

MilesMcBain commented 4 years ago

In RStudio, pressing the 'knit' button appears to render an Rmd in separate R process - such that the code execution environment is not contaminated by the global environment or search path of the user's interactive session.

When calling render() from the console, the opposite is true. The user's environment is used by default (parent.frame()) and the library search path remains the same as the interactive session. So functions from previously attached packages can be called.

This means rendering an Rmd from the console is much more vulnerable to reproducibility issues than using the button in RStudio. I think it would be worth considering making render() use a separate process using callr by default, as is done in RStudio. The expectations set up by the GUI are setting people up for a surprise otherwise. This was certainly the case for me.


By filing an issue to this repo, I promise that

I understand that my issue may be closed if I don't fulfill my promises.

jennybc commented 4 years ago

Arguably a duplicate of #1204

MilesMcBain commented 4 years ago

Yeah. I wonder if that issue was closed prematurely. The vignette failure mode is just a small part of the problem space IMHO.

MilesMcBain commented 4 years ago

And having read what @gaborcsardi was saying in that issue a bit more closely, it sounds like callr can be used (with care) to implement something that will work in that S3 edge case.

cboettig commented 4 years ago

Just wanted to follow up on this, because I have also just been bitten by this in a case where I thought I was safe because I was already running it in an explicitly separate process. I have a test script which tests that .Rmd file render by running:

 R -e 'lapply("myfile.Rmd", rmarkdown::render)'

Due to the parent environment issue, this can give different behavior from the knit button, and can be resolved by either of:

 R -e 'lapply("myfile.Rmd", rmarkdown::render, envir = NULL)'
# or
 R -e 'lapply("myfile.Rmd", function(x) rmarkdown::render(x))'

I hadn't expected these to be different since I'm running in a fresh session either way, but the first version results in problems with S4 method dispatch, as seen if using the following as the "myfile.Rmd": https://github.com/espm-157/spatial-debug/blob/master/spatial/spatial-assignment.Rmd#L12-L24

jan-glx commented 4 years ago

Yes, please! Have been bitten by this multiple times too (only the third time time I figured out what the problem was). Problems were mostly related to the difference between executing the Rmd file in the global environment (as when using the knit button, using markdown::render(...) from the global environment, or using markdown::render(..., envir=globalenv() )) or in some local environment (as when using markdown::render(...) within a function).

@cboettig : your workarounds are not safe, I think: render calls from all files will share the same global environment (modifiable e.g. with <<-). Additionally, this execution in an local environment can lead to problems when the .Rmd code expects to be executed in the global environment (which in contrast to local environments is in the search path of package code).

To be safe, you have to render each file in a separate process. This can, but not as conveniently as this issue asks it to be, be achieved like this:

render_separately <- function(...) callr::r(
    function(...) rmarkdown::render(..., envir = globalenv()), args = list(...), show = TRUE)

lapply(c("myfile.Rmd", "myotherfile.Rmd"), render_separately)
jennybc commented 4 years ago

FWIW the next version of reprex will export reprex_render() that renders via callr:r() and in the global environment of the fresh R session. It's still in a non-master branch, but it will be merged into the dev version soon.

Update: this PR has now been merged, so reprex_render() exists and is exported in the dev version of reprex.

jdblischak commented 4 years ago

Relevant to this discussion, @jennybc added callr support to render_site() in rmarkdown 1.11. From the NEWS for that release:

new_session: true in _site.yml causes render_site() to render each file in a new R session, eliminating some cross-file difficulties, such as function masking (#1326, #1443 @jennybc).

So the callr package is already a suggested dependency.

hadley commented 4 years ago

I am strongly in favour of this proposal.