Open gorgitko opened 5 years ago
I believe this may be the same issue as https://github.com/rstudio/rmarkdown/issues/1268 and https://github.com/rstudio/rmarkdown/issues/701.
In my opinion, the issue has still not been completely resolved. I'm running into this error myself in a much simpler situation: no network drives or other unusual configurations.
I can confirm this is happening on local computer using local drives, not network ones.
So I looked a little into how `rmarkdown` actually uses `pandoc`, and especially at which temporary file `pandoc` cannot find:
pandoc: /tmp/RtmpGXCEhW/rmarkdown-str7381eaa5d53.html: openBinaryFile: does not exist (No such file or directory)
I have found that this file is probably somehow used for navbars, but more important is this function:
# temp files created by as_tmpfile() cannot be immediately removed because they
# are needed later by the pandoc conversion; we have to clean up the temp files
# that have the pattern specified in `tmpfile_pattern` when render() exits
clean_tmpfiles <- function() {
  unlink(list.files(
    tempdir(), sprintf("^%s[0-9a-f]+[.]html$", tmpfile_pattern), full.names = TRUE
  ))
}
which is called in `render()`:
# render() may call itself, e.g., in discover_rmd_resources(); in this case,
# we should not clean up temp files in the nested render() call, but wait
# until the top-level render() exits to clean up temp files
.globals$level <- .globals$level + 1L # increment level in a nested render()
on.exit({
  .globals$level <- .globals$level - 1L
  if (.globals$level == 0) clean_tmpfiles()
}, add = TRUE)
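To illustrate the guard above, here is a minimal, self-contained sketch of the same level-counter pattern. The names `.state` and `nested_call()` are hypothetical stand-ins, not rmarkdown internals; the only point is that the cleanup branch runs once, when the outermost call exits.

```r
# Hypothetical stand-in for rmarkdown's internal `.globals` environment
.state <- new.env()
.state$level <- 0L
.state$cleanups <- 0L

nested_call <- function(depth) {
  .state$level <- .state$level + 1L  # entering one more (possibly nested) call
  on.exit({
    .state$level <- .state$level - 1L
    # Only the outermost call sees level 0 and performs the "cleanup"
    if (.state$level == 0L) .state$cleanups <- .state$cleanups + 1L
  }, add = TRUE)
  if (depth > 1) nested_call(depth - 1)
  invisible(NULL)
}

nested_call(3)  # three nested calls, a single cleanup at the end
```

The counter lives inside one R process, so it presumably only protects nested calls within that process. Forked workers each hold their own copy of it, and each of them would reach level 0 independently.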
So what is actually happening? After a `render()` call finishes, `clean_tmpfiles()` removes all `rmarkdown` temporary files. Because parallel calls share the same temporary directory, it also removes temporary files that other `render()` calls still need.
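The effect can be reproduced without rmarkdown or pandoc. The following is only a sketch under assumptions: the `"rmarkdown-str"` prefix is inferred from the file name in the error message above, the file names are made up, and a scratch directory stands in for the shared `tempdir()`.

```r
# Assumed prefix, inferred from the pandoc error message (not read from rmarkdown)
tmpfile_pattern <- "rmarkdown-str"

# Scratch directory standing in for the tempdir() shared by parallel workers
shared_dir <- tempfile("shared_tmp_")
dir.create(shared_dir)

# Hypothetical temp files belonging to two different render() calls
file_a <- file.path(shared_dir, "rmarkdown-str1a2b3c.html")
file_b <- file.path(shared_dir, "rmarkdown-str4d5e6f.html")
invisible(file.create(file_a, file_b))

# Same cleanup logic as clean_tmpfiles(), parameterised by directory
clean_tmpfiles_sketch <- function(dir) {
  unlink(list.files(
    dir, sprintf("^%s[0-9a-f]+[.]html$", tmpfile_pattern), full.names = TRUE
  ))
}

# "Render A" exits and cleans up, while "render B" still needs file_b
clean_tmpfiles_sketch(shared_dir)
b_survived <- file.exists(file_b)
```

The pattern matches every file with that prefix, so the cleanup has no way to tell which call created which file.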
I can confirm this dirty solution works (put this after `library(rmarkdown)`):
clean_tmpfiles_mod <- function() {
  message("Calling clean_tmpfiles_mod()")
}
assignInNamespace("clean_tmpfiles", clean_tmpfiles_mod, ns = "rmarkdown")
It would be great if the developers added something like `clean_tmpfiles = TRUE` to the `render()` parameters, so that users could then call `clean_tmpfiles()` themselves.
Full modified `render.R`:
library(glue)
library(rmarkdown)
library(BiocParallel)

clean_tmpfiles_mod <- function() {
  message("Calling clean_tmpfiles_mod()")
}
assignInNamespace("clean_tmpfiles", clean_tmpfiles_mod, ns = "rmarkdown")

N_CPUS <- 8
OUTPUT_DIR <- "rendered"
OUTPUT_FILES <- 1:10
BPPARAM <- MulticoreParam(workers = N_CPUS)

dir.create(OUTPUT_DIR, showWarnings = FALSE)

bplapply(OUTPUT_FILES, function(i) {
  intermediates_dir <- glue("{i}_intermediates_dir")
  render(
    "to_render.Rmd",
    output_file = glue("{i}.html"),
    output_dir = OUTPUT_DIR,
    params = list(title = glue("Document {i}")),
    intermediates_dir = intermediates_dir
  )
  system(glue("rm -r {intermediates_dir}"))
}, BPPARAM = BPPARAM)
We're using a basic Pandoc script in bash to merge multiple files:
pandoc lf/lf_01.txt master.md lf/lf_02.txt master_fr.md lf/lf_03.txt master_es.md lf/lf_04.txt master_pt.md lf/lf_05.txt master_de.md lf/lf_06.txt master_it.md lf/lf_07.txt master_ja.md lf/lf_08.txt master.md lf/lf_09.txt > output.html
Is it possible to add a function to prevent clearing the temp files which causes the "openBinaryFile: does not exist (No such file or directory)" error in this medium?
@JayMMTL I am not sure how this is connected to rendering Rmds from within R. If it is, you can use the snippet I have provided to replace the `clean_tmpfiles()` function, which causes this problem.
@gorgitko Your solution worked perfectly for me thanks. Working on a linux vm building rmarkdown files in parallel. Spent whole afternoon trying to figure this out and in the end your black magic dirty solution did the trick. Thanks!
Thanks @gorgitko for explaining the root cause. Unfortunately, your solution does not work on my Macbook Pro 16. It does call your mod function but the error 1 is thrown before:
<div id=: openBinaryFile: does not exist (No such file or directory)
Error: pandoc document conversion failed with error 1
Calling clean_tmpfiles_mod()
I also tried turning off a flag mentioned in several posts:
create_report(iris, config = configure_report(add_plot_str = FALSE))
SessionInfo:
R version 3.6.2 (2019-12-12)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Catalina 10.15.4
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] knitr_1.28.7 data.table_1.12.8 DataExplorer_0.8.1 rmarkdown_2.1.4
[5] likert_1.3.5 xtable_1.8-4 machinelearningtools_0.1 forcats_0.5.0
[9] stringr_1.4.0 dplyr_0.8.99.9003 purrr_0.3.4 readr_1.3.1
[13] tidyr_1.0.2 tibble_3.0.1 ggplot2_3.3.0 tidyverse_1.3.0.9000
[17] magrittr_1.5 googlesheets_0.3.0 psych_1.9.12.31
loaded via a namespace (and not attached):
[1] Rcpp_1.0.4.6 lubridate_1.7.8 lattice_0.20-38 assertthat_0.2.1 packrat_0.5.0
[6] digest_0.6.25 utf8_1.1.4 R6_2.4.1 cellranger_1.1.0 plyr_1.8.6
[11] backports_1.1.7 reprex_0.3.0 evaluate_0.14 highr_0.8 httr_1.4.1
[16] pillar_1.4.4 rlang_0.4.6.9000 curl_4.3 readxl_1.3.1 rstudioapi_0.11
[21] labeling_0.3 htmlwidgets_1.5.1 igraph_1.2.5 munsell_0.5.0 broom_0.5.6
[26] compiler_3.6.2 modelr_0.1.6 xfun_0.13 pkgconfig_2.0.3 mnormt_1.5-5
[31] htmltools_0.4.0 tidyselect_1.1.0 gridExtra_2.3 fansi_0.4.1 crayon_1.3.4
[36] dbplyr_1.4.3 withr_2.2.0 grid_3.6.2 nlme_3.1-142 jsonlite_1.6.1
[41] gtable_0.3.0 lifecycle_0.2.0 DBI_1.1.0 scales_1.1.1 cli_2.0.2
[46] stringi_1.4.6 farver_2.0.3 reshape2_1.4.4 fs_1.4.1 xml2_1.3.1
[51] ellipsis_0.3.1 generics_0.0.2 vctrs_0.3.0.9000 tools_3.6.2 glue_1.4.1
[56] networkD3_0.4 hms_0.5.3 yaml_2.2.1 parallel_3.6.2 colorspace_1.4-1
[61] rvest_0.3.5 haven_2.2.0
Any ideas??
@agilebean I am not sure if `clean_tmpfiles_mod()` is correctly replacing the original `clean_tmpfiles()`. It should print "Calling clean_tmpfiles_mod()". Make sure you start with a clean environment and try my example first.
I continue to get random messages about files not existing.
Some version of an insistently retrying function may work (although clearly not ideal, it seems to be helping me avoid this somewhat random error):
library(purrr)
rate <- rate_backoff(pause_base = 0.1, pause_min = 0.005, max_times = 10)
insistent_render <- insistently(rmarkdown::render, rate, quiet = FALSE)
Then call `insistent_render` instead of `rmarkdown::render`.
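For anyone who would rather not depend on purrr, roughly the same retry idea can be sketched in base R. This is an illustration only: `retry()` and `flaky()` are hypothetical names, and `flaky()` merely imitates an intermittent "file does not exist" failure.

```r
# Retry fn() up to max_times, doubling the pause after each failed attempt
retry <- function(fn, max_times = 5, pause = 0.1) {
  for (attempt in seq_len(max_times)) {
    # Wrap the value in a list so that a NULL result is not mistaken for failure
    outcome <- tryCatch(list(value = fn()), error = function(e) e)
    if (!inherits(outcome, "error")) return(outcome$value)
    if (attempt < max_times) Sys.sleep(pause * 2^(attempt - 1))
  }
  stop(outcome)  # re-signal the last error once all attempts are used up
}

# Hypothetical flaky task: fails twice, then succeeds
attempts <- 0
flaky <- function() {
  attempts <<- attempts + 1
  if (attempts < 3) stop("file does not exist")
  "ok"
}

result <- retry(flaky, pause = 0)
```

A real call would look something like `retry(function() rmarkdown::render("to_render.Rmd"))`. As with the purrr version, this only works around the race; it does not remove it.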
Any suggestions on how to fix this? I am running it in R version 4.04
pandoc.exe: \: openBinaryFile: does not exist (No such file or directory)
Warning: Error in : pandoc document conversion failed with error 1
128: stop
127: pandoc_convert
126: convert
125: render
124: discover_rmd_resources
123: find_external_resources
122: copy_render_intermediates
121: output_format$intermediates_generator
120:
@ktd2001
Which OS are you on?
Which Pandoc version do you use? (`rmarkdown::pandoc_version()`)
Where are your files located? On a network drive?
Can you share the file with the issue so we can try to reproduce it, along with anything else that could help us understand?
Hi Chris, Thank you for responding so quickly. I am using Windows 10. Here is the file and dataset, which are located in the same folder as the NB.
I could not find which Pandoc version I use (`rmarkdown::pandoc_version()`). How would I find this? I did a search in the files but it continues to search.
My best, Keiana
@cderv I am just curious if anyone at RStudio has looked into the problem I described in https://github.com/rstudio/rmarkdown/issues/1632#issuecomment-545824711
TLDR: Allow users to specify whether `clean_tmpfiles()` will be run after `render()` has finished. Possibly allow users to run it manually.
Anyway, I think each rendered Rmd should get its own directory in `tempdir()`; that would basically avoid this problem.
Thank you for looking at this :slightly_smiling_face:
I cleared the temporary files and am still getting the same error message:
pandoc.exe: \: openBinaryFile: does not exist (No such file or directory)
Warning: Error in : pandoc document conversion failed with error 1
128: stop
127: pandoc_convert
126: convert
125: render
124: discover_rmd_resources
123: find_external_resources
122: copy_render_intermediates
121: output_format$intermediates_generator
120:
@gorgitko I tried your solution, but I still get the error intermittently.
Note that this works correctly in my local RStudio session, but not always in a Docker container for Continuous Integration on GitLab with rocker/geospatial
If I re-run the same CI, it works again.
In addition: Warning message:
In readLines(con, warn = FALSE) :
cannot open file 'C03-planning.utf8.md': No such file or directory
Calling clean_tmpfiles_mod()
Execution halted
Calling clean_tmpfiles_mod()
pandoc: C01-bonjour.utf8.md: openBinaryFile: does not exist (No such file or directory)
Error: pandoc document conversion failed with error 1
Note that I create an R file for each document so that the renders can run in parallel. The code looks like this:
knit_file_system <- function(x) {
  # Write a small script that renders `x`, then run it in a fresh R session
  runR <- tempfile(fileext = "run.R")
  cat(
    paste0(
      # Reuse the library paths of the calling session
      paste0(".libPaths(c(\"", paste(.libPaths(), collapse = "\", \""), "\"));"),
      # Avoid https://github.com/rstudio/rmarkdown/issues/1632#issuecomment-545824711
      '
library(rmarkdown)
clean_tmpfiles_mod <- function() {
  message("Calling clean_tmpfiles_mod()")
}
assignInNamespace("clean_tmpfiles", clean_tmpfiles_mod, ns = "rmarkdown")
',
      # The path must be quoted and its backslashes escaped in the generated script
      'render("',
      gsub("\\", "\\\\", x, fixed = TRUE),
      '", envir = new.env(), encoding = "UTF-8"', # parent = baseenv()
      ')'
    ),
    file = runR
  )
  system(
    paste(normalizePath(file.path(Sys.getenv("R_HOME"), "bin", "Rscript"), mustWork = FALSE), runR)
  )
}
Then call it with {future} (`future_imap()` comes from {furrr}):
library(future)
library(furrr) # provides future_imap()
future::plan(future::multicore)
all_my_rmds <- list.files(pattern = "[.]Rmd")
future_imap(all_my_rmds,
  ~try(knit_file_system(.x)),
  .progress = TRUE)
@statnmap does it also happen if you explicitly run each `render()` in a new session, e.g. using `xfun::Rscript_call` or `callr::r()`?
If the issue is really with each render having its own `tempdir()`, this could solve it.
I am thinking more and more of providing a way to run a `render()` in a new session (for example, `rmarkdown::render(..., new_session = TRUE)`) to mimic the knit button. Parallel use of render could be another case in favor of it, if it works better this way. 🤔
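As a rough check of the "new session, new `tempdir()`" idea, the base-R sketch below starts a child R process and compares its temporary directory with the current one. It is only an illustration: it assumes `Rscript` is available under `R.home("bin")`, and it does not call `render()` at all.

```r
# Path to the Rscript binary that belongs to the running R installation
rscript <- file.path(R.home("bin"), "Rscript")

# Ask a fresh R session for its own tempdir()
child_tmp <- system2(
  rscript,
  c("--vanilla", "-e", shQuote("cat(tempdir())")),
  stdout = TRUE
)

# The child session is expected to report a directory other than ours
same_dir <- identical(child_tmp, tempdir())
```

If the two directories differ, a cleanup in one session cannot remove the other session's temporary files. The price is the startup overhead of a new R process per render.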
I use the `system()` command to run a new session. I updated my code above for this missing part.
@cderv
> If the issue is really with each render having its own `tempdir()` this could solve it.
This is definitely the issue (i.e. a common `tempdir()` for all `render()` calls evaluated in the same R session), but controlling the execution of `clean_tmpfiles()` from the user side would definitely bring less overhead than starting a new session. Alternatively, creating a randomly named directory inside `tempdir()` would also solve this (or at least the user could have the opportunity to do that).
Now I know why I still have this problem: I knit the same file twice in parallel, so one process deletes the utf8.md file while the other process is trying to access it.
@statnmap That's why I am using a unique `intermediates_dir` for each `render()` call in my code snippet:
intermediates_dir <- glue("{i}_intermediates_dir")
render(..., intermediates_dir = intermediates_dir)
# Cleaning.
system(glue("rm -r {intermediates_dir}"))
It's also possible to use completely random intermediate dirs, e.g.:
paste0("intermediates_", stringi::stri_rand_strings(1, 10))
[1] "intermediates_HaPxZbAKXY"
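A variant of the same idea that needs no extra packages is sketched below. `with_unique_dir()` is a hypothetical helper, not part of rmarkdown: it creates a fresh directory with `tempfile()`, hands it to a callback, and removes it when the callback finishes, even on error. It assumes `tempfile()` returns distinct names across workers.

```r
# Run fn(dir) with a freshly created directory that is removed afterwards
with_unique_dir <- function(fn, prefix = "intermediates_") {
  dir <- tempfile(prefix)  # a name not currently in use under tempdir()
  dir.create(dir)
  on.exit(unlink(dir, recursive = TRUE), add = TRUE)  # portable "rm -r"
  fn(dir)
}

# Stand-in for a render: write an intermediate file, return the directory used
used_dir <- with_unique_dir(function(dir) {
  writeLines("intermediate", file.path(dir, "doc.utf8.md"))
  dir
})
```

With rmarkdown, the callback would be something like `function(dir) rmarkdown::render("to_render.Rmd", intermediates_dir = dir)`. Using `unlink(..., recursive = TRUE)` instead of `system("rm -r ...")` also keeps the cleanup working on Windows.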
FYI this issue caused a major issue with some work products - rendering a bunch of rmarkdown to latex to pandoc in parallel caused some of the files to have a file name that didn't match the file content. Until this is fixed, there should be a warning or error if it is run in parallel.
Using @gorgitko's method to create unique intermediate directories solved the issue, so the fix should implement this.
I was getting similar errors, but they had nothing to do with running in parallel. Instead, the `withBinaryFile: does not exist (No such file or directory)` error was genuine: pandoc really couldn't find the file it needed. In case someone else runs into this, here was my thought process.
While containerizing a report, I hit the following error:
pandoc: /usr/local/lib/R/site-library/fontawesome/fontawesome/css/../webfonts/fa-v4compatibility.woff2: withBinaryFile: does not exist (No such file or directory)
Error: pandoc document conversion failed with error 1
I then checked the directory it was complaining about:
> list.files("/usr/local/lib/R/site-library/fontawesome/fontawesome/css/../webfonts/")
[1] "fa-brands-400.ttf" "fa-brands-400.woff" "fa-regular-400.ttf"
[4] "fa-regular-400.woff" "fa-solid-900.ttf" "fa-solid-900.woff"
As you can see, the file `fa-v4compatibility.woff2` really didn't exist. I then checked the `rstudio/fontawesome` repository to see if they were tracking the file I needed, and indeed they were. So I simply installed the latest version and was good to go, problem solved.
@gorgitko I'm trying to use your solution, but the `render.r` I'm seeing looks very different from what's referenced here; the file I'm looking at is in `rmarkdown-master/R/render.r`, but it's over 1200 lines. I tried adding your code anyway, right inside of the "render" function, but it still failed in the same way. Am I looking in the right file, or does the new structure break your solution? (Or something else?)
Apologies, it was actually a much dumber problem: on Mac, the file name was displayed as "x.bib" but the real name was "x.bib.txt", which could only be seen when accessing meta-information on the file (Command I). Leaving this here in case somebody else is tripped up by the same dumb problem.
In the vein of dumb problems, omitting the -o before specifying the output file will cause the same error. This one caught me out (read the docs, people!)
I confirm that this bug is still present with makeForkCluster (makePSOCKcluster works fine) with the following configuration:
Apple M3 Pro macOS 14.5 R 4.3.2 pandoc 3.1.1 rmarkdown 2.22
The bug was hard to pinpoint, but the "dirty" solution proposed by gorgitko, copied below, still works:
clean_tmpfiles_mod <- function() {
message("Calling clean_tmpfiles_mod()")
}
assignInNamespace("clean_tmpfiles", clean_tmpfiles_mod, ns = "rmarkdown")
When I am using `rmarkdown::render()` in `BiocParallel::bplapply()`, Pandoc throws this error:
pandoc: /tmp/RtmpW06rTD/rmarkdown-str3bc26dd971b5.html: openBinaryFile: does not exist (No such file or directory)
I am using Pandoc version 2.7.3 and the development version of rmarkdown. Everything works fine when I use `BPPARAM = SerialParam()` in `bplapply` (i.e. it will disable parallel processing).
Attachments: `to_render.Rmd`; output from `render.R`; Pandoc and session info.