ropensci / targets

Function-oriented Make-like declarative workflows for R
https://docs.ropensci.org/targets/
Other
933 stars 75 forks source link

tar_make slow when callr_function not NULL #346

Closed matthiasgomolka closed 3 years ago

matthiasgomolka commented 3 years ago

Prework

Description

tar_make() takes a lot of time to start up (several minutes). In comparison, tar_make(callr_function = NULL) starts almost without delay. See the benchmarks.

Reproducible example

Since I'm working with confidential data, creating a reprex is quite some work. Therefore, I hope the benchmarks already help. If they don't I will take the time to create a reprex.

Benchmarks

I tried to use proffer, but I got some errors. Also, as far as I understand, proffer should only work with callr_function = NULL.

```r > proffer::pprof(tar_make()) * run target mifid_files v skip target ids v skip branch mifid_e5ccf3fc v skip branch mifid_276b0dbc v skip branch mifid_85ba955c v skip branch mifid_32c58c89 v skip branch ids_list_d984c7ca v skip branch ids_list_afadd32b v skip branch ids_list_1622817d v skip branch ids_list_5ff49e59 v skip branch id_counts_07b60a7a v skip branch id_counts_4ffd5edf v skip branch id_counts_6d3fac2a v skip branch id_counts_f10827a2 v skip branch id_counts_259d208a v skip branch id_counts_5bced0f6 v skip branch sdc_07b60a7a v skip branch sdc_4ffd5edf v skip branch sdc_6d3fac2a v skip branch sdc_f10827a2 v skip branch sdc_259d208a v skip branch sdc_5bced0f6 * end pipeline Error in rethrow_call(c_processx_exec, command, c(command, args), stdin, : Command '' not found @win/processx.c:994 (processx_exec) Type .Last.error.trace to see where the error occured > proffer::pprof(tar_make(callr_function = NULL)) * run target mifid_files v skip target ids v skip branch mifid_e5ccf3fc v skip branch mifid_276b0dbc v skip branch mifid_85ba955c v skip branch mifid_32c58c89 v skip branch ids_list_d984c7ca v skip branch ids_list_afadd32b v skip branch ids_list_1622817d v skip branch ids_list_5ff49e59 v skip branch id_counts_07b60a7a v skip branch id_counts_4ffd5edf v skip branch id_counts_6d3fac2a v skip branch id_counts_f10827a2 v skip branch id_counts_259d208a v skip branch id_counts_5bced0f6 v skip branch sdc_07b60a7a v skip branch sdc_4ffd5edf v skip branch sdc_6d3fac2a v skip branch sdc_f10827a2 v skip branch sdc_259d208a v skip branch sdc_5bced0f6 * end pipeline Error in rethrow_call(c_processx_exec, command, c(command, args), stdin, : Command '' not found @win/processx.c:994 (processx_exec) Type .Last.error.trace to see where the error occured > .Last.error.trace Stack trace: 1. proffer::pprof(tar_make(callr_function = NULL)) 2. proffer:::serve_pprof(pprof = pprof, host = host, port = port, ... 3. proffer:::serve_pprof_impl(args) 4. proffer:::with_safe_path(Sys.getenv("PROFFER_GRAPHVIZ_BIN"), ... 5. processx::process$new(command = pprof_path(), args = args, stdout = "|", ... 6. .subset2(public_bind_env, "initialize")(...) 7. processx:::process_initialize(self, private, command, args, stdin, ... 8. rethrow_call(c_processx_exec, command, c(command, args), stdin, ... x Command '' not found @win/processx.c:994 (processx_exec) ```

Thus, I measured the time for tar_make() with system.time():

> system.time(tar_make(callr_function = NULL))
* run target mifid_files
v skip target ids
v skip branch mifid_e5ccf3fc
v skip branch mifid_276b0dbc
v skip branch mifid_85ba955c
v skip branch mifid_32c58c89
v skip branch ids_list_d984c7ca
v skip branch ids_list_afadd32b
v skip branch ids_list_1622817d
v skip branch ids_list_5ff49e59
v skip branch id_counts_07b60a7a
v skip branch id_counts_4ffd5edf
v skip branch id_counts_6d3fac2a
v skip branch id_counts_f10827a2
v skip branch id_counts_259d208a
v skip branch id_counts_5bced0f6
v skip branch sdc_07b60a7a
v skip branch sdc_4ffd5edf
v skip branch sdc_6d3fac2a
v skip branch sdc_f10827a2
v skip branch sdc_259d208a
v skip branch sdc_5bced0f6
* end pipeline
   user  system elapsed 
   0.20    0.39   10.23 

> system.time(tar_make())
* run target mifid_files
v skip target ids
v skip branch mifid_e5ccf3fc
v skip branch mifid_276b0dbc
v skip branch mifid_85ba955c
v skip branch mifid_32c58c89
v skip branch ids_list_d984c7ca
v skip branch ids_list_afadd32b
v skip branch ids_list_1622817d
v skip branch ids_list_5ff49e59
v skip branch id_counts_07b60a7a
v skip branch id_counts_4ffd5edf
v skip branch id_counts_6d3fac2a
v skip branch id_counts_f10827a2
v skip branch id_counts_259d208a
v skip branch id_counts_5bced0f6
v skip branch sdc_07b60a7a
v skip branch sdc_4ffd5edf
v skip branch sdc_6d3fac2a
v skip branch sdc_f10827a2
v skip branch sdc_259d208a
v skip branch sdc_5bced0f6
* end pipeline
   user  system elapsed 
   2.39    0.59  372.45 

I also profiled using profvis. Not sure if this helps, but see for yourself.

For profvis::profvis(tar_make(callr_function = NULL)): image

And for profvis::profvis(tar_make()): image

So it seems as if starting a new R session for tar_make() takes a lot of time, which I don't understand.

Am I missing something obvious here?

Also, as written before, please say if you need a real reprex and I'll try to create one.

matthiasgomolka commented 3 years ago

I noticed a similar error as the one from proffer when calling tar_watch(). This made me think if there are some global settings in my Windows environment (I don't have admin rights) which which cause problems and lead to some sort of retries which might be the cause for tar_make() to be so slow.

> tar_watch()
Error in rethrow_call(c_processx_exec, command, c(command, args), stdin,  : 
  create process 'U:/svn/FDSZ/01_data_production/01_data_preparation/mifid/oc_analysis/renv/library/R-4.0/x86_64-w64-mingw32/processx/bin/x64/supervisor.exe' (system error 1260, Dieses Programm wurde durch eine Gruppenrichtlinie geblockt. Wenden Sie sich an den Systemadministrator, um weitere Informationen zu erhalten.
) @win/processx.c:1042 (processx_exec)

As you might have noticed, I don't really know anything about this.

wlandau commented 3 years ago

Would you post a reprex of just the initialization part? I think the slowness should be reproducible if you remove the target list and keep everything else from _targets.R.

# _targets.R
library(targets)
library(...)
tar_option_set(...)
future::plan(...)
# other setup steps
list() # blank target list

From some of your profiling output, it looks like you may be using future. What happens if you choose the sequential plan or future.callr::callr instead of the PSOCK-based multisession plan?

wlandau commented 3 years ago

Alternatively, you could try reproducing it without targets: take everything from _targets.R except the target list and put it in a callr::r() call. Curious to see how slow that is.

callr::r(function() {
  library(targets)
  library(...)
  tar_option_set(...)
  future::plan(...)
  # other setup steps
})
wlandau commented 3 years ago

In my experience, the initialization bottlenecks are almost always loading packages and initializing PSOCK clusters for future. There may be a delay due to the callr process, but that should be no more than a second or two.

matthiasgomolka commented 3 years ago

From some of your profiling output, it looks like you may be using future. What happens if you choose the sequential plan or future.callr::callr instead of the PSOCK-based multisession plan?

Setting plan("sequential") does not change anything.

Also, stripping _targets.R down to

library(targets)
list()

does not change anything:

library(targets)
#> Warning: package 'targets' was built under R version 4.0.4

tar_script({
  list()
})

# new session
system.time(tar_make())
#> * end pipeline
#> Warning message:
#> package 'targets' was built under R version 4.0.4 
#>    user  system elapsed 
#>    0.35    0.08  165.46

# same session
system.time(tar_make(callr_function = NULL))
#> * end pipeline
#>    user  system elapsed 
#>    0.07    0.02    0.10

Created on 2021-03-18 by the reprex package (v0.3.0)

Session info ``` r sessionInfo() #> R version 4.0.2 (2020-06-22) #> Platform: x86_64-w64-mingw32/x64 (64-bit) #> Running under: Windows 10 x64 (build 17763) #> #> Matrix products: default #> #> locale: #> [1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252 #> [3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C #> [5] LC_TIME=German_Germany.1252 #> #> attached base packages: #> [1] stats graphics grDevices datasets utils methods base #> #> other attached packages: #> [1] targets_0.2.0 #> #> loaded via a namespace (and not attached): #> [1] igraph_1.2.6 knitr_1.30 magrittr_2.0.1 tidyselect_1.1.0 #> [5] R6_2.5.0 rlang_0.4.9 fansi_0.4.1 stringr_1.4.0 #> [9] highr_0.8 tools_4.0.2 data.table_1.14.0 xfun_0.19 #> [13] cli_2.2.0 withr_2.3.0 ellipsis_0.3.1 htmltools_0.5.0 #> [17] yaml_2.2.1 digest_0.6.27 assertthat_0.2.1 lifecycle_0.2.0 #> [21] tibble_3.0.4 crayon_1.3.4 processx_3.4.4 purrr_0.3.4 #> [25] callr_3.5.1 vctrs_0.3.5 ps_1.4.0 codetools_0.2-16 #> [29] glue_1.4.2 evaluate_0.14 rmarkdown_2.5 stringi_1.5.3 #> [33] pillar_1.4.7 compiler_4.0.2 renv_0.12.5 pkgconfig_2.0.3 ```

image

However, I don't see the problem in a fresh project without the _targets cache. Any ideas on that?

Alternatively, you could try reproducing it without targets: take everything from _targets.R except the target list and put it in a callr::r() call. Curious to see how slow that is.

callr::r(function() {
  library(targets)
  library(...)
  tar_option_set(...)
  future::plan(...)
  # other setup steps
})

I'll try that tomorrow and report back.

wlandau commented 3 years ago

However, I don't see the problem in a fresh project without the _targets cache. Any ideas on that?

That would seem like overhead due to a slow network drive, but the part about callr does not make sense.

wlandau commented 3 years ago

Could the initial mifid_files target have anything to do with the slowness? What exactly does it do?

matthiasgomolka commented 3 years ago

Could the initial mifid_files target have anything to do with the slowness? What exactly does it do?

That's the first target created by tar_files(). It contains a vector of ~3,600 file names. I thought about batching here, but it's actually not slow.

But I found out that the problem is related to renv. When I run the following code with activated renv, it takes ~ 150 seconds. After deactivating the renv, its 2 seconds.

callr::r(function() {
  library(targets)
})

And it's only slow, when it contains a library() call. So there is no problem with targets. Sorry for the false alarm and thanks for the hints regarding the debugging!

EDIT: It's only slow for specific packages.

matthiasgomolka commented 3 years ago

Just for future reference: I opened an issue for renv: https://github.com/rstudio/renv/issues/685

wlandau commented 1 year ago

In the manual, I just added more advice about the performance of renv in targets pipelines: https://books.ropensci.org/targets/packages.html#package-management-with-renv