Closed matthiasgomolka closed 3 years ago
I noticed an error similar to the one from proffer when calling tar_watch(). This made me wonder whether some global settings in my Windows environment (I don't have admin rights) cause problems and lead to some sort of retries, which might be why tar_make() is so slow.
> tar_watch()
Error in rethrow_call(c_processx_exec, command, c(command, args), stdin, :
create process 'U:/svn/FDSZ/01_data_production/01_data_preparation/mifid/oc_analysis/renv/library/R-4.0/x86_64-w64-mingw32/processx/bin/x64/supervisor.exe' (system error 1260, Dieses Programm wurde durch eine Gruppenrichtlinie geblockt. Wenden Sie sich an den Systemadministrator, um weitere Informationen zu erhalten.
) @win/processx.c:1042 (processx_exec)
(The German system message for error 1260 translates to: "This program was blocked by a group policy. Contact your system administrator for more information.")
As you might have noticed, I don't really know anything about this.
Would you post a reprex of just the initialization part? I think the slowness should be reproducible if you remove the target list and keep everything else from _targets.R.
# _targets.R
library(targets)
library(...)
tar_option_set(...)
future::plan(...)
# other setup steps
list() # blank target list
From some of your profiling output, it looks like you may be using future. What happens if you choose the sequential plan or future.callr::callr instead of the PSOCK-based multisession plan?
Alternatively, you could try reproducing it without targets: take everything from _targets.R except the target list and put it in a callr::r() call. Curious to see how slow that is.
callr::r(function() {
library(targets)
library(...)
tar_option_set(...)
future::plan(...)
# other setup steps
})
In my experience, the initialization bottlenecks are almost always loading packages and initializing PSOCK clusters for future. There may be a delay due to the callr process, but that should be no more than a second or two.
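One way to check which of those two bottlenecks applies is to time each setup step on its own. The following is only a rough sketch: time_steps() is a hypothetical helper (not part of targets or future), and the two steps are stand-ins for whatever your own _targets.R does during setup.

```r
# Hypothetical helper: time each setup step separately.
# `steps` is a named list of zero-argument functions.
time_steps <- function(steps) {
  vapply(
    steps,
    function(step) system.time(step())[["elapsed"]],
    numeric(1)
  )
}

timings <- time_steps(list(
  # Stand-in for the library() calls in _targets.R.
  load_packages = function() library(stats),
  # Stand-in for a PSOCK-based plan: start and stop a one-worker cluster.
  psock_cluster = function() {
    cl <- parallel::makePSOCKcluster(1)
    on.exit(parallel::stopCluster(cl))
    invisible(NULL)
  }
))

# Named vector of elapsed seconds, one entry per step.
print(timings)
```

If one entry dominates, that step is the place to look first.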
> From some of your profiling output, it looks like you may be using future. What happens if you choose the sequential plan or future.callr::callr instead of the PSOCK-based multisession plan?
Setting plan("sequential") does not change anything. Also, stripping _targets.R down to
library(targets)
list()
does not change anything:
library(targets)
#> Warning: package 'targets' was built under R version 4.0.4
tar_script({
list()
})
# new session
system.time(tar_make())
#> * end pipeline
#> Warning message:
#> package 'targets' was built under R version 4.0.4
#> user system elapsed
#> 0.35 0.08 165.46
# same session
system.time(tar_make(callr_function = NULL))
#> * end pipeline
#> user system elapsed
#> 0.07 0.02 0.10
Created on 2021-03-18 by the reprex package (v0.3.0)
However, I don't see the problem in a fresh project without the _targets cache. Any ideas on that?
> Alternatively, you could try reproducing it without targets: take everything from _targets.R except the target list and put it in a callr::r() call. Curious to see how slow that is.
>
> callr::r(function() {
>   library(targets)
>   library(...)
>   tar_option_set(...)
>   future::plan(...)
>   # other setup steps
> })
I'll try that tomorrow and report back.
> However, I don't see the problem in a fresh project without the _targets cache. Any ideas on that?
That would seem like overhead due to a slow network drive, but the part about callr does not make sense.
Could the initial mifid_files target have anything to do with the slowness? What exactly does it do?
> Could the initial mifid_files target have anything to do with the slowness? What exactly does it do?
That's the first target created by tar_files(). It contains a vector of ~3,600 file names. I thought about batching here, but it's actually not slow.
But I found out that the problem is related to renv. When I run the following code with renv activated, it takes ~150 seconds. After deactivating renv, it's 2 seconds.
callr::r(function() {
library(targets)
})
And it's only slow when it contains a library() call. So there is no problem with targets. Sorry for the false alarm and thanks for the hints regarding the debugging!
EDIT: It's only slow for specific packages.
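To narrow down which packages are affected, one option is to time library() for each package in its own fresh callr session. This is just a sketch: time_library_in_callr() is a hypothetical helper name, and the package names below are base-R placeholders to swap for the packages your project actually loads.

```r
# Hypothetical helper: elapsed seconds for library(pkg) in a fresh R session
# started by callr::r(), i.e. the kind of session tar_make() uses by default.
time_library_in_callr <- function(pkg) {
  timing <- system.time(
    callr::r(
      function(pkg) {
        suppressPackageStartupMessages(library(pkg, character.only = TRUE))
        invisible(NULL)
      },
      args = list(pkg = pkg)
    )
  )
  timing[["elapsed"]]
}

# Placeholder package names; replace with the packages from your project.
pkgs <- c("stats", "utils")

if (requireNamespace("callr", quietly = TRUE)) {
  timings <- vapply(pkgs, time_library_in_callr, numeric(1))
  # Slowest packages first.
  print(sort(timings, decreasing = TRUE))
}
```

Running this once with renv activated and once without should show whether the delay is tied to particular packages.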
Just for future reference: I opened an issue for renv: https://github.com/rstudio/renv/issues/685
In the manual, I just added more advice about the performance of renv in targets pipelines: https://books.ropensci.org/targets/packages.html#package-management-with-renv
Prework
Description
tar_make() takes a lot of time to start up (several minutes). In comparison, tar_make(callr_function = NULL) starts almost without delay. See the benchmarks.
Reproducible example
Since I'm working with confidential data, creating a reprex is quite some work. Therefore, I hope the benchmarks already help. If they don't, I will take the time to create a reprex.
Benchmarks
I tried to use proffer, but I got some errors. Also, as far as I understand, proffer should only work with callr_function = NULL.
Thus, I measured the time for tar_make() with system.time():
I also profiled using profvis. Not sure if this helps, but see for yourself.
For profvis::profvis(tar_make(callr_function = NULL)):
And for profvis::profvis(tar_make()):
So it seems as if starting a new R session for tar_make() takes a lot of time, which I don't understand.
Am I missing something obvious here?
Also, as written before, please say if you need a real reprex and I'll try to create one.