Closed shikokuchuo closed 1 year ago
Thanks for the suggestion. Your comments nudged me to profile crew_eval()
, and recent commits make it faster. Now if I run the following small profiling study:
library(crew)
library(proffer)
command <- quote(a + b)
data <- list(a = 1)
globals <- list(b = 2)
pprof(
replicate(
1e3,
crew_eval(command = command, data = data, globals = globals)
)
)
the flame graph looks like this:
So it looks like the bottleneck now is indeed setting and restoring global options. I think relying on mirai
for global options specifically would be a demonstrable performance gain, and if feasible, I will have crew
always leverage this. After that, it would be nice to let the user separately choose whether to unload packages and whether to call GC, that would simplify the signatures of both crew_eval()
and controller$push()
.
If you go forward with the bitmask approach for cleanup, I will add logical arguments garbage_collection
and unload_packages
to the crew
launcher abstract class and then create the value of the bitmask internally.
I also wonder if you think there may be a faster/native way to restore the RNG state after each task. Currently I am using withr::local_seed()
, which is robust, but it shows up on the profiling studies. If I could drop it, then it could be a bit of a performance boost, and I could drop the dependency on withr
.
If you go forward with the bitmask approach for cleanup, I will add logical arguments
garbage_collection
andunload_packages
to thecrew
launcher abstract class and then create the value of the bitmask internally.
Implemented in mirai 0.8.2.9019. It's a simple additive bitmask, you simply assign the options values 1,2,4,8 and sum them.
I am not entirely sure this works for you as resetting options without unloading packages could be an issue - mirai
can only set the options to an initial state before any user packages have been loaded.
Nevertheless, it doesn't hurt to offer the flexibility - there is enough of a disclaimer in the docs.
I also wonder if you think there may be a faster/native way to restore the RNG state after each task. Currently I am using
withr::local_seed()
, which is robust, but it shows up on the profiling studies. If I could drop it, then it could be a bit of a performance boost, and I could drop the dependency onwithr
.
I've had a look at your use of withr::local_seed()
and I am wondering why you need to restore the RNG state after each eval. Otherwise you would be able to just use set.seed()
.
The default behaviour of mirai
is to actually change the RNG state after each eval as it removes .Random.seed
as part of the globals cleanup.
FYI, be careful when resetting R options.
There are cases where removing an option wreaks havoc. For example, in R (< 4.3.0), there were several built-in R options that could not be reset (https://bugs.r-project.org/show_bug.cgi?id=18372). There are also packages out there that rely on their options not being removed, e.g. https://github.com/truecluster/ff/issues/14.
This is why I rolled back automatic setting of R options in future a while back (https://github.com/HenrikBengtsson/future/blob/f7c11acb4d2260e882dcd3b392b905c82709d5ae/R/expressions.R#L32-L50) - the risk of breaking things is simply too great.
PS. It feels like I've already mentioned this at some point, but in case I just made that memory up, I'm posting this comment.
Thanks for the note and the examples. Thankfully @shikokuchuo also warned me about this and so crew
has a note in the docs.
I am not sure built-in base options would be affected because the original state should already have the right values, but maybe I am missing something.
From a package development perspective, it seems like good practice to gracefully handle missing options (or not rely on them at all; targets::tar_option_set()
modifies an internal mutable object in the package, and tar_option_get()
handles nulls). But I do recognize that many packages do this anyway and it is out of my control.
I am not sure built-in base options would be affected because the original state should already have the right values, but maybe I am missing something.
Correct, but only since R 4.3.0.
Also just to note on this, options cleanup in mirai
is light-touch. It only reverts the options that are present at the beginning of the R session. So if a package adds additional options that it relies on, those are actually not removed as part of the cleanup, reducing the chances of breakages in this respect even if the note is not heeded.
The implied assumption is that any new options set by packages would be specific to that package - which again isn't enforceable, but should be true in general - any conflict would be no different to if both namespaces were loaded at the same time, so I contend nothing to be done specifically here.
Also, it should not run into the R (< 4.3.0) bug as it would not be setting any options to NULL.
Prework
Description
The initial state is obtained and then reset after each task in your crew evaluation wrapper.
Getting the initial state each time is not required.
More efficient solution is to let mirai get this once per daemon and do the cleanup after each task.
I know I was not keen on being able to pick and choose, but I am prepared to offer flexibility for each of the above through the cleanup argument being an integer number (operating a bitmask). This will make it too obscure for most users rather than drawing attention to different cleanup options, which was my initial concern.
Are the above the only operations you need?