Closed HenrikBengtsson closed 1 year ago
I like the attribute for the cl argument, but it might be a bit alien for some users. How about adding it to pboptions()? I.e. have it unset (NULL) on load, but check for the existence of the future.seed option and use that value.
> How about adding it to pboptions()?
This is something the developer should control in their code. I don't think it should be modifiable by the end-user via an option: that would give different results depending on the option's value, which is probably not what the developer intended.
I see the distinction. If the user is calling pb*apply(..., cl = "future"), they should be able to set it as an attribute, but if this is being used as part of another package, it is baked in.
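For the package case, here is a minimal sketch of what "baked in" could look like. The function name simulate_means() is hypothetical, and the sketch assumes that pbapply (with its cl = "future" backend), future and future.apply are installed.

```r
library(pbapply)
library(future)

# Hypothetical package function: the developer knows that FUN draws random
# numbers, so future.seed = TRUE is fixed here instead of being left to an
# end-user option.
simulate_means <- function(n_reps, n = 100) {
  res <- pblapply(
    seq_len(n_reps),
    FUN = function(i, n) mean(rnorm(n)),
    n = n,
    cl = "future",
    future.seed = TRUE
  )
  unlist(res)
}

# The end user still chooses the backend, e.g. plan(multisession).
plan(sequential)
m <- simulate_means(4)
```

The end user keeps control over where the code runs (the plan), while the decision about RNG handling stays with the developer.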
One can pass the future.seed argument directly through ... because ?future.apply::future_lapply says:
> For future_*apply() functions and replicate(), any future.* arguments part of ... are passed on to future_lapply() used internally.
See:
r$> y <- pblapply(1:2, FUN = rnorm, cl = "future")
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=00s
Warning messages:
1: UNRELIABLE VALUE: One of the ‘future.apply’ iterations (‘future_lapply-1’) unexpectedly generated random numbers without declaring so. There is a risk that those random numbers are not statistically sound and the overall results might be invalid. To fix this, specify 'future.seed=TRUE'. This ensures that proper, parallel-safe random numbers are produced via the L'Ecuyer-CMRG method. To disable this check, use 'future.seed = NULL', or set option 'future.rng.onMisuse' to "ignore".
2: UNRELIABLE VALUE: One of the ‘future.apply’ iterations (‘future_lapply-2’) unexpectedly generated random numbers without declaring so. There is a risk that those random numbers are not statistically sound and the overall results might be invalid. To fix this, specify 'future.seed=TRUE'. This ensures that proper, parallel-safe random numbers are produced via the L'Ecuyer-CMRG method. To disable this check, use 'future.seed = NULL', or set option 'future.rng.onMisuse' to "ignore".
r$> y <- pblapply(1:2, FUN = rnorm, cl = "future", future.seed = TRUE)
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=00s
# no warnings
So developers can utilize this behaviour to set the future seed.
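As a rough sketch of why this matters beyond silencing the warning: with future.seed = TRUE, the parallel RNG streams are derived from the current RNG state, so setting a seed beforehand should make repeated runs reproducible. This assumes that pbapply forwards future.seed unchanged, as the output above suggests, and that pboptions() is the same for both runs.

```r
library(pbapply)
library(future)

plan(multisession, workers = 2)

# Run the same call twice from the same starting seed.
set.seed(42)
y1 <- pblapply(1:2, FUN = rnorm, cl = "future", future.seed = TRUE)

set.seed(42)
y2 <- pblapply(1:2, FUN = rnorm, cl = "future", future.seed = TRUE)

# Compare the two runs; they should match if the seeding is deterministic.
same <- identical(y1, y2)

plan(sequential)  # shut down the background workers
```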
> So developers can utilize this behaviour to set the future seed.
Good point. Yes, that looks like the cleanest solution. Then a rule of thumb can be: pass any additional arguments to FUN immediately following the FUN argument, and any additional arguments to the futureverse after cl = "future":
y <- pblapply(1:2, FUN = my_fcn, {additional my_fcn args}, cl = "future", {additional future args})
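A concrete version of that template might look like the following. Here my_fcn and its mean and sd arguments are hypothetical stand-ins, and the sketch assumes pbapply, future and future.apply are installed.

```r
library(pbapply)
library(future)

plan(sequential)

# Hypothetical worker function with two extra arguments of its own.
my_fcn <- function(n, mean, sd) rnorm(n, mean = mean, sd = sd)

y <- pblapply(
  1:3,
  FUN = my_fcn, mean = 10, sd = 0.1,  # arguments for my_fcn follow FUN
  cl = "future", future.seed = TRUE   # future.* arguments follow cl
)
```

The ordering is only a readability convention: all of these are named arguments passed through ..., and the future.* ones are picked out by name.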
To prevent unsound random numbers from being produced when running in parallel, futureverse asks the developer to specify when their code needs the RNG. If not asked for, it'll still check whether the RNG was used (i.e. whether .Random.seed was updated). If it was, a warning is produced. Here is an example:
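The idea behind the check can be illustrated in base R by comparing .Random.seed before and after evaluating an expression. This is only a sketch of the principle, not the futureverse implementation, and the helper name rng_was_used() is made up for illustration.

```r
# Returns TRUE if evaluating `expr` changed the global .Random.seed.
rng_was_used <- function(expr) {
  get_seed <- function() {
    get0(".Random.seed", envir = globalenv(), inherits = FALSE)
  }
  before <- get_seed()
  force(expr)  # evaluate the lazily passed expression now
  after <- get_seed()
  !identical(before, after)
}

used_rnorm <- rng_was_used(rnorm(1))    # draws a random number
used_sum   <- rng_was_used(sum(1:10))   # does not touch the RNG
```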
To avoid this, a quick fix is to always pass future.seed = TRUE. That will set up a parallel RNG regardless of whether random numbers are generated or not. The downside is that it can be computationally expensive to do so. To give the developer control, you'd have to introduce a new argument allowing them to control the future.seed argument to future_lapply() and the like. One way to do that without adding a new argument could be via attributes, e.g.