rmaia / pavo

tools for the analysis of color data in R
http://pavo.colrverse.com
GNU General Public License v2.0
68 stars 17 forks source link

future.seed=TRUE to avoid warnings regarding RNG #211

Closed HenrikBengtsson closed 3 years ago

HenrikBengtsson commented 3 years ago

Hi, I just spotted the following warnings while revdep checking your package toward {future}:

UNRELIABLE VALUE: Future ('future_lapply-1') unexpectedly generated random numbers without specifying argum
       │ ent '[future.]seed'. There is a risk that those random numbers are not statistically sound and the overall 
       │ results might be invalid. To fix this, specify argument '[future.]seed', e.g. 'seed=TRUE'. This ensures tha
       │ t proper, parallel-safe random numbers are produced via the L'Ecuyer-CMRG method. To disable this check, us
       │ e [future.]seed=NULL, or set option 'future.rng.onMisuse' to "ignore".

The solution is to update things like:

    bootcd <- future_lapply(bootgrouped, function(z) {
      p()
      tryCatch(suppressMessages(tmpbootcdfoo(z)),
        error = function(e) NULL
      )
    })

to

    bootcd <- future_lapply(bootgrouped, function(z) {
      p()
      tryCatch(suppressMessages(tmpbootcdfoo(z)),
        error = function(e) NULL
      )
    }, future.seed = TRUE)

You can read more about this on https://www.jottr.org/2020/09/22/push-for-statical-sound-rng/

Bisaloo commented 3 years ago

Hi @HenrikBengtsson, thanks for reaching out! This was fixed in https://github.com/rmaia/pavo/commit/20acb5ba5930544ea5c31dd5b16ac56fadb01940 and https://github.com/rmaia/pavo/commit/ef2feccbd3c944cfb4aca832ef175e8f908e1dfa. But I guess we should make a new release before you do, right?

HenrikBengtsson commented 3 years ago

Thanks. I missed that; I quickly skimmed the NEWS file.

... I guess we should make a new release before you do, right?

Not a blocker per se - so no need. However, it just found another thing with pavo that could be a blocker for my next release, cf. https://github.com/HenrikBengtsson/future/issues/446#issuecomment-736758159. I'll try to investigate today and report back. My best guess without looking is that it's just some validation checks in the tests that need to be updated.

Bisaloo commented 3 years ago

My best guess without looking is that it's just some validation checks in the tests that need to be updated.

I had a look and I'd say so as well. We test against an expected value. But given that the RNG has changed, it's not surprising that this value changes as well (even with the same seed, as shown in https://github.com/HenrikBengtsson/future/issues/446#issue-751090009)

HenrikBengtsson commented 3 years ago

I think there are few more missing future.seed = TRUE. For example, with pavo commit d5607cc, I get:

library(pavo)
suppressWarnings(RNGversion("3.5.0")) # back compatibility for now
set.seed(2231)
papilio <- getimg(system.file("testdata/images/papilio.png", package = "pavo"))
papilio_class <- classify(papilio, kcols = 4)
Image classification in progress...
Warning message:                                                                                                   
UNRELIABLE VALUE: Future ('future_lapply-1') unexpectedly generated random numbers without specifying argument '[future.]seed'. There is a risk that those random numbers are not statistically sound and the overall results might be invalid. To fix this, specify argument '[future.]seed', e.g. 'seed=TRUE'. This ensures that proper, parallel-safe random numbers are produced via the L'Ecuyer-CMRG method. To disable this check, use [future.]seed=NULL, or set option 'future.rng.onMisuse' to "ignore". 

Rerunning with:

options(future.rng.onMisuse = "error")

reveals that:

> papilio_class <- classify(papilio, kcols = 4)
Image classification in progress...
Error: UNRELIABLE VALUE: Future ('future_lapply-1') unexpectedly generated random numbers without specifying argument '[future.]seed'. There is a risk that those random numbers are not statistically sound and the overall results might be invalid. To fix this, specify argument '[future.]seed', e.g. 'seed=TRUE'. This ensures that proper, parallel-safe random numbers are produced via the L'Ecuyer-CMRG method. To disable this check, use [future.]seed=NULL, or set option 'future.rng.onMisuse' to "ignore".

> traceback()
13: stop(condition)
12: signalConditions(obj, exclude = getOption("future.relay.immediate", 
        "immediateCondition"), resignal = resignal, ...)
11: signalConditionsASAP(obj, resignal = FALSE, pos = ii)
10: resolve.list(y, result = TRUE, stdout = stdout, signal = signal, 
        force = TRUE)
9: resolve(y, result = TRUE, stdout = stdout, signal = signal, force = TRUE)
8: value.list(fs)
7: value(fs)
6: future_xapply(FUN = FUN, nX = nX, chunk_args = X, args = list(...), 
       get_chunk = `[`, expr = expr, envir = envir, future.globals = future.globals, 
       future.packages = future.packages, future.scheduling = future.scheduling, 
       future.chunk.size = future.chunk.size, future.stdout = future.stdout, 
       future.conditions = future.conditions, future.seed = future.seed, 
       future.lazy = future.lazy, future.label = future.label, fcn_name = fcn_name, 
       args_name = args_name, debug = debug)
5: future_lapply(seq_along(imgdat_i2), function(x) {
       p()
       classify_main(imgdat_i2[[x]], n_cols_i2[[x]], method_i2)
   })
4: withCallingHandlers(expr, progression = function(p) {
       capture_conditions <<- FALSE
       on.exit(capture_conditions <<- TRUE)
       if (isTRUE(attr(delays$terminal, "flush"))) {
           if (length(conditions) > 0L || has_buffered_stdout(stdout_file)) {
               calling_handler(control_progression("hide"))
               stdout_file <<- flush_stdout(stdout_file, close = FALSE)
               conditions <<- flush_conditions(conditions)
               calling_handler(control_progression("unhide"))
           }
       }
       calling_handler(p)
   }, condition = function(c) {
       if (!capture_conditions || inherits(c, c("progression", "error"))) 
           return()
       if (inherits(c, delays$conditions)) {
           conditions[[length(conditions) + 1L]] <<- c
           if (inherits(c, "message")) {
               invokeRestart("muffleMessage")
           }
           else if (inherits(c, "warning")) {
               invokeRestart("muffleWarning")
           }
           else if (inherits(c, "condition")) {
               restarts <- computeRestarts(c)
               for (restart in restarts) {
                   name <- restart$name
                   if (is.null(name)) 
                     next
                   if (!grepl("^muffle", name)) 
                     next
                   invokeRestart(restart)
                   break
               }
           }
       }
   })
3: with_progress({
       p <- progressor(along = imgdat_i2)
       outdata <- future_lapply(seq_along(imgdat_i2), function(x) {
           p()
           classify_main(imgdat_i2[[x]], n_cols_i2[[x]], method_i2)
       })
   })
2: classifier(imgdat, ref_centers, method2)
1: classify(papilio, kcols = 4)
HenrikBengtsson commented 3 years ago

FYI, if you set environment variable:

export R_FUTURE_RNG_ONMISUSE=error

you can upgrade those RNG warnings to errors, which means you can easily detect them when running R CMD check.