r-lib / memoise

Easy memoisation for R
https://memoise.r-lib.org
Other
317 stars 56 forks source link

Debug mode? #129

Open DarwinAwardWinner opened 3 years ago

DarwinAwardWinner commented 3 years ago

I've recently been writing a bunch of code using memoise to let me "skip" long computation steps on subsequent runs. However, in the process I have run into repeated issues where the arguments to my memoised functions were not identical on successive runs of the same code (and that code doesn't do RNG), and debugging these issues has been a challenge. It would be nice if there was a "verbose mode" that could at least show the hashes of individual arguments and the final hash computed as the key to look up in the cache, so I can at least figure out which argument is changing unexpectedly between runs. In addition, I've found the following functions useful for debugging caching issues, and they might be worth including in the package (with some cleanup, obviously):

# Return TRUE if FUN has a memoised result for calling on these args
is_cached <- function(FUN, ...) {
    args <- list(...)
    if (!is.memoised(FUN)) {
        return(FALSE)
    }
    test_memo_f <- function(...) {
        mc <- match.call()
        encl <- parent.env(environment())
        called_args <- as.list(mc)[-1]
        default_args <- encl$`_default_args`
        default_args <- default_args[setdiff(names(default_args),
                                             names(called_args))]
        called_args[encl$`_omit_args`] <- NULL
        args <- c(
            lapply(called_args, eval, parent.frame()),
            lapply(default_args, eval, envir = environment())
        )
        key <- encl$`_hash`(c(
            encl$`_f_hash`,
            args,
            lapply(encl$`_additional`,
                   function(x) eval(x[[2L]], environment(x)))))
        res <- encl$`_cache`$exists(key)
    }
    formals(test_memo_f) <- formals(FUN)
    environment(test_memo_f) <- environment(FUN)
    do.call(test_memo_f, args)
}

# Like do.call but throws an error if the call is not memoised
do_call_memo_only <- function(FUN, ...) {
    args <- list(...)
    if (!is.memoised(FUN)) {
        stop("Function is not memoised")
    }
    memo_only_f <- function(...) {
        mc <- match.call()
        encl <- parent.env(environment())
        called_args <- as.list(mc)[-1]
        default_args <- encl$`_default_args`
        default_args <- default_args[setdiff(names(default_args),
                                             names(called_args))]
        called_args[encl$`_omit_args`] <- NULL
        args <- c(
            lapply(called_args, eval, parent.frame()),
            lapply(default_args, eval, envir = environment())
        )
        key <- encl$`_hash`(c(
            encl$`_f_hash`,
            args,
            lapply(encl$`_additional`,
                   function(x) eval(x[[2L]], environment(x)))))
        res <- encl$`_cache`$get(key)
        if (inherits(res, "key_missing")) {
            stop("Call is not cached")
        }
        if (res$visible) {
            res$value
        }
        else {
            invisible(res$value)
        }
    }
    formals(memo_only_f) <- formals(FUN)
    environment(test_memo_f) <- environment(FUN)
    do.call(memo_only_f, args)
}
allefeld commented 1 month ago

I have the same problem. On repeated calls with the same arguments, the hash seems to change. I've already set envir = NULL, but that doesn't help.

So I tried to debug the hash computation, to see where the apparent difference comes in. I defined a custom hash function

mhash <- function(x) {
  print(str(x))
  h <- rlang::hash(x)
  print(h)
  h
}

and then used it.

msimulation <- memoize(
  simulation,
  cache = cache_filesystem("cache"),
  envir = NULL,
  hash = mhash
)

My function seems to be never called, or at least the printed information doesn't make it to stdout.

In case the output from mhash is suppressed, I used sink:

mhash <- function(x) {
  sink("mhash.log")
  print(str(x))
  h <- rlang::hash(x)
  print(h)
  sink()
  h
}

Since the file is never created, I assume that mhash is never called.

allefeld commented 1 month ago

Reading the code, I now found that for "old-style" caches incl. cache_filesystem, the passed value of hash is ignored.

The README mentions old-style caches, but not this side effect.

Anyway print-style debugging is possible by specifying hash, as long as the cache is taken from cachem.

msimulation <- memoize(
  simulation,
  cache = cachem::cache_disk("cache"),
  envir = NULL,
  hash = mhash
)