richfitz / storr

:package: Object cacher for R
http://richfitz.github.io/storr
Other
116 stars 10 forks source link

Corrupted storr when killed while writing data #55

Closed violetcereza closed 6 years ago

violetcereza commented 7 years ago

Hi -- I use storr through drake, so I don't actually know a lot about the internals. However, I've been having a problem with drake that @wlandau wanted to bring over here.

When I make things with drake, drake is saves large .rds files (6GB?) to the hard drive. Sometimes, I want to kill the drake process that is doing the making, so I can unblock my R session and modify things. Sometimes, when I kill this process, the cache ends up a weird corrupted state that is difficult to recover from without starting from scratch. I get this error:

Error in readRDS(self$name_hash(hash)) : error reading from connection

With a traceback: (which includes drake code in lines >= 7)

13: readRDS(self$name_hash(hash))
12: self$driver$get_object(hash)
11: self$get_value(self$get_hash(key, namespace), use_cache)
10: cache$get(target)
9: (function (target, cache) 
   {
       if (!(target %in% cache$list())) 
           return(FALSE)
       cache$get(target)$imported
   })(target = dots[[1L]][[1L]], cache = <environment>)
8: mapply(FUN = function (target, cache) 
   {
       if (!(target %in% cache$list())) 
           return(FALSE)
       cache$get(target)$imported
   }, target = "premiers_data_raw_sept", MoreArgs = list(cache = <environment>), 
       SIMPLIFY = TRUE, USE.NAMES = TRUE)
7: do.call("mapply", c(FUN = FUN, args[dovec], MoreArgs = list(args[!dovec]), 
       SIMPLIFY = SIMPLIFY, USE.NAMES = USE.NAMES))
6: is_imported(target = target, cache = cache)
5: (function (target, cache) 
   {
       if (is.null(cache)) {
           return()
       }
       if (is_file(target) & !is_imported(target = target, cache = cache)) {
           unquote(target) %>% unlink(recursive = TRUE, force = TRUE)
       }
       default <- cache$default_namespace
       for (space in c(default, "depends", "filemtime", "functions")) if (target %in% 
           cache$list(namespace = space)) {
           cache$del(target, namespace = space)
       }
       invisible()
   })(target = dots[[1L]][[1L]], cache = <environment>)
4: mapply(FUN = function (target, cache) 
   {
       if (is.null(cache)) {
           return()
       }
       if (is_file(target) & !is_imported(target = target, cache = cache)) {
           unquote(target) %>% unlink(recursive = TRUE, force = TRUE)
       }
       default <- cache$default_namespace
       for (space in c(default, "depends", "filemtime", "functions")) if (target %in% 
           cache$list(namespace = space)) {
           cache$del(target, namespace = space)
       }
       invisible()
   }, target = "premiers_data_raw_sept", MoreArgs = list(cache = <environment>), 
       SIMPLIFY = TRUE, USE.NAMES = TRUE)
3: do.call("mapply", c(FUN = FUN, args[dovec], MoreArgs = list(args[!dovec]), 
       SIMPLIFY = SIMPLIFY, USE.NAMES = USE.NAMES))
2: uncache(targets, cache = cache)
1: clean(premiers_data_raw_sept)

@wlandau was envisioning something like cache$repair(), which I assume would search for and remove corrupted rds files. I have tried to do this by hand, but I'm afraid of messing up storr's key <-> hash <-> value setup and breaking things further.

fmichonneau commented 7 years ago

This is a function I hacked to recover a corrupted store. I'd be curious to know if there are better ways to implement something like this.

rescue_store <- function(store) {
    all_content <- store$list()
    is_missing <- vapply(all_content,
                         function(x) {
        res <- try(store$get(x), silent = TRUE)
        inherits(res, "try-error")
    }, logical(1))
    message(sum(is_missing), " records need to be rescued...")

    to_redo <- all_content[is_missing]
    sapply(to_redo, function(x) store$del(x))
    sapply(to_redo, function(x) store$get(x))
}
wlandau-lilly commented 7 years ago

@fmichonneau I like it. I think it is close, but it may need to contemplate all the available namespaces.

fmichonneau commented 7 years ago

yes, good point. I only use the default namespace generally. This also assumes there is a fetch_hook associated with the get method.

wlandau-lilly commented 7 years ago

In wlandau-lilly/drake#155, @kendonB noticed that rescue_cache() is hogging memory, a problem I thought I had solved. Does storr keep in memory objects loaded with $get()? I am currently using:

rescue_cache <- function(
  targets = NULL,
  path = getwd(), search = TRUE, verbose = TRUE, force = FALSE,
  cache = drake::get_cache(
    path = path, search = search, verbose = verbose, force = force),
  jobs = 1
){
  if (is.null(cache)){
    return(invisible())
  }
  for (namespace in cache$list_namespaces()){
    X <- cache$list(namespace = namespace)
    if (!is.null(targets)){
      X <- intersect(X, targets)
    }
    lightly_parallelize(
      X = X,
      FUN = rescue_del,
      jobs = jobs,
      cache = cache,
      namespace = namespace
    )
  }
  invisible(cache)
}

rescue_del <- function(key, cache, namespace){
  tryCatch(
    cache$get(key = key, namespace = namespace),
    error = function(e){
      cache$del(key = key, namespace = namespace)
    }
  )
  invisible()
}

with some supporting functions:

lightly_parallelize <- function(X, FUN, jobs = 1, ...) {
  jobs <- safe_jobs(jobs)
  if (is.atomic(X)){
    lightly_parallelize_atomic(X = X, FUN = FUN, jobs = jobs, ...)
  } else {
    mclapply(X = X, FUN = FUN, mc.cores = jobs, ...)
  }
}

lightly_parallelize_atomic <- function(X, FUN, jobs = 1, ...){
  jobs <- safe_jobs(jobs)
  keys <- unique(X)
  index <- match(X, keys)
  values <- mclapply(X = keys, FUN = FUN, mc.cores = jobs, ...)
  values[index]
}

safe_jobs <- function(jobs){
  ifelse(on_windows(), 1, jobs)
}

on_windows <- function(){
  this_os() == "windows"
}
wlandau-lilly commented 7 years ago

Update: from wlandau-lilly/drake#155, this seems to avoid consuming too much memory:

rescue_del <- function(key, cache, namespace){
  tryCatch(
    touch_storr_object(key = key, cache = cache, namespace = namespace),
    error = function(e){
      cache$del(key = key, namespace = namespace)
    }
  )
  invisible(NULL)
}

touch_storr_object <- function(key, cache, namespace){
  envir <- environment()
  hash <- cache$get_hash(key = key, namespace = namespace)
  value <- cache$driver$get_object(hash = hash)
  remove(value, envir = envir)
  invisible(NULL)
}

As @kendonB pointed out, this could be related to a long-standing memory efficiency issue in mclapply(). On the other hand, it could that storr's $get() method puts things in the cache object's environment. Part of my solution for drake was just avoiding that part of storr's internals rather than using $get() directly.

kendonB commented 7 years ago

I wonder if there's a way for storr to automatically delete objects that it notices are corrupted? Something like drake:::rescue_del but it would run any time readRDS in $get fails with Error in readRDS(self$name_hash(hash)) : error reading from connection.

richfitz commented 6 years ago

Sorry - I get so much noise from various git repos I have missed this. Please ping me if I don't respond in a reasonable amount of time as they tend to come through as emails still...

It should be possible to detect an invalid object without unserialising it by looking at its hash (I think). If someone can cook up a reliable way of creating a corrupted storr (so that we can test this) I'd happily add a repair function to the package

krlmlr commented 6 years ago

@richfitz: I have a corrupted storr from a drake run, but it's > 500 MB compressed. Download from Dropbox, as far as I can tell the meta namespace is corrupted. A repair() or fsck() method would be very nice indeed.

Unserializing: Maybe still helpful to detect failures where the data has been written partially? Perhaps with an option fsck(full = FALSE) ?

richfitz commented 6 years ago

Oh great (I mean, not that great obviously!).

Or, as a much simpler first step, just rip out the data directory and send me the keys and a list of the files in the data directory. If that's not enough we can try for transferring the whole thing.

If you could also provide the command that shows the corrupted data (ideally separate from drake - just the storr calls, but I can dig them out if not). I can just create dummy data in the data dir because if that's not where the corruption is I can just fill it with junk.

If we need to move the whole lot around, can you can dropbox or google drive it to me perhaps? Or even burn it to CD/pop it onto a USB stick and post it if that's too much of a faff.

krlmlr commented 6 years ago

Can you please download the data from Dropbox: https://www.dropbox.com/s/1yrsbj3cwzc8jhh/drake.tar.gz?dl=0

I'm seeing this in my copy:

> storr <- storr::storr_rds(".drake")
> storr$exists("c_pd.hu6800_new", namespace = "meta")
[1] TRUE
> storr$get("c_pd.hu6800_new", namespace = "meta")
Error in if (use_cache && exists0(hash, envir)) { : 
  missing value where TRUE/FALSE needed
3: traceback(1)
2: self$get_value(self$get_hash(key, namespace), use_cache) at storr.R#143
1: storr$get("c_pd.hu6800_new", namespace = "meta")
krlmlr commented 6 years ago

I also saw quite a few other invalid objects in the "meta" namespace:

 [1] "c_pd.hugene.1.0.st.v1_old" "c_pd.mapping250k.nsp_new"
 [3] "c_pd.mirna.1.0_new"        "c_pd.mirna.2.0_new"
 [5] "c_pd.mogene.1.0.st.v1_new" "c_pd.mogene.1.1.st.v1_new"
 [7] "c_pd.mogene.2.0.st_new"    "c_pd.mogene.2.0.st_old"
 [9] "c_pd.mu11ksubb_new"        "c_pd.nugo.mm1a520177_old"
[11] "c_pd.poplar_old"           "c_pd.porcine_new"
[13] "c_pd.porcine_old"          "c_pd.porgene.1.0.st_new"
[15] "c_pd.rabgene.1.0.st_old"   "c_pd.raex.1.0.st.v1_old"
[17] "c_pd.ragene.1.0.st.v1_new" "c_pd.ragene.1.0.st.v1_old"
[19] "c_pd.rcngene.1.1.st_new"   "c_pd.rg.u34b_old"
[21] "c_pd.rg.u34c_new"          "c_pd.rhegene.1.1.st_new"
richfitz commented 6 years ago

Thanks! Downloading it now. Can you provide any information about how the corruption happened?

krlmlr commented 6 years ago

Running a fairly complex drake plan with ~5500 targets on 16 cores using staged parallelism (the default setting with jobs = 16 and keep_going = TRUE). I did cancel and restart a few times.

richfitz commented 6 years ago

Can confirm the situation above

I see from ls -l

-rw-r--r--@ 1 rich  staff  17 22 Feb 11:46 Y19wZC5kcm9nZW5lLjEuMC5zdF9uZXc
-rw-r--r--@ 1 rich  staff   0 22 Feb 11:46 Y19wZC5tYXBwaW5nMjUway5uc3BfbmV3
-rw-r--r--@ 1 rich  staff   0 22 Feb 11:46 Y19wZC5wb3JjaW5lX25ldw
-rw-r--r--@ 1 rich  staff   0 22 Feb 11:46 Y19wZC5taXJuYS4yLjBfbmV3
-rw-r--r--@ 1 rich  staff   0 22 Feb 11:46 Y19wZC5wb3JjaW5lX29sZA
-rw-r--r--@ 1 rich  staff  17 22 Feb 11:46 Y19wZC5taXJuYS4zLjBfbmV3
-rw-r--r--@ 1 rich  staff   0 22 Feb 11:46 Y19wZC5yYWdlbmUuMS4wLnN0LnYxX25ldw
-rw-r--r--@ 1 rich  staff   0 22 Feb 11:46 Y19wZC5yYWV4LjEuMC5zdC52MV9vbGQ
-rw-r--r--@ 1 rich  staff   0 22 Feb 11:46 Y19wZC5tb2dlbmUuMi4wLnN0X29sZA
-rw-r--r--@ 1 rich  staff   0 22 Feb 11:46 Y19wZC5tb2dlbmUuMi4wLnN0X25ldw
-rw-r--r--@ 1 rich  staff  17 22 Feb 11:46 Y19wZC5tb2dlbmUuMi4xLnN0X29sZA
-rw-r--r--@ 1 rich  staff  17 22 Feb 11:46 Y19wZC5nZW5vbWV3aWRlc25wLjZfb2xk
-rw-r--r--@ 1 rich  staff   0 22 Feb 11:46 Y19wZC5yaGVnZW5lLjEuMS5zdF9uZXc
-rw-r--r--@ 1 rich  staff   0 22 Feb 11:46 Y19wZC5yYWdlbmUuMS4wLnN0LnYxX29sZA
-rw-r--r--@ 1 rich  staff   0 22 Feb 11:46 Y19wZC5tdTExa3N1YmJfbmV3
-rw-r--r--@ 1 rich  staff  17 22 Feb 11:46 Y19wZC5tZy51NzRhdjJfb2xk
-rw-r--r--@ 1 rich  staff   0 22 Feb 11:46 Y19wZC5odTY4MDBfbmV3
-rw-r--r--@ 1 rich  staff   0 22 Feb 11:46 Y19wZC5yY25nZW5lLjEuMS5zdF9uZXc
-rw-r--r--@ 1 rich  staff   0 22 Feb 11:46 Y19wZC5wb3BsYXJfb2xk
-rw-r--r--@ 1 rich  staff  17 22 Feb 11:46 Y19wZC5vdmlnZW5lLjEuMC5zdF9vbGQ
-rw-r--r--@ 1 rich  staff  17 22 Feb 11:46 Y19wZC5taXJuYS4zLjBfb2xk
-rw-r--r--@ 1 rich  staff   0 22 Feb 11:46 Y19wZC5taXJuYS4xLjBfbmV3
-rw-r--r--@ 1 rich  staff   0 22 Feb 11:46 Y19wZC5wb3JnZW5lLjEuMC5zdF9uZXc
-rw-r--r--@ 1 rich  staff   0 22 Feb 11:46 Y19wZC5yZy51MzRiX29sZA
-rw-r--r--@ 1 rich  staff  17 22 Feb 11:46 Y19wZC5wb3JnZW5lLjEuMS5zdF9uZXc
-rw-r--r--@ 1 rich  staff   0 22 Feb 11:46 Y19wZC5yYWJnZW5lLjEuMC5zdF9vbGQ
-rw-r--r--@ 1 rich  staff   0 22 Feb 11:46 Y19wZC5tb2dlbmUuMS4xLnN0LnYxX25ldw
-rw-r--r--@ 1 rich  staff   0 22 Feb 11:46 Y19wZC5odWdlbmUuMS4wLnN0LnYxX29sZA
-rw-r--r--@ 1 rich  staff  17 22 Feb 11:46 Y19odGEyMHByb2Jlc2V0LmRiX25ldw
-rw-r--r--@ 1 rich  staff   0 22 Feb 11:46 Y19wZC5yZy51MzRjX25ldw
-rw-r--r--@ 1 rich  staff  17 22 Feb 11:46 Y19wZC5wb3JnZW5lLjEuMS5zdF9vbGQ
-rw-r--r--@ 1 rich  staff   0 22 Feb 11:46 Y19wZC5udWdvLm1tMWE1MjAxNzdfb2xk
-rw-r--r--@ 1 rich  staff   0 22 Feb 11:46 Y19wZC5tb2dlbmUuMS4wLnN0LnYxX25ldw

All the files that were corrupted have the timestamp of 11:46 - I guess the process died at that point? It's really surprising that multiple files have been written out like this. I didn't know that buffers would not have been flushed once the file was closed, and I don't see how >1 file can be open at once.

The longer term solution is probably trying (or giving the option) to move the key storage into something like thor which promises to be crash proof and should work ok on non-networked filesystems. I'll get a check function written up.

I will check that all the rds files are ok, but they are written then moved so they should be ok

richfitz commented 6 years ago

Running a fairly complex drake plan with ~5500 targets on 16 cores using staged parallelism (the default setting with jobs = 16 and keep_going = TRUE). I did cancel and restart a few times.

So cancelling will have broken the pipe for the parallel jobs and caused an ungraceful R shutdown perhaps? Still odd that there's 23 files broken.

The rds storr is just using writeLine (wrapped in tryCatch) for the write:

write_lines <- function(text, filename, ...) {
  withCallingHandlers(
    try_write_lines(text, filename, ...),
    error = function(e) unlink(filename))
}

try_write_lines <- function(text, filename, ...) {
  if (file.exists(filename)) {
    file.remove(filename)
  }
  writeLines(text, filename, ...)
}

I can shift that over to write-and-move perhaps which will make it more robust to these crashes

krlmlr commented 6 years ago

Yes, the parallel package doesn't seem to handle cancellation very well. Write+move sounds like a safer approach here. Still, a fsck(full = FALSE) would be useful.

richfitz commented 6 years ago

Yeah, I'll do both. This is very useful, thanks

krlmlr commented 6 years ago

Thanks for your fast response!

krlmlr commented 6 years ago

To your question: I think mclapply() is involved for parallelization, I'm not sure about its Ctrl+C behavior w.r.t. closing files.

richfitz commented 6 years ago

@wlandau-lilly - can I ask you to copy https://github.com/richfitz/storr/issues/55#issuecomment-344623527 and subsequent into a new issue if this is a thing you still will need after this; this is something that can be done without using internals (though what you're doing is API)

richfitz commented 6 years ago
> st$check(full = TRUE)
Checking keys
...found 49 corrupt keys in 4 namespaces
Checking objects
...found 33 corrupt objects                                                   
$objects
 [1] "032cb72bfbef5919.rds" "03b5001e4bfde331.rds" "04605c2788e34ea7.rds"
 [4] "132426c6372c5934.rds" "1884b3945606ea38.rds" "22ce45171ed259d0.rds"
 [7] "2b7cab4b76ed39e4.rds" "5152e8f8cea24790.rds" "5347239242139dee.rds"
[10] "55c86dea7655dbe0.rds" "5677369ed43c2c41.rds" "5b141773662d4e07.rds"
[13] "60286919782a674d.rds" "611267e2a79d2e5a.rds" "625874af26d505a8.rds"
[16] "655d97d5f5f5412e.rds" "7d04c39acaff70b4.rds" "8620edbcaa295a96.rds"
[19] "8c489002d5255cd6.rds" "8c4c789dbaa4a5af.rds" "9d48ff457cbc3a35.rds"
[22] "9f34f768c9a250ef.rds" "a9e1104a1b6bde94.rds" "ae77517242b923b1.rds"
[25] "b01cd6733740a7fc.rds" "b4d6175fbe92e370.rds" "b5dc6eb20a78e526.rds"
[28] "cad54bed8031af0f.rds" "cda6eaef01771a59.rds" "cf5469e55f47d99f.rds"
[31] "db0d40f5c73c9297.rds" "e00cea7cbe0f41e1.rds" "fe70d9c9158d9dd7.rds"

$keys
$keys$kernels
[1] "Y19wZC5hdGRzY2hpcC50aWxpbmdfbmV3"   "Y19wZC5odTY4MDBfbmV3"              
[3] "Y19wZC5taXJuYS4xLjBfbmV3"           "Y19wZC5taXJuYS4yLjBfbmV3"          
[5] "Y19wZC5tb2dlbmUuMi4wLnN0X25ldw"     "Y19wZC5tb2dlbmUuMi4wLnN0X29sZA"    
[7] "Y19wZC5tb2dlbmUuMS4xLnN0LnYxX25ldw" "Y19wZC5yYWdlbmUuMS4wLnN0LnYxX29sZA"

$keys$meta
 [1] "Y19wZC5odTY4MDBfbmV3"               "Y19wZC5odWdlbmUuMS4wLnN0LnYxX29sZA"
 [3] "Y19wZC5taXJuYS4xLjBfbmV3"           "Y19wZC5taXJuYS4yLjBfbmV3"          
 [5] "Y19wZC5tb2dlbmUuMi4wLnN0X25ldw"     "Y19wZC5tb2dlbmUuMi4wLnN0X29sZA"    
 [7] "Y19wZC5tb2dlbmUuMS4wLnN0LnYxX25ldw" "Y19wZC5tb2dlbmUuMS4xLnN0LnYxX25ldw"
 [9] "Y19wZC5tdTExa3N1YmJfbmV3"           "Y19wZC5tYXBwaW5nMjUway5uc3BfbmV3"  
[11] "Y19wZC5udWdvLm1tMWE1MjAxNzdfb2xk"   "Y19wZC5wb3BsYXJfb2xk"              
[13] "Y19wZC5wb3JjaW5lX25ldw"             "Y19wZC5wb3JjaW5lX29sZA"            
[15] "Y19wZC5wb3JnZW5lLjEuMC5zdF9uZXc"    "Y19wZC5yaGVnZW5lLjEuMS5zdF9uZXc"   
[17] "Y19wZC5yY25nZW5lLjEuMS5zdF9uZXc"    "Y19wZC5yYWdlbmUuMS4wLnN0LnYxX25ldw"
[19] "Y19wZC5yYWdlbmUuMS4wLnN0LnYxX29sZA" "Y19wZC5yYWJnZW5lLjEuMC5zdF9vbGQ"   
[21] "Y19wZC5yYWV4LjEuMC5zdC52MV9vbGQ"    "Y19wZC5yZy51MzRiX29sZA"            
[23] "Y19wZC5yZy51MzRjX25ldw"            

$keys$objects
[1] "Y19wZC5hdGRzY2hpcC50aWxpbmdfbmV3"   "Y19wZC5odTY4MDBfbmV3"              
[3] "Y19wZC5taXJuYS4xLjBfbmV3"           "Y19wZC5taXJuYS4yLjBfbmV3"          
[5] "Y19wZC5tb2dlbmUuMi4wLnN0X29sZA"     "Y19wZC5tb2dlbmUuMS4wLnN0LnYxX25ldw"
[7] "Y19wZC5tb2dlbmUuMS4xLnN0LnYxX25ldw" "Y19wZC5yYWdlbmUuMS4wLnN0LnYxX29sZA"

$keys$progress
 [1] "Y19wZC5tdTExa3N1YmJfbmV3"           "Y19wZC5wb3BsYXJfb2xk"              
 [3] "Y19wZC5wb3JjaW5lX25ldw"             "Y19wZC5wb3JnZW5lLjEuMS5zdF9vbGQ"   
 [5] "Y19wZC5yaGVnZW5lLjEuMS5zdF9uZXc"    "Y19wZC5yY25nZW5lLjEuMS5zdF9uZXc"   
 [7] "Y19wZC5yYWdlbmUuMS4wLnN0LnYxX25ldw" "Y19wZC5yYWJnZW5lLjEuMC5zdF9vbGQ"   
 [9] "Y19wZC5yYWV4LjEuMC5zdC52MV9vbGQ"    "Y19wZC5yZy51MzRiX29sZA"            

Will get the rest of this put together in the next few workdays. Thanks again

wlandau commented 6 years ago

Fantastic, @richfitz! Also, I submitted the new issue you asked for in https://github.com/richfitz/storr/issues/55#issuecomment-367837436.

wlandau commented 6 years ago

FYI: from https://github.com/ropensci/drake/issues/198 and https://github.com/ropensci/drake/issues/322, I am starting to think we can clean out the extra files written by Dropbox and archiving tools. Just removing all the files containing spaces could solve many common use cases. cc @rkrug.

richfitz commented 6 years ago

Thanks for the reminder on this - I just need to finish the testing and this should be good to go. But yeah, given that with drake you are base64 encoding things you're good to wipe out any files that contain things outside of that alphabet

wlandau commented 6 years ago

Sure thing, Rich. I am considering an implementation for drake. However, archiving/uploading RDS storrs seems like a very common thing to do in the more general case. Would you be willing to consider a patch for storr itself? I do realize that things get tricky if we do not assume mangled keys.

wlandau commented 6 years ago

Update: drake_gc() now cleans out those disruptive backup files from drake's default caches (storr_rds() caches with mangled keys; re: https://github.com/ropensci/drake/issues/198).

richfitz commented 6 years ago

@wlandau - I'm preparing a CRAN release at the moment (will need to be submitted next week). Can you have a look at the i55_corruption branch (diff) and try it on some larger drake projects if you have some handy?

This is rather a lot of change and I'm a little concerned about rushing it out. On the other hand I know this is something that is going to be useful.

richfitz commented 6 years ago

I ran this on the drake cache that @krlmlr posted above and it is not very quick (1GB of RDS, 12.7k keys) and it takes 80s on the laptop to scan:

> system.time(err <- st$check())
Checking objects
...found 33 corrupt objects                                                   
Checking keys
...found 49 corrupt keys in 4 namespaces
...found 1 dangling keys in 1 namespaces
   user  system elapsed 
 73.559   7.120  81.662 

The repair is very quick

> st$repair(err)
Deleting 33 corrupt objects
Deleting 50 corrupt/dangling keys

Rescanning afterwards will take just as long.

There is a full = FALSE option too that is less thorough.

There is some support for the sort of dangling keys that @kendonB saw: presence of these files will no longer case $list to fail and instead these will be skipped over and a notice given that there unexpected files among the keys. There is an extra function st$driver$purge_corrupt_keys for purging these keys

wlandau commented 6 years ago

I just tested on a medium-sized drake cache from a real project. The output of $list() was the same as before, and $check() and $repair() do not generate false positives. But when I deliberately added a corrupt key and called $check() and $repair() repeatedly, some calls detected the corrupt key and some did not, and I am not sure what could have been different among different calls. Also, $driver$purge_corrupt_keys() removed the offending key and no others, but the namespace argument has no default. These issues are minor from my perspective. $list() appears to ignore corrupt keys, and drake::make() now runs just fine even in the presence of a corrupt key.

richfitz commented 6 years ago

But when I deliberately added a corrupt key and called $check() and $repair() repeatedly, some calls detected the corrupt key and some did not, and I am not sure what could have been different among different calls.

Hmm this is a worry. Can you elaborate on the keys you were adding? Ideally something like https://github.com/richfitz/storr/blob/3eaed050698e8b574fb81ff7e16aff278a40874f/tests/testthat/test-driver-rds.R#L322-L342 and I can follow up and try and get this sorted out.

Also, $driver$purge_corrupt_keys() removed the offending key and no others, but the namespace argument has no default.

This is intentional at the moment, it can get more aggressive as we find out if it always does the right thing! Eventually I will have a default of NULL which will scan over all namespaces.

These issues are minor from my perspective. $list() appears to ignore corrupt keys, and drake::make() now runs just fine even in the presence of a corrupt key.

Did the notice get displayed?

wlandau commented 6 years ago

$list(), $check(), and $repair(), do not always complain about corrupt keys. Reprex:

library(storr)
s <- storr_rds("s", mangle_key = TRUE)
s$set("key", 1)
path <- "s/keys/objects/a corrupt key (conflicted copy)"
file.create(path)
#> [1] TRUE
s$list()
#> 1 corrupted files have been found in your storr archive:
#> 
#> namespace: 'objects'
#> path: '/tmp/RtmppA49ks/s/keys/objects'
#> files:
#>   - 'a corrupt key (conflicted copy)'
#> 
#> See 'Corrupt keys' within ?storr_rds for how to proceed
#> [1] "key"
s$list()
#> [1] "key"
s$check()
#> Checking objects
#> ...OK
#> Checking keys
#> ...OK
#> $healthy
#> [1] TRUE
#> 
#> $objects
#> $objects$corrupt
#> character(0)
#> 
#> 
#> $keys
#> $keys$corrupt
#>      namespace key
#> 
#> $keys$dangling
#>      namespace key
#> 
#> 
#> attr(,"class")
#> [1] "storr_check"
s$check()
#> Checking objects
#> ...OK
#> Checking keys
#> ...OK
#> $healthy
#> [1] TRUE
#> 
#> $objects
#> $objects$corrupt
#> character(0)
#> 
#> 
#> $keys
#> $keys$corrupt
#>      namespace key
#> 
#> $keys$dangling
#>      namespace key
#> 
#> 
#> attr(,"class")
#> [1] "storr_check"
s$repair()
#> Checking objects
#> ...OK
#> Checking keys
#> ...OK
#> [1] FALSE
s$repair()
#> Checking objects
#> ...OK
#> Checking keys
#> ...OK
#> [1] FALSE
file.exists(path)
#> [1] TRUE
s$destroy()
sessionInfo()
#> R version 3.5.0 (2018-04-23)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Red Hat Enterprise Linux
#> 
#> Matrix products: default
#> BLAS: /opt/R/R-3.5.0/lib64/R/lib/libRblas.so
#> LAPACK: /opt/R/R-3.5.0/lib64/R/lib/libRlapack.so
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] storr_1.1.3
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_0.12.16      prettyunits_1.0.2 assertthat_0.2.0 
#>  [4] digest_0.6.15     rprojroot_1.3-2   R6_2.2.2         
#>  [7] backports_1.1.2   formatR_1.5       magrittr_1.5     
#> [10] evaluate_0.10.1   progress_1.1.2    stringi_1.2.2    
#> [13] rmarkdown_1.9     tools_3.5.0       stringr_1.3.0    
#> [16] yaml_2.1.19       compiler_3.5.0    htmltools_0.3.6  
#> [19] knitr_1.20
richfitz commented 6 years ago

Thanks! OK, what if I told you this was intentional :grinning:

The issue is this: for most of the really insidious damage we want to know what keys and objects need to be removed. This is poorly written rds objects, truncated keys etc. This can all be done by passing around a set of names of keys, etc and the actual repair can be done by storr independent of the underlying driver.

The issue is with the mangle key option there is no way of referencing the keys with form " " because we can't create a key that the rds driver can refer to. So the cleanup there needs to be done by the driver and can't be pulled centrally into the storr.

So when the driver sees this, it prints that message. It can't be fixed by repair.

#> 1 corrupted files have been found in your storr archive:
#> 
#> namespace: 'objects'
#> path: '/tmp/RtmppA49ks/s/keys/objects'
#> files:
#>   - 'a corrupt key (conflicted copy)'
#> 
#> See 'Corrupt keys' within ?storr_rds for how to proceed

The reason why you're only seeing the message sometimes is explained in the help linked from that output - the message is only printed at most once per minute per namespace (to stop it getting really annoying).

Not sure where this leaves things. Do you have further thoughts on improving this given that I would like a general repair solution to work and not depend on the details of how drivers mangle keys

wlandau commented 6 years ago

$driver$purge_corrupt_keys() has what I need, and drake::make() works in spite of corrupt keys, so any intended behavior beyond that is fine with me.

richfitz commented 6 years ago

Awesome! Thanks to everyone in this thread for helping nailing this down. I'm sure there's more to refine here going forward though