richfitz / storr

:package: Object cacher for R
http://richfitz.github.io/storr
Other
116 stars 10 forks source link

Race condition when processes write the same data to different keys #80

Closed wlandau closed 6 years ago

wlandau commented 6 years ago

One of the things I love about storr, especially for drake, is the apparent thread safety of the RDS driver. As I understand it, different processes should be able to write to different keys or namespaces in parallel, and I heavily rely on this capability. Unfortunately, there is a race condition when the data objects are the same across processes/keys.

library(parallel)
library(storr)

for (i in 1:10) {
  cache <- storr_rds(tempfile())
  mclapply(letters, function(key, path) {
    tryCatch({
      cache <- storr::storr_rds(path)
      for (x in c(letters, LETTERS)) {
        cache$set(key = key, value = x)
      }
    }, warning = function(w) {
      saveRDS(w, "w.rds")
    })
  }, path = cache$driver$path, mc.cores = 8)
  cache$destroy()
}

readRDS("w.rds")
#> <simpleWarning in file.rename(tmp, filename): cannot rename file '/tmp/Rtmp5AG6uj/file6e1f64b53c9/scratch/9ec5a5e3b79a8d3273f0054c5b8a7380.rds' to '/tmp/Rtmp5AG6uj/file6e1f64b53c9/data/9ec5a5e3b79a8d3273f0054c5b8a7380.rds', reason 'No such file or directory'>

This race condition seems to be responsible for most of the strange behavior in https://github.com/ropensci/drake/issues/404. I am sorry I did not discover it before storr 1.2.0 was released to CRAN.

richfitz commented 6 years ago

Hi @wlandau-lilly

Sorry about this, and thanks for the great reprex. The scratch changes were working around another, more insidious race condition (if tempdir() and the storr are on different filesystems - which is the case on many systems then write-copy pattern is nonatomic so two processes can copy to the same file with unpleasant results). I had a shot at fixing is on the rds_separate_scratch branch but that doesn't fix your case. I have another idea I'll try and implement tonight

richfitz commented 6 years ago

Can you please test: https://github.com/richfitz/storr/tree/rds_separate_scratch2

wlandau commented 6 years ago

It was really hard to reproduce this race condition, so I am glad you like the example. Thanks for looking into it further.

I am inundated with several other things at the moment, and I will get to rds_separate_scratch2 as soon as I can.

I hope you do not have to resort to file locking, but from recent experience, I do not think it is so bad. I use filelock now for txtq, which in turn manages the interprocess communication for drake.

richfitz commented 6 years ago

No file locking necessary, thankfully

wlandau commented 6 years ago

The race condition appears free from the reprex when I run it with rds_separate_scratch2, and I am applying it to drake's long HPC tests right now. I will let you know what happens.

richfitz commented 6 years ago

Great, thanks!

wlandau commented 6 years ago

Until rds_separate_scratch2, drake's long-running non-CRAN HPC tests had occasional intermittent "cannot open connection" failures on Windows. Now, those same tests are clean. Thanks so much for the patch! Are you ready to merge it into master? I'm not holding my breath, but I'm eager for the next release.

richfitz commented 6 years ago

Thanks for testing Will - I will merge this into master but I don't think I can do a CRAN release for another ~month at least lest I raise the ire of the cran-keepers. If this is a major problem please let me know and I'll take the risk

wlandau commented 6 years ago

Sounds great. I do not think the problem is urgent enough to violate the official CRAN submission frequency guidelines, I will just refer people to development storr if they raise the issue.

richfitz commented 6 years ago

Great, thanks! It's merged into master now and the version number for development has been bumped to 1.2.1 (only 1.2.0 should be affected by this behaviour I believe).

wlandau commented 6 years ago

Do you have plans to release version 1.2.1 to CRAN as is, or are there other features you plan to work on first? I realize that the patch for this issue is small.

richfitz commented 6 years ago

Thanks for the reminder Will

richfitz commented 6 years ago

Sent to CRAN (sorry for the delay). Fingers crossed

richfitz commented 6 years ago

And on CRAN :tada:

wlandau commented 6 years ago

Thanks, Rich!