richfitz / storr

:package: Object cacher for R
http://richfitz.github.io/storr
Other
116 stars 10 forks source link

Don't use in-place file writes #42

Closed krlmlr closed 7 years ago

krlmlr commented 7 years ago

Use case: I thought I could back up a ~30 GB storr from a remake project, to be able to restore it later (https://github.com/richfitz/remake/issues/136). Used cp -lr .remake .remake.bak which creates hardlinks to files recursively. After trying out something and then restoring with rm -r .remake; mv .remake .remake.bak I'm now stuck with an inconsistent storr...

richfitz commented 7 years ago

This sounds really bad, but I'm a bit confused about exactly happened here. Was the problem that you used the storr while it was copying? Or am I missing something?

I'm working on a leveldb storr driver at the moment that should be more robust (bit not allow any concurrency)

richfitz commented 7 years ago

Hi @krlmlr; I am about to prepare a CRAN update for this package shortly. Could you possibly elaborate on how to reproduce and/or eliminate this problem as I don't understand it yet.

krlmlr commented 7 years ago

Assuming a remake project :

  1. Run the project
  2. Back up the .remake directory with cp -lr .remake .remake.bak
    • creates hard links only, doesn't duplicate the files
  3. Modify and run the project
  4. Rollback the project, reinstall the backup with rm -r .remake; mv .remake.bak .remake
  5. Run again

Expected result: The last run doesn't need to remake anything.

Actual result (observed once but not yet reproduced): Apparently inconsistent storr.

I suspect in-place writes as the cause of the problem, because these have an effect on both the .remake directory and the backup. Instead of overwriting files, just create them with a different file name (in the same directory) and rename after completion. This is also safer in case of unexpected errors, the file update is closer to being atomic.

richfitz commented 7 years ago

Thanks for this; I'll try and reproduce. I think that in-place writes are a good idea in any case and they are atomic when on the same file system on POSIX (and close enough on windows from the look of it)

krlmlr commented 7 years ago

I'm not sure about the terminology; it looks like storr might be just opening a file for writing, I refer to this as "in-place write". I'm suggesting to write to a temporary file first (just like Git does), what's the name for this technique?

richfitz commented 7 years ago

write-and-move? I've done this elsewhere on other projects to work around limitations in download.file. I'd never thought much about how failure in write will leave a storr though.

richfitz commented 7 years ago

I can't replicate this on OSX (cp doesn't have the -l option and I can't get rsync to do it).

But on linux:

  path1 <- tempfile()
  path2 <- tempfile()
  st1 <- storr_rds(path1)
  st1$set("foo", "val1")

  ok <- system2("cp", c("-lr", path1, path2))

  st2 <- storr_rds(path2)
  expect_equal(st2$get("foo"), "val1")

  st1$set("foo", "val2")
  expect_equal(st1$get("foo"), "val2")
  expect_equal(st2$get("foo"), "val1") # error instead