richfitz / storr

:package: Object cacher for R
http://richfitz.github.io/storr

Optionally avoid scratch #117

Open wlandau opened 4 years ago

wlandau commented 4 years ago

As mentioned in #80, some use cases of RDS storrs require atomic writes, which depend on the scratch directory. However, writing to scratch and then moving the file creates a bottleneck on some systems, Windows in particular. This workflow spends a lot of time renaming tiny files, and the total runtime was around 104 seconds on my machine.
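To make the trade-off concrete, here is a minimal sketch of the two write strategies in base R. The helper names `write_rds_via_scratch()` and `write_rds_direct()` and the directory layout are hypothetical, chosen only for illustration; they are not storr's internal functions. The sketch also assumes the scratch directory sits on the same filesystem as the destination, which is what lets the rename behave as a single step on most systems.

```r
# Hypothetical helpers for illustration; not storr's internals.

# Scratch variant: write the full file somewhere else, then rename it into
# place, so readers never see a half-written file at `path`.
write_rds_via_scratch <- function(value, path, scratch_dir) {
  tmp <- tempfile(tmpdir = scratch_dir)
  saveRDS(value, tmp)
  ok <- file.rename(tmp, path)  # the extra step that shows up in profiles
  if (!ok) {
    unlink(tmp)
    stop("could not move scratch file to ", path)
  }
  invisible(path)
}

# Direct variant: one fewer file operation, but a concurrent reader could
# observe a partially written file.
write_rds_direct <- function(value, path) {
  saveRDS(value, path)
  invisible(path)
}

# Assumed layout: <root>/scratch and <root>/data.
root <- tempfile("store_")
scratch_dir <- file.path(root, "scratch")
data_dir <- file.path(root, "data")
dir.create(scratch_dir, recursive = TRUE)
dir.create(data_dir)

value <- list(x = 1:3)
atomic_path <- file.path(data_dir, "atomic.rds")
direct_path <- file.path(data_dir, "direct.rds")
write_rds_via_scratch(value, atomic_path, scratch_dir)
write_rds_direct(value, direct_path)
```

With many tiny files, the scratch variant pays for one `file.rename()` per write, which is the cost described above.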

[Image: before-104s]

The changes in this PR cut the total runtime down to about 50 seconds, and file.rename() is no longer a bottleneck.

[Image: after-50s]

I need to do more digging to make sure people can disable scratch with drake, but since progress logging is different than it once was, I think it is worth a shot.

> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] withr_2.1.2          tibble_2.1.3         storr_1.2.2          microbenchmark_1.4-7
[5] MakefileR_1.0        profile_1.0          fs_1.3.1             drake_7.7.0.9002    

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.3      txtq_0.2.0      crayon_1.3.4    digest_0.6.23   R6_2.4.1       
 [6] backports_1.1.5 magrittr_1.5    pillar_1.4.2    rlang_0.4.2     rstudioapi_0.10
[11] filelock_1.0.2  tools_3.6.1     igraph_1.2.4.2  yaml_2.2.0      compiler_3.6.1 
[16] pkgconfig_2.0.3 base64url_1.4  

wlandau commented 4 years ago

Hmm... the segfault on Travis seems to trace back to has_postgres(). @richfitz, what do you suggest we do?

wlandau commented 4 years ago

Also, I just remembered that even if the keys are different, the data might still be the same, so two processes could end up writing the same data file at once, which creates a race condition. I ran right into that Chesterton's fence :sweat_smile:. But we might still be able to save time by skipping scratch for the keys in many use cases, including drake.
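A rough sketch of why identical data under different keys collides, and what "skip scratch for the keys only" could look like. The helpers `hash_value()`, `store_paths()`, and `set_value()`, and the `keys`/`data`/`scratch` layout, are hypothetical simplifications for illustration, not storr's actual code. The direct key write also assumes each key is only ever written by one process at a time.

```r
# Hypothetical content-addressed layout: data files are named by a hash of
# the value, and each key file just records which hash it points to.

hash_value <- function(value) {
  tmp <- tempfile()
  on.exit(unlink(tmp))
  writeBin(serialize(value, NULL), tmp)
  unname(tools::md5sum(tmp))
}

store_paths <- function(root, key, value) {
  hash <- hash_value(value)
  list(
    hash = hash,
    key = file.path(root, "keys", key),
    data = file.path(root, "data", paste0(hash, ".rds"))
  )
}

set_value <- function(root, key, value) {
  p <- store_paths(root, key, value)
  for (d in c("keys", "data", "scratch")) {
    dir.create(file.path(root, d), recursive = TRUE, showWarnings = FALSE)
  }
  # Data: different keys can map to the same file, so keep the
  # write-then-rename step here.
  tmp <- tempfile(tmpdir = file.path(root, "scratch"))
  saveRDS(value, tmp)
  file.rename(tmp, p$data)
  # Key: written directly, skipping scratch (assumes one writer per key).
  writeLines(p$hash, p$key)
  invisible(p)
}

root <- tempfile("store_")
value <- c(1, 2, 3)
a <- set_value(root, "a", value)
b <- set_value(root, "b", value)
```

Here `a$key` and `b$key` are different files, but `a$data` and `b$data` are the same path, which is where two concurrent writers could collide if the data write skipped scratch.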