Open wlandau opened 7 years ago
This is fixed on branch develop
(though see the notes in the two commits above). Would you mind testing this and see if this works for your use case? In particular, please check that things work as expected on existing storrs; I've tried hard to make sure that nothing here breaks backward compatibility
I checked a couple important use cases, and they work perfectly. For each of remakeGenerator::example_remakeGenerator("basic")
and drake::example_drake("basic")
, I ran the project with the current CRAN storr
(1.0.1) and verified that all targets remained up to date after updating to the latest develop
commit (7761b2e48b803767f926e61a3bf521ffd8b0e4c6). Then I started both examples from scratch using the develop
storr
and saw that they ran smoothly. Finally, all the devtools
checks and unit tests for remakeGenerator, parallelRemake, and drake passed with no warnings, errors, or notes under the develop
storr
(Mac OS, R 3.3.2).
For my other use case, I was originally using storr::encode64()
to solve parallelRemake
issue 14 and drake
issue 19, but I have since decided to use base64_urlencode()
from base64url instead. (I noticed that base64url
uses C code and focuses on speed. Do you think storr
might see some performance gains if you swap out encode64()
and decode64()
?)
I also compared the some of the output keys for both versions of storr
. Padded filenames from the CRAN version were not padded in the develop
version, so I will close this issue.
Thanks!
Thanks for the extensive testing - much appreciated!
I noticed that base64url uses C code and focuses on speed. Do you think storr might see some performance gains if you swap out encode64() and decode64()?
I had thought that the speed difference here would not be too bad, but it will be worth thinking about.
I can't immediately base64_urlencode
because it can't pad strings which would break existing storrs. But in the future I can move to use that package (though in general I don't want to increase the dependencies any more than absolutely necessary because I it complicates deployment; depending on base64url
would drag in only two more packages and they're small so that's not a huge deal.
It looks like for trivial writes, the encoding is a bit of a timesink
st <- storr::storr_rds(tempfile(), mangle_key = TRUE)
x <- paste(letters, collapse = "")
Rprof(interval = 1/200)
for (i in 1:5000) {
st$set(x, 1, use_cache = FALSE)
}
Rprof(NULL)
summaryRprof()
which gives (relevant lines only)
$by.total
total.time total.pct self.time self.pct
"st$set" 1.295 99.23 0.025 1.92
"<Anonymous>" 0.935 71.65 0.010 0.77
"writeLines" 0.645 49.43 0.025 1.92
"self$name_key" 0.455 34.87 0.010 0.77
"encode64" 0.385 29.50 0.160 12.26
"self$set_value" 0.380 29.12 0.025 1.92
And then comparing all the bits:
st0 <- storr::storr_rds(tempfile(), mangle_key = FALSE)
microbenchmark::microbenchmark(
base64url::base64_urlencode(x),
storr::encode64(x, pad = FALSE),
st$set(x, 1, use_cache = FALSE),
st0$set(x, 1, use_cache = FALSE))
which gives
Unit: microseconds
expr min lq mean median uq
base64url::base64_urlencode(x) 4.506 7.7395 10.43955 10.2810 11.2580
storr::encode64(x, pad = FALSE) 33.024 39.3780 49.30678 49.6800 55.3660
st$set(x, 1, use_cache = FALSE) 202.651 224.3310 289.50977 231.9120 242.2800
st0$set(x, 1, use_cache = FALSE) 142.946 154.3020 166.26774 159.4535 169.6705
max neval
29.228 100
87.909 100
4422.203 100
358.444 100
So there's a nontrivial time being taken there that could be removed. I'll continue further thoughts on this in #44
I am happy to test where I can, and I appreciate the profiling and interest.
I posted some more recent benchmarks at https://github.com/richfitz/storr/issues/44#issuecomment-435636219. In my experience using storr
through drake
, the bottleneck due to key mangling seems quite severe for storr
s with lots of tiny objects.
Would you be open to a PR with base64url
+ manual key padding?
The base64url package forgoes padding in
base64_urlencode()
because"="
is sometimes unsafe in urls/files. This seems like an extremely small concern, but I am wondering if there is a weird edge case where we will need saferstorr
key filenames.For parallelRemake and drake, I cannot pad because
"="
messes withMakefile
rules (drake issue 19).