Closed thomased closed 1 year ago
Current state
Slower when computing full suite of variables (as expected, owing to duplication of calcs)
library(microbenchmark)
data(sicalis)
microbenchmark(
summary(sicalis),
summary2.rspec(sicalis)
)
Unit: milliseconds
expr min lq mean median uq max neval cld
summary(sicalis) 8.952917 9.063292 9.882924 9.18223 9.353521 17.08933 100 a
summary2.rspec(sicalis) 30.926543 31.235022 34.230939 31.66886 37.546584 117.12217 100 b
>
Faster when computing subset (as expected, since that's all it's calculating)
microbenchmark(
summary(sicalis, subset = 'S2'),
summary2.rspec(sicalis, subset = 'S2')
)
Unit: microseconds
expr min lq mean median uq max neval cld
summary(sicalis, subset = "S2") 8895.251 9075.2505 9830.7253 9189.292 9503.1675 16579.334 100 a
summary2.rspec(sicalis, subset = "S2") 591.917 628.5425 659.8965 652.376 678.9175 811.125 100 b
Some further refactoring can probably bring down that difference for the full set
Alright, I think I've gone as far as I can & have the patience for at the moment. So it now only calculates the specified variable(s) (and any dependent variables), which I think is good news just from a design perspective. E.g. it shouldn't matter if you don't have the wavelength range for 'green chroma' if you just want B3.
It also means some changes to its speed. It's now slower when calculating all variables — about ~half the speed:
> microbenchmark(
+ summary(sicalis),
+ summary_old.rspec(sicalis)
+ )
Unit: milliseconds
expr min lq mean median uq max neval cld
summary(sicalis) 20.536917 20.844168 24.35726 21.463855 23.064938 149.1463 100 a
summary_old.rspec(sicalis) 8.915709 9.040937 10.95369 9.219043 9.444459 120.3357 100 b
But ~10x faster when calculating only a subset:
> microbenchmark(
+ summary(sicalis, subset = 'S2'),
+ summary_old.rspec(sicalis, subset = 'S2')
+ )
Unit: microseconds
expr min lq mean median uq max neval cld
summary(sicalis, subset = "S2") 768.876 807.7095 848.439 838.730 866.2295 1122.376 100 a
summary_old.rspec(sicalis, subset = "S2") 8912.167 9070.2920 9940.778 9255.084 9542.8550 18332.750 100 b
I didn't think it'd be quite so slow for the full-variable-set use case, but I might be missing something obvious. I'm sure there's more tweaking & tightening which can be done - feel free to shout if anything obvious leaps out at you.
Absolutely — feel free to push/tweak anything. Thanks!
I had another, in-depth, look and I'm now not sure more comments are needed. I've pushed a minor update but I think it already looks very good as it is :+1:.
One problematic edge case I've found while trying to add deprecation tests:
summary(sicalis, wlmin = 400)
Error in if (lambdamax < lim[2]) { : missing value where TRUE/FALSE needed In addition: Warning message: In summary.rspec(sicalis, wlmin = 400) : The 'wlmin' and 'wlmax' arguments are deprecated as of v2.9.0, and will be removed in future releases. Use the lim argument instead.
Thanks! Should be fixed with that next commit.
I'll merge this in shortly then, and start prepping a release soon too.
summary.rspec rejig (fixes #165)
wlmin
,wlmax
, in favour oflim = c()
argument.subset
.