Closed chacalle closed 3 years ago
I actually messed up the initial timings: I was using an old installed version of hierarchyUtils, from before we noticed the slowdown, which is why the initial set of times I posted is so slow.
Original slow timings:
> agg_timings
   col_stem    col_type n_draws         method n_input_rows user.self sys.self elapsed
1:      age    interval       1     data.table       13,632     0.057    0.006   0.065
2:      age    interval       1 hierarchyUtils       13,632     9.307    0.236   9.665
3:      age    interval      10     data.table      136,320     0.131    0.024   0.157
4:      age    interval      10 hierarchyUtils      136,320    46.943    0.630  47.835
5:      sex categorical       1     data.table       13,632     0.008    0.000   0.008
6:      sex categorical       1 hierarchyUtils       13,632     0.568    0.009   0.581
7:      sex categorical      10     data.table      136,320     0.055    0.008   0.063
8:      sex categorical      10 hierarchyUtils      136,320     4.786    0.090   4.908
Here is a PDF version of the vignette with updated timings: Aggregation_Scaling performance.pdf. I am also pasting the table version from running it interactively:
> agg_timings
    col_stem    col_type n_draws         method n_input_rows user.self sys.self elapsed
 1:      age    interval       1     data.table       13,632     0.067    0.005   0.071
 2:      age    interval       1 hierarchyUtils       13,632     3.828    0.211   4.145
 3:      age    interval      10     data.table      136,320     0.142    0.020   0.164
 4:      age    interval      10 hierarchyUtils      136,320     3.729    0.257   4.001
 5:      age    interval     100     data.table    1,363,200     0.834    0.230   1.076
 6:      age    interval     100 hierarchyUtils    1,363,200    13.747    1.800  15.708
 7:      age    interval    1000     data.table   13,632,000     7.102    2.639   9.843
 8:      age    interval    1000 hierarchyUtils   13,632,000   104.364   20.984 127.034
 9:      sex categorical       1     data.table       13,632     0.008    0.001   0.009
10:      sex categorical       1 hierarchyUtils       13,632     0.098    0.006   0.104
11:      sex categorical      10     data.table      136,320     0.060    0.009   0.070
12:      sex categorical      10 hierarchyUtils      136,320     0.818    0.052   0.874
13:      sex categorical     100     data.table    1,363,200     0.583    0.065   0.655
14:      sex categorical     100 hierarchyUtils    1,363,200     7.335    0.463   7.995
15:      sex categorical    1000     data.table   13,632,000     5.359    0.809   6.240
16:      sex categorical    1000 hierarchyUtils   13,632,000    68.220    5.740  74.971
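To make the gap between the two methods easier to read, the elapsed times in the updated table can be turned into slowdown ratios. This is only a rough sketch: the numbers are copied by hand from the table above, and the data frame and column names (`timings`, `data_table`, `hierarchy_utils`, `slowdown`) are made up for illustration.

```r
# Elapsed seconds copied from the updated timings table, in table order.
timings <- data.frame(
  col_stem = rep(c("age", "sex"), each = 4),
  n_draws = rep(c(1, 10, 100, 1000), times = 2),
  data_table = c(0.071, 0.164, 1.076, 9.843, 0.009, 0.070, 0.655, 6.240),
  hierarchy_utils = c(4.145, 4.001, 15.708, 127.034, 0.104, 0.874, 7.995, 74.971)
)

# Ratio of hierarchyUtils elapsed time to data.table elapsed time per scenario.
timings$slowdown <- timings$hierarchy_utils / timings$data_table

print(timings)
```

Every ratio comes out above 10, which is well beyond "slightly more overhead".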
I referenced this SO question in trying to understand it. What we care about is the elapsed time, I think: this is the wall-clock time we actually wait around for, while the user time is the CPU time spent running the R code itself. When the elapsed time is less than the user time, that means the command used multiple cores to speed up.
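For anyone else trying to read these columns, here is a small sketch of how `system.time()` reports them in base R. The workloads are arbitrary stand-ins chosen for illustration, not the vignette's code.

```r
# Time a CPU-bound expression; system.time() returns a "proc_time" object.
timing <- system.time({
  x <- 0
  for (i in seq_len(1e6)) x <- x + sqrt(i)
})

# These fields correspond to the columns in the timings tables.
user_time    <- timing[["user.self"]]  # CPU time spent in the R process itself
system_time  <- timing[["sys.self"]]   # CPU time the kernel spent on its behalf
elapsed_time <- timing[["elapsed"]]    # wall-clock time from start to finish

# Sleeping costs wall-clock time but almost no CPU time,
# so elapsed ends up well above the user time here.
sleep_timing <- system.time(Sys.sleep(0.2))

print(timing)
print(sleep_timing)
```

The sleep case shows why elapsed and user time can diverge in one direction; multi-threaded code makes them diverge in the other, with user time exceeding elapsed.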
Describe changes
This adds a small vignette comparing the speed of hierarchyUtils::agg to basic data.table code. As described in the vignette, for basic use cases there should only be slightly more overhead due to the assertions and added flexibility included in hierarchyUtils. The timings indicate there is still something slowing hierarchyUtils down that needs to be diagnosed.
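As a rough idea of what "basic data.table code" means for the categorical case, a baseline might look like the sketch below. The toy input, its column names (`year`, `sex`, `draw`, `value`), and the "all" label are assumptions for illustration, not the vignette's actual data or code.

```r
library(data.table)

# Hypothetical input: one value per year, sex, and draw.
dt <- CJ(year = 2000:2009, sex = c("female", "male"), draw = 1:10)
dt[, value := 1]

timing <- system.time({
  # Collapse the sex column: sum value over sex within each year and draw.
  both_sexes <- dt[, .(value = sum(value)), by = .(year, draw)]
  both_sexes[, sex := "all"]
})

print(both_sexes)
print(timing)
```

This does none of the input checks that hierarchyUtils::agg performs, which is the overhead the comparison is meant to isolate.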
What issues are related
Related to #47
Checklist
Packages Repositories ihmeuw-demographics R packages?
devtools::check() locally?
devtools::document()?
ihmeuw-demographics code style?
docker-base or docker-internal? If so, follow directions in those repositories to rebuild and redeploy the images.
Details of PR
Example table of timings