These changes pull together a couple PRs to speed up agg/scale especially for large datasets. I think there is still some room to do better profiling on scale but will save that for a future PR.
The new arguments added to agg are present_agg_severity, overlapping_dt_severity, na_value_severity and removes the drop_present_aggs argument.
The new arguments added to scale are overlapping_dt_severity and na_value_severity.
[X] Have you successfully run devtools::check() locally?
[X] Have you updated or added function (and vignette if applicable) documentation? Did you update the 'man' and 'NAMESPACE' files with devtools::document()?
[X] Have you added in tests for the changes included in the PR?
[X] Do the changes follow the ihmeuw-demographicscode style?
[ ] Do the changes need to be immediately included in a new build of docker-base or docker-internal? If so follow directions in those repositories to rebuild and redeploy the images.
[ ] Do the changes require updates to other repositories which use this package? If yes, make the necessary updates in those repos, and consider integration tests for those repositories.
Details of PR
Specifically the checks to identify overlapping or missing interval variable were very slow because they checked every combination of id_cols. I either removed some of these checks or added a skip option to some of the checks to skip when we know a dataset is square and doesn't need to be checked.
I'd be curious to know what your process was for testing the speed on this as you made changes. Perhaps you could document best practices for speed tests in our package template wiki?
This PR is fairly large, so I focused on the documentation like you suggested and skimmed through the rest. I understand why this was one PR but I wonder if we should aim to have smaller PRs in the future? It makes it kind of daunting to review. Given this, though, I did appreciate the context you provided in the PR description.
Describe changes
These changes pull together a couple PRs to speed up agg/scale especially for large datasets. I think there is still some room to do better profiling on
scale
but will save that for a future PR.The new arguments added to
agg
arepresent_agg_severity
,overlapping_dt_severity
,na_value_severity
and removes thedrop_present_aggs
argument.The new arguments added to
scale
areoverlapping_dt_severity
andna_value_severity
.Checklist
Packages Repositories
ihmeuw-demographics
R packages?devtools::check()
locally?devtools::document()
?ihmeuw-demographics
code style?docker-base
ordocker-internal
? If so follow directions in those repositories to rebuild and redeploy the images.Details of PR
Specifically the checks to identify overlapping or missing interval variable were very slow because they checked every combination of id_cols. I either removed some of these checks or added a
skip
option to some of the checks to skip when we know a dataset is square and doesn't need to be checked.