ihmeuw-demographics / hierarchyUtils

Demographics Related Utility Functions
https://ihmeuw-demographics.github.io/hierarchyUtils/
BSD 3-Clause "New" or "Revised" License

Simplify and speed up agg/scale functions #47

Closed chacalle closed 3 years ago

chacalle commented 4 years ago

Describe changes

These changes pull together a couple of PRs to speed up agg/scale, especially for large datasets. I think there is still room for better profiling of scale, but I will save that for a future PR.

The new arguments added to agg are present_agg_severity, overlapping_dt_severity, and na_value_severity. The drop_present_aggs argument is removed.

The new arguments added to scale are overlapping_dt_severity and na_value_severity.

Details of PR

Specifically, the checks that identify overlapping or missing intervals in the interval variable were very slow because they checked every combination of id_cols. I removed some of these checks and added a skip option to others, so they can be skipped when we know a dataset is square and doesn't need to be checked.
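
To illustrate why the per-combination check is expensive and what skipping it buys, here is a minimal sketch in Python. It is not the package's R code: the function names, the `start`/`end` column names, and the `severity` values used here ("stop", "warning", "skip") are assumptions made for the example. The check groups rows by every combination of the id columns and scans each group's intervals, so its cost grows with the number of combinations; with `severity="skip"` no grouping happens at all.

```python
from collections import defaultdict


def has_overlap(intervals):
    """Return True if any two half-open [start, end) intervals overlap."""
    ordered = sorted(intervals)
    # After sorting by start, any overlap shows up between neighbours.
    return any(prev[1] > nxt[0] for prev, nxt in zip(ordered, ordered[1:]))


def overlapping_groups(rows, id_cols, severity="stop"):
    """Find id-column combinations whose intervals overlap.

    rows: list of dicts holding the id columns plus 'start' and 'end'.
    severity (hypothetical values): 'stop' raises, 'warning' only reports,
    'skip' bypasses the check for data already known to be square.
    """
    if severity == "skip":
        return []  # caller vouches for the data; no grouping work is done

    # One bucket per combination of id columns: this is the slow part.
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[col] for col in id_cols)
        groups[key].append((row["start"], row["end"]))

    bad = [key for key, intervals in groups.items() if has_overlap(intervals)]
    if bad and severity == "stop":
        raise ValueError(f"overlapping intervals for id combinations: {bad}")
    return bad


rows = [
    {"location": "A", "year": 2000, "start": 0, "end": 5},
    {"location": "A", "year": 2000, "start": 5, "end": 10},
    {"location": "B", "year": 2000, "start": 0, "end": 5},
    {"location": "B", "year": 2000, "start": 3, "end": 10},
]
flagged = overlapping_groups(rows, ["location", "year"], severity="warning")
```

Skipping is only safe when the caller already knows the dataset is square. Otherwise overlapping intervals would pass through unnoticed.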

krpaulson commented 4 years ago

A few general comments:

krpaulson commented 3 years ago

@chacalle What's holding up this PR at this point? I'm trying to use agg in some code, so I'm curious whether we can merge this PR soon.