ihmeuw-demographics / hierarchyUtils

Demographics Related Utility Functions
https://ihmeuw-demographics.github.io/hierarchyUtils/
BSD 3-Clause "New" or "Revised" License

fix aggregation when nothing possible #67 #72

Closed chacalle closed 3 years ago

chacalle commented 3 years ago

Describe changes

Fixes #67, flagged by @hcomfo95. The requested aggregate was ages 10-55 but the data only covered 15-55, and the function was called with missing_dt_severity = "none". Since the requested aggregate is not possible, the function should have returned an empty data.table, but it was instead failing with an obscure error.

none: don't throw error or warning, continue with aggregation/scaling for requested aggregations/scalings where expected input data in dt is available.

skip: skip this check and continue with aggregation/scaling.

If @hcomfo95 instead expected the aggregate to be made even though input age group values were missing, I also fixed missing_dt_severity = "skip" so that it goes ahead and makes the aggregate.

@erinamay and I were talking about a similar situation this week, where there are implied zeroes or unknown values in the input dataset. In this example, 10-15 is either unknown or an implied zero. We likely need helper functions to fill in implied zeroes, and a better way to handle unknown values in id columns (like age) or value columns. Related to #49 and #65.

Checklist

Packages Repositories

erinamay commented 3 years ago

I agree with Haley that it could still be confusing to get an empty table returned. Some kind of message saying which ages are missing would be super helpful!

chacalle commented 3 years ago

If missing_dt_severity is set to stop, warning, or message, it does print out the missing age groups. Is this just a case of generally not recommending none, since it does the same thing as warning or message but hides potentially important information?

stop: throw error (this is the default).

warning or message: throw warning/message and continue with aggregation/scaling for requested aggregations/scalings where expected input data in dt is available.

none: don't throw error or warning, continue with aggregation/scaling for requested aggregations/scalings where expected input data in dt is available.

skip: skip this check and continue with aggregation/scaling.
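
To make the difference between the five settings concrete, here is a minimal sketch in Python rather than R. It is not the package's implementation: the function name resolve_missing, its return strings, and the half-open interval tuples are all hypothetical, chosen only to mirror the documented behaviour quoted above.

```python
import sys
import warnings


def resolve_missing(missing, severity="stop"):
    """Decide how to proceed given the missing input intervals (hypothetical helper).

    Returns "aggregate_all" when no restriction applies, or
    "aggregate_possible_only" when only aggregates with complete
    input data should be made.
    """
    if severity == "skip":
        # skip: the check is not run at all, so nothing is filtered out
        return "aggregate_all"
    if not missing:
        return "aggregate_all"

    text = "missing input intervals: " + ", ".join(
        f"[{lo}, {hi})" for lo, hi in missing
    )
    if severity == "stop":
        raise ValueError(text)          # default: fail loudly
    if severity == "warning":
        warnings.warn(text)             # report, then continue
    elif severity == "message":
        print(text, file=sys.stderr)    # report, then continue
    elif severity != "none":
        raise ValueError(f"unknown severity: {severity!r}")
    # warning / message / none all continue with the possible aggregates only
    return "aggregate_possible_only"


missing = [(10, 15)]  # ages 10-15 absent, as in the example from #67
print(resolve_missing(missing, "none"))  # aggregate_possible_only
print(resolve_missing(missing, "skip"))  # aggregate_all
```

In this sketch none, warning, and message differ only in whether the missing intervals are reported, which is the concern raised above about none hiding information.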

chacalle commented 3 years ago

We could also consider saving problematic rows (overlapping intervals, missing intervals) as different attributes of the returned object so that someone could inspect more.

We could also encourage running more checks with helper functions before calling agg, especially when the checks within agg are being ignored.
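
As a rough illustration of what such a pre-check could look like, the sketch below finds the gaps between the intervals present in the data and a requested aggregate interval. It is written in Python for brevity and is not one of the package's helpers: the name missing_intervals and the half-open (start, end) tuples are assumptions made for this example.

```python
def missing_intervals(available, target):
    """Return the parts of `target` not covered by `available`.

    All intervals are half-open (start, end) tuples; hypothetical helper.
    """
    start, end = target
    gaps = []
    cursor = start  # left edge of the region not yet known to be covered
    for lo, hi in sorted(available):
        if hi <= cursor:
            continue            # interval lies entirely before the cursor
        if lo >= end:
            break               # interval lies entirely after the target
        if lo > cursor:
            gaps.append((cursor, lo))   # uncovered stretch before this interval
        cursor = max(cursor, hi)
    if cursor < end:
        gaps.append((cursor, end))      # uncovered tail of the target
    return gaps


# Data has five-year age groups covering 15-55; the requested aggregate is 10-55.
available = [(a, a + 5) for a in range(15, 55, 5)]
print(missing_intervals(available, (10, 55)))  # [(10, 15)]
```

Running a check like this before aggregating would tell the caller that 10-15 is missing, rather than leaving them to interpret an empty result.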