ihmeuw-demographics / hierarchyUtils

Demographics Related Utility Functions
https://ihmeuw-demographics.github.io/hierarchyUtils/
BSD 3-Clause "New" or "Revised" License
8 stars 3 forks source link

FEATURE: add `na_rm` option for `agg` function #49

Closed krpaulson closed 3 years ago

krpaulson commented 3 years ago

Is your feature request related to a problem? Please describe. It would be useful to add a na_rm option to aggregation.

Here's an example where I would like non-NA sums in the aggregated output:

> dt <- data.table(
+   age_start = c(0,1,0,1),
+   age_end = c(1,2,1,2),
+   group = c(1,1,2,2),
+   val = c(NA,1,2,3)
+ )
> 
> dt <- hierarchyUtils::agg(
+   dt,
+   id_cols = c("age_start", "age_end", "group"),
+   value_cols = "val",
+   col_stem = "age",
+   col_type = "interval",
+   mapping = data.table(age_start = 0, age_end = 2),
+   missing_dt_severity = "none"
+ )
> dt
   age_start age_end group val
1:         0       2     1  NA
2:         0       2     2   5

Describe the solution you'd like Add a na.rm = T option to agg function.

Describe alternatives you've considered Make missing_dt_severity flexible enough to accommodate this option. Perhaps add "remove" as a valid entry for missing_dt_severity?

Additional context A possible work-around is for users to remove or interpolate NAs prior to aggregation. Especially since it isn't great form to be ignoring NAs in interval aggregation like this. But right now the "missing_dt_severity = 'none'" option is misleading... my quick interpretation of it was that it would sum up non-NA values.

chacalle commented 3 years ago

This should have been closed with #54 when the na_value_severity argument was added.

#' **`na_value_severity`**:
#'
#' Check for 'NA' values in the `value_cols`.
#' 1. `stop`: throw error (this is the default).
#' 2. `warning` or `message`: throw warning/message, drop missing values and
#' continue with aggregation/scaling where possible (this likely will cause
#' another error because of `missing_dt_severity`, consider setting
#' `missing_dt_severity = "skip"` for functionality similiar to `na.rm = TRUE`).
#' 3. `none`: don't throw error or warning, drop missing values and continue
#' with aggregation/scaling where possible (this likely will cause another error
#' because of `missing_dt_severity`, consider setting
#' `missing_dt_severity = "skip"` for functionality similiar to `na.rm = TRUE`).
#' 4. `skip`: skip this check and propagate `NA` values through
#' aggregation/scaling.