ihmeuw-demographics / hierarchyUtils

Demographics Related Utility Functions
https://ihmeuw-demographics.github.io/hierarchyUtils/
BSD 3-Clause "New" or "Revised" License

BUG: issue with overlapping age groups in agg function #66

Open hcomfo95 opened 3 years ago

hcomfo95 commented 3 years ago

Describe the bug

The agg function fails with the error below when the dataset includes an age group (age_start = 0, age_end = 11) that overlaps another age group.

Error in subtrees[[i]]: subscript out of bounds

The error message should be more informative. I worked around it by dropping that age group, since it is not needed, but it would be nice if the function did that automatically.
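A minimal sketch of that workaround, for anyone hitting the same error. It assumes intervals are left-closed, right-open (adjacent groups in the data share an endpoint, e.g. 10-15 and 15-20), and `flag_overlaps` is a hypothetical helper written for this example, not a hierarchyUtils function.

```r
library(data.table)

# Toy data with the same age intervals as the report: 5-year groups
# from 10 to 60 plus one extra interval (0-11) that overlaps 10-15.
dt <- data.table(
  age_start = c(seq(10, 55, 5), 0),
  age_end = c(seq(15, 60, 5), 11),
  births_reported = c(28973, 721564, 998526, 720342, 443512,
                      214808, 55665, 4690, 93, 20, 0)
)

# Hypothetical helper (not part of hierarchyUtils): returns TRUE for each
# row whose interval [start, end) overlaps the interval of any other row.
flag_overlaps <- function(dt, start = "age_start", end = "age_end") {
  s <- dt[[start]]
  e <- dt[[end]]
  vapply(seq_along(s), function(i) {
    any(s[-i] < e[i] & e[-i] > s[i])
  }, logical(1))
}

# Inspect which rows are involved in an overlap before deciding what to drop.
overlapping_rows <- dt[flag_overlaps(dt)]

# Drop the unneeded 0-11 group explicitly, then confirm nothing overlaps.
dt_clean <- dt[!(age_start == 0 & age_end == 11)]
stopifnot(!any(flag_overlaps(dt_clean)))
```

Both the 0-11 and the 10-15 rows are flagged, so which one to drop is still a decision about the data rather than something the check can make on its own.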

To Reproduce

library(data.table)

dt <- data.table(age_group_id = c(7:16, 322),
                 nid = rep(234279),
                 underlying_nid = rep(NA),
                 ihme_loc_id = rep("BRA"),
                 year_id = rep(2000),
                 sex = rep("both"),
                 births_reported = c(28973, 721564, 998526, 720342, 443512, 214808, 55665, 4690, 93, 20, 0),
                 age_start = c(seq(10, 55, 5), 0),
                 age_end = c(seq(15, 60, 5), 11),
                 unique_identifier = rep("234279_NA_BRA_2000_both"))

age_specific_births_reported <- copy(dt)
gbd_year <- 2020

age_map <- mortdb::get_age_map(gbd_year = gbd_year, type = "all")

age_map_10_54 <- age_map[age_group_id == 169, c("age_group_years_start", "age_group_years_end")]
colnames(age_map_10_54) <- c("age_start", "age_end")

value_cols <- "births_reported"
id_cols <- names(age_specific_births_reported)[!names(age_specific_births_reported) %in% value_cols]

age_specific_agg_age_10_54 <- data.table()

for (i in unique(age_specific_births_reported$unique_identifier)) {

  temp <- age_specific_births_reported[unique_identifier == i, ]

  temp_agg <- hierarchyUtils::agg(
    dt = temp,
    id_cols = id_cols,
    value_cols = value_cols,
    col_stem = "age",
    col_type = "interval",
    mapping = age_map_10_54,
    missing_dt_severity = "none",
    present_agg_severity = "skip",
    overlapping_dt_severity = "stop"
  )

  age_specific_agg_age_10_54 <- rbind(age_specific_agg_age_10_54, temp_agg)

  temp_agg <- NULL

}

chacalle commented 3 years ago

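The problem here is that age_group_id was included in the input data set, and therefore also in id_cols, since id_cols is built from every column that is not a value column. If age_group_id is removed, the overlapping intervals are correctly found: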

dt <- data.table(age_group_id = c(7:16, 322),
                 nid = rep(234279),
                 underlying_nid = rep(NA),
                 ihme_loc_id = rep("BRA"),
                 year_id = rep(2000),
                 sex = rep("both"),
                 births_reported = c(28973, 721564, 998526, 720342, 443512, 214808, 55665, 4690, 93, 20, 0),
                 age_start = c(seq(10, 55, 5), 0),
                 age_end = c(seq(15, 60, 5), 11),
                 unique_identifier = rep("234279_NA_BRA_2000_both"))

age_specific_births_reported <- copy(dt)

gbd_year <- 2020

age_map <- demInternal::get_age_map(gbd_year = gbd_year, type = "all")

age_map_10_54 <- age_map[age_group_id == 169, c("age_start", "age_end")]

age_specific_agg_age_10_54 <- data.table()

i <- "234279_NA_BRA_2000_both"
temp <- age_specific_births_reported[unique_identifier == i, ]
temp[, age_group_id := NULL]
value_cols <- "births_reported"
id_cols <- names(temp)[!names(temp) %in% value_cols]

temp_agg <- hierarchyUtils::agg(
  dt = temp,
  id_cols = id_cols,
  value_cols = value_cols,
  col_stem = "age",
  col_type = "interval",
  mapping = age_map_10_54,
  missing_dt_severity = "none",
  present_agg_severity = "skip",
  overlapping_dt_severity = "stop"
)
Aggregating age
Collapsing age to the most detailed common set of intervals
 Error in hierarchyUtils::agg(dt = temp, id_cols = id_cols, value_cols = value_cols,  : 
  empty_dt : Some overlapping intervals were identified in `dt`.
These will be automatically dropped.
      nid underlying_nid ihme_loc_id year_id  sex       unique_identifier age_start age_end
1: 234279             NA         BRA    2000 both 234279_NA_BRA_2000_both         0      11
2: 234279             NA         BRA    2000 both 234279_NA_BRA_2000_both        10      15
                           issue
1: overlapping intervals present
2: overlapping intervals present
[1] FALSE

chacalle commented 3 years ago

It gets caught right here.

I'm not sure there is a good generalized way to catch this kind of issue between dt and id_cols.
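One possible heuristic, offered only as a sketch: before aggregating, look for id columns that map one-to-one onto the interval columns, since such a column (like age_group_id here) is probably just a label for the interval. `find_interval_label_cols` is a hypothetical helper, not something hierarchyUtils provides, and it would miss cases where the label is not strictly one-to-one with the intervals.

```r
library(data.table)

# Hypothetical helper (not part of hierarchyUtils): returns the id columns
# that are in one-to-one correspondence with the interval columns
# "<col_stem>_start" and "<col_stem>_end".
find_interval_label_cols <- function(dt, id_cols, col_stem) {
  interval_cols <- paste0(col_stem, c("_start", "_end"))
  other_cols <- setdiff(id_cols, interval_cols)
  n_intervals <- uniqueN(dt, by = interval_cols)
  is_label <- vapply(other_cols, function(col) {
    # A label column has as many distinct values as there are intervals,
    # and never pairs one value with two different intervals.
    n_intervals > 1 &&
      uniqueN(dt, by = col) == n_intervals &&
      uniqueN(dt, by = c(col, interval_cols)) == n_intervals
  }, logical(1))
  other_cols[is_label]
}

# Same input as in the report above.
dt <- data.table(age_group_id = c(7:16, 322),
                 nid = rep(234279),
                 underlying_nid = rep(NA),
                 ihme_loc_id = rep("BRA"),
                 year_id = rep(2000),
                 sex = rep("both"),
                 births_reported = c(28973, 721564, 998526, 720342, 443512,
                                     214808, 55665, 4690, 93, 20, 0),
                 age_start = c(seq(10, 55, 5), 0),
                 age_end = c(seq(15, 60, 5), 11),
                 unique_identifier = rep("234279_NA_BRA_2000_both"))

value_cols <- "births_reported"
id_cols <- names(dt)[!names(dt) %in% value_cols]

suspect_cols <- find_interval_label_cols(dt, id_cols, col_stem = "age")
if (length(suspect_cols) > 0) {
  warning("id_cols that may just label the age intervals: ",
          paste(suspect_cols, collapse = ", "))
}
```

On this input only age_group_id is flagged; the other id columns each hold a single value, so they cannot be labels for the 11 intervals.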