hubverse-org / hubEnsembles

Ensemble methods for combining hub model outputs.
https://hubverse-org.github.io/hubEnsembles/

Error when calling `linear_pool` for `model_outputs` that have `quantile` and `pmf` output types #37

Closed: eahowerton closed this issue 9 months ago.

eahowerton commented 9 months ago

Consider a `model_outputs` data frame that contains both `quantile` and `pmf` output types. In this example, the `pmf` output type follows the FluSight format, so `output_type_id = c("large_decrease", "decrease", "stable", "increase", "large_increase")`. Because these category labels share the `output_type_id` column with the quantile levels, the whole column is of type `chr`.

> head(model_outputs)
# A tibble: 6 × 8
  origin_date horizon location target         output_type output_type_id value model_id
  <date>        <int> <chr>    <chr>          <chr>       <chr>          <dbl> <chr>   
1 2022-12-05       -6 20       inc covid hosp quantile    0.01              22 UMass-ar
2 2022-12-05       -6 20       inc covid hosp quantile    0.025             24 UMass-ar
3 2022-12-05       -6 20       inc covid hosp quantile    0.05              26 UMass-ar
4 2022-12-05       -6 20       inc covid hosp quantile    0.1               28 UMass-ar
5 2022-12-05       -6 20       inc covid hosp quantile    0.15              30 UMass-ar
6 2022-12-05       -6 20       inc covid hosp quantile    0.2               32 UMass-ar

> tail(model_outputs)
# A tibble: 6 × 8
  origin_date horizon location target         output_type output_type_id  value model_id           
  <date>        <int> <chr>    <chr>          <chr>       <chr>           <dbl> <chr>              
1 2022-12-19       14 25       inc covid hosp pmf         large_increase 0.301  simple_hub-baseline
2 2022-12-19       14 US       inc covid hosp pmf         large_decrease 0.133  simple_hub-baseline
3 2022-12-19       14 US       inc covid hosp pmf         decrease       0.361  simple_hub-baseline
4 2022-12-19       14 US       inc covid hosp pmf         stable         0.316  simple_hub-baseline
5 2022-12-19       14 US       inc covid hosp pmf         increase       0.0913 simple_hub-baseline
6 2022-12-19       14 US       inc covid hosp pmf         large_increase 0.0989 simple_hub-baseline

Thus, when `linear_pool` calls `distfromq::make_q_fn` for the `quantile` output type, it returns an error because the quantile levels (taken from the `output_type_id` column and passed as `ps`) are character strings rather than numbers.

> linear_pool_ens <- hubEnsembles::linear_pool(model_outputs %>%
+                                                filter(output_type != "median"), 
+                                              model_id = "hub-ensemble-linear-pool")
Error in `map()`:
ℹ In index: 3.
Caused by error in `dplyr::summarize()` at hubEnsembles/R/linear_pool_quantile.R:66:2:
ℹ In argument: `pred_qs = list(...)`.
ℹ In group 1: `model_id = "UMass-ar"`, `origin_date = 2022-12-05`, `horizon = -6`, `location = "20"`, `target = "inc covid
  hosp"`.
Caused by error in `qdst()`:
! Non-numeric argument to mathematical function
Backtrace:
  1. hubEnsembles::linear_pool(...)
 31. distfromq::make_q_fn(ps = output_type_id, qs = value, ...)
 33. distfromq:::spline_cdf(...)
 34. distfromq:::spline_cdf_grid_interp(...)
 44. distfromq:::grid_augment_ps_qs(ps, qs, tail_dist, n_grid)
 45. distfromq:::spline_cdf_direct(...)
 46. distfromq:::d_ext_factory(...)
 56. distfromq:::calc_loc_scale_params(ps, qs, dist)
 59. stats (local) qdst(ps[2])
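The root cause can be reproduced without hubEnsembles or distfromq. The minimal base-R sketch below assumes that `qdst` in the backtrace resolves to a normal quantile function such as `stats::qnorm()`; that is an assumption, since the tail distribution depends on distfromq's settings. It shows that putting numeric quantile levels and pmf labels in one vector makes the whole vector character, and that `qnorm()` rejects a character level with the same message seen above.

```r
# Combining numeric quantile levels with pmf category labels in one vector
# coerces everything to character (the numeric 0.01 becomes the string "0.01").
ids <- c(0.01, "large_decrease")
print(class(ids))

# A quantile level stored as text, as happens when output_type_id is chr.
level_chr <- "0.1"

# qnorm() (assumed stand-in for distfromq's internal qdst) rejects character input.
err <- tryCatch(qnorm(level_chr), error = function(e) conditionMessage(e))
print(err)

# Converting the level to numeric first works as expected.
print(qnorm(as.numeric(level_chr)))
```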

This can be worked around by coercing the `output_type_id` column to numeric, but in my opinion this is not an ideal solution: as the warning below shows, the `pmf` category labels are turned into `NA`s.

> linear_pool_ens <- hubEnsembles::linear_pool(model_outputs %>%
+                                                  filter(output_type != "median") %>%
+                                                  mutate(output_type_id = as.numeric(output_type_id)), 
+                                              model_id = "hub-ensemble-linear-pool")
Warning message:
There was 1 warning in `mutate()`.
ℹ In argument: `output_type_id = as.numeric(output_type_id)`.
Caused by warning:
! NAs introduced by coercion 

Perhaps we should add a step that checks the type of `output_type_id` before calling `distfromq`? What do you think, @elray1 and @lshandross?
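One way such a check could look is the minimal base-R sketch below. `quantile_rows_numeric()` is a hypothetical helper, not a hubEnsembles function, and the toy data frame is invented for illustration. It assumes quantile rows are processed separately from pmf rows (the backtrace points to `linear_pool_quantile.R`), so only those rows need numeric levels.

```r
# Hypothetical helper (not part of hubEnsembles): keep only the quantile rows
# and convert their output_type_id to numeric, failing loudly on bad levels.
quantile_rows_numeric <- function(model_outputs) {
  # which() drops rows whose output_type is NA instead of producing NA rows
  q <- model_outputs[which(model_outputs$output_type == "quantile"), , drop = FALSE]
  if (is.numeric(q$output_type_id)) {
    return(q)
  }
  # as.character() handles factors; warnings are suppressed because
  # non-numeric levels are reported explicitly below
  levels_num <- suppressWarnings(as.numeric(as.character(q$output_type_id)))
  bad <- is.na(levels_num)
  if (any(bad)) {
    stop("Non-numeric quantile levels in output_type_id: ",
         paste(unique(q$output_type_id[bad]), collapse = ", "))
  }
  q$output_type_id <- levels_num
  q
}

# Toy data mixing quantile and pmf rows, so output_type_id is character.
mo <- data.frame(
  output_type = c("quantile", "quantile", "pmf"),
  output_type_id = c("0.25", "0.75", "large_increase"),
  value = c(22, 30, 0.3),
  stringsAsFactors = FALSE
)

q <- quantile_rows_numeric(mo)
str(q$output_type_id)
```

Because only the quantile rows pass through `as.numeric()`, the pmf labels are never coerced, so no `NA`s are introduced as in the workaround above, and a genuinely malformed quantile level produces an explicit error instead of a downstream failure inside `distfromq`.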

elray1 commented 9 months ago

Hi @eahowerton, can you double-check that you have the latest version of the package installed? I thought we had already addressed this problem in issue #32, so I just want to confirm that it's still a problem. I definitely agree that we should handle this here.

lshandross commented 9 months ago

Hi both, I just wanted to add that I tested this example with the most up-to-date version of hubEnsembles, and it seems the fix from issue #32 already solved it. Let me know, @eahowerton, if you're still having issues.

eahowerton commented 9 months ago

Hi @elray1 and @lshandross, you are correct: this was resolved by issue https://github.com/Infectious-Disease-Modeling-Hubs/hubEnsembles/issues/32. I had updated the example-hub data on the `software_manuscript` branch and reinstalled hubEnsembles from that branch to test it, but `software_manuscript` was behind `main`. I've merged, and everything is working as expected. Sorry for the confusion!