mrc-ide / leapfrog

Multistate population projection model for demographic estimation.
Other
2 stars 5 forks source link

discrepancy in allocation of new infections by age Spectrum and EPP-ASM / leapfrog #18

Open jeffeaton opened 2 years ago

jeffeaton commented 2 years ago

There is a small difference in the allocation of new infections by age between Spectrum and EPP-ASM / leapfrog.

From interrogating, I believe the discrepancy arises from some difference about how age groups are handled where the Beers interpolation of the age IRRs goes <0.

In EPP-ASM, the IRRs for age groups where the Beers interpolation goes <0 are simply set = 0: https://github.com/mrc-ide/eppasm/blob/206139f764ffd787a24fcb3580ec5fc80cc0e6d3/R/spectrum.R#L135-L137

Amat <- create_beers(17) 
fp$incrr_age <- apply(projp$incrr_age, 2:3, function(x)  Amat %*% x)[AGE_START + 1:pAG, , as.character(proj_start:proj_end)] 
fp$incrr_age[fp$incrr_age < 0] <- 0 

In Spectrum, these same age groups get 0 infections, but the number of infections allocated in surrounding age groups is quite different.

I have created a test example here: bwa_age-45to49-infections-in-1980_spectrum-v6.13_2022-08-05.PJNZ.zip

In this file:

The same number of infections in age 15-49 by sex, but the distribution over ages is different.


Code to reproduce example

Use file: bwa_age-45to49-infections-in-1980_spectrum-v6.13_2022-08-05.PJNZ.zip

library(eppasm)
library(dplyr)

pjnz <- "bwa_age-45to49-infections-in-1980_spectrum-v6.13_2022-08-05.PJNZ.zip"

fp <- eppasm::prepare_directincid(pjnz)
fp$tARTstart <- fp$SIM_YEARS

mod <- eppasm::simmod(fp)
mod <- eppasm::spec_add_dimnames(mod, fp)

specres <- eppasm::read_hivproj_output(pjnz)

infections <- specres$infections %>%
  as.data.frame.table(responseName = "spectrum") %>%
  type.convert(as.is = TRUE) %>%
  left_join(
    attr(mod, "infections") %>%
      as.data.frame.table(responseName = "eppasm") %>%
      type.convert(as.is = TRUE) 
  ) 

infections %>%
  filter(year == 1980,
         age %in% 24:50) %>%
  mutate(ratio = spectrum / eppasm)

Output

This difference is much more extreme than we see when there are a reasonable pattern of incidence rate ratios; but an extreme example is useful for debugging.

> infections %>%
+   filter(year == 1980,
+          age %in% 24:50) %>%
+   mutate(ratio = spectrum / eppasm)
   age    sex year   spectrum     eppasm     ratio
1   24   male 1980    0.00000    0.00000       NaN
2   25   male 1980  129.99210   74.74849 1.7390598
3   26   male 1980  272.51626  156.27598 1.7438141
4   27   male 1980  214.29991  119.66645 1.7908103
5   28   male 1980    0.00000    0.00000       NaN
6   29   male 1980    0.00000    0.00000       NaN
7   30   male 1980    0.00000    0.00000       NaN
8   31   male 1980    0.00000    0.00000       NaN
9   32   male 1980    0.00000    0.00000       NaN
10  33   male 1980  389.60890  251.45609 1.5494112
11  34   male 1980 1638.02303 1287.51496 1.2722361
12  35   male 1980 2595.00198 2575.22479 1.0076798
13  36   male 1980 3122.76515 3641.58336 0.8575295
14  37   male 1980 3286.30565 3989.53221 0.8237321
15  38   male 1980 3084.95122 3424.61004 0.9008182
16  39   male 1980 2518.35118 2360.08636 1.0670589
17  40   male 1980 1562.98778 1170.23255 1.3356215
18  41   male 1980  368.21554  223.48821 1.6475838
19  42   male 1980    0.00000    0.00000       NaN
20  43   male 1980    0.00000    0.00000       NaN
21  44   male 1980    0.00000    0.00000       NaN
22  45   male 1980    0.00000    0.00000       NaN
23  46   male 1980    0.00000    0.00000       NaN
24  47   male 1980   82.67258   47.32636 1.7468611
25  48   male 1980   92.38474   53.31206 1.7329051
26  49   male 1980   39.41035   22.42946 1.7570795
27  50   male 1980    0.00000    0.00000       NaN
28  24 female 1980    0.00000    0.00000       NaN
29  25 female 1980   87.05366   59.71760 1.4577556
30  26 female 1980  184.90939  126.01382 1.4673739
31  27 female 1980  146.58574   97.38402 1.5052341
32  28 female 1980    0.00000    0.00000       NaN
33  29 female 1980    0.00000    0.00000       NaN
34  30 female 1980    0.00000    0.00000       NaN
35  31 female 1980    0.00000    0.00000       NaN
36  32 female 1980    0.00000    0.00000       NaN
37  33 female 1980  288.04033  217.33532 1.3253268
38  34 female 1980 1307.30432 1118.22284 1.1690911
39  35 female 1980 2249.78667 2245.63219 1.0018500
40  36 female 1980 2877.93756 3191.17390 0.9018429
41  37 female 1980 3110.17114 3521.98399 0.8830736
42  38 female 1980 2879.29385 3051.99550 0.9434135
43  39 female 1980 2241.93076 2121.88798 1.0565736
44  40 female 1980 1298.94355 1060.66169 1.2246540
45  41 female 1980  287.24935  204.40368 1.4053042
46  42 female 1980    0.00000    0.00000       NaN
47  43 female 1980    0.00000    0.00000       NaN
48  44 female 1980    0.00000    0.00000       NaN
49  45 female 1980    0.00000    0.00000       NaN
50  46 female 1980    0.00000    0.00000       NaN
51  47 female 1980   68.22544   46.30775 1.4733049
52  48 female 1980   77.06732   52.52446 1.4672652
53  49 female 1980   32.98745   22.24160 1.4831417
54  50 female 1980    0.00000    0.00000       NaN
rlglaubius commented 2 years ago

Thanks for creating the clean test case, that helped isolate what was going on quickly! I verified that we are disaggregating IRRs to single age the same way. The differences between EPP-ASM and Spectrum come in later. First, your code that calculates infections_ts would need to be inside the timestep loop. This is because of the second difference: while calculating new infections by age, when Spectrum multiplies the age-IRRs by the HIV-negative population (i.e., when you calculate Xhivn_incagerr and infections_ts) the current-year population is used instead of the previous-year population.

I implemented and tested these changes in an eppasm fork (https://github.com/rlglaubius/eppasm/blob/NewHIVByAge/src/eppasm.cpp). I haven’t made a pull request since I’m not confident of any knock-on changes needed since this is done conditional on eppmod, and some quantities (Xhivn) could still be correctly calculated outside the timestep loop.

Here is what I see after implementing those changes:

> infections %>%
+   filter(year == 1980,
+          age %in% 24:50) %>%
+   mutate(ratio = spectrum / eppasm)
   age    sex year   spectrum     eppasm     ratio
1   24   male 1980    0.00000    0.00000       NaN
2   25   male 1980  129.99210  129.99213 0.9999998
3   26   male 1980  272.51626  272.51632 0.9999998
4   27   male 1980  214.29991  214.29996 0.9999998
5   28   male 1980    0.00000    0.00000       NaN
6   29   male 1980    0.00000    0.00000       NaN
7   30   male 1980    0.00000    0.00000       NaN
8   31   male 1980    0.00000    0.00000       NaN
9   32   male 1980    0.00000    0.00000       NaN
10  33   male 1980  389.60890  389.60894 0.9999999
11  34   male 1980 1638.02303 1638.02301 1.0000000
12  35   male 1980 2595.00198 2595.00191 1.0000000
13  36   male 1980 3122.76515 3122.76556 0.9999999
14  37   male 1980 3286.30565 3286.30575 1.0000000
15  38   male 1980 3084.95122 3084.95130 1.0000000
16  39   male 1980 2518.35118 2518.35131 0.9999999
17  40   male 1980 1562.98778 1562.98788 0.9999999
18  41   male 1980  368.21554  368.21561 0.9999998
19  42   male 1980    0.00000    0.00000       NaN
20  43   male 1980    0.00000    0.00000       NaN
21  44   male 1980    0.00000    0.00000       NaN
22  45   male 1980    0.00000    0.00000       NaN
23  46   male 1980    0.00000    0.00000       NaN
24  47   male 1980   82.67258   82.67258 0.9999999
25  48   male 1980   92.38474   92.38475 0.9999998
26  49   male 1980   39.41035   39.41036 0.9999997
27  50   male 1980    0.00000    0.00000       NaN
28  24 female 1980    0.00000    0.00000       NaN
29  25 female 1980   87.05366   87.05366 1.0000000
30  26 female 1980  184.90939  184.90942 0.9999998
31  27 female 1980  146.58574  146.58576 0.9999998
32  28 female 1980    0.00000    0.00000       NaN
33  29 female 1980    0.00000    0.00000       NaN
34  30 female 1980    0.00000    0.00000       NaN
35  31 female 1980    0.00000    0.00000       NaN
36  32 female 1980    0.00000    0.00000       NaN
37  33 female 1980  288.04033  288.04036 0.9999999
38  34 female 1980 1307.30432 1307.30451 0.9999999
39  35 female 1980 2249.78667 2249.78636 1.0000001
40  36 female 1980 2877.93756 2877.93743 1.0000000
41  37 female 1980 3110.17114 3110.17045 1.0000002
42  38 female 1980 2879.29385 2879.29424 0.9999999
43  39 female 1980 2241.93076 2241.93075 1.0000000
44  40 female 1980 1298.94355 1298.94381 0.9999998
45  41 female 1980  287.24935  287.24935 1.0000000
46  42 female 1980    0.00000    0.00000       NaN
47  43 female 1980    0.00000    0.00000       NaN
48  44 female 1980    0.00000    0.00000       NaN
49  45 female 1980    0.00000    0.00000       NaN
50  46 female 1980    0.00000    0.00000       NaN
51  47 female 1980   68.22544   68.22544 1.0000000
52  48 female 1980   77.06732   77.06732 1.0000000
53  49 female 1980   32.98745   32.98745 1.0000000
54  50 female 1980    0.00000    0.00000       NaN
jeffeaton commented 2 years ago

Thanks very much @rlglaubius. I have implemented this in the EPP-ASM code base: https://github.com/mrc-ide/eppasm/pull/32, and in the leapfrog code (#20).

In EPP-ASM, I did as you suggested and calculated Xhivn[g] and incrate_g[g] (incidence rate 15-49 by sex) outside of the per-time step loop.

It feels slightly incongruous to me that total infections and incidence by sex is calculated based on the population at t-1 and incidence by age is calculated at time t. But not strongly enough to suggest changing now.

Later we might want to loop back to consider if a different handling might give better correspondence with EPP. In EPP-ASM when infections are calculated based on r(t), we took some care to adjust the 15-49 population each time step to avoid jagged incidence pattern.