Open jeffeaton opened 2 years ago
Thanks for creating the clean test case, that helped isolate what was going on quickly! I verified that we are disaggregating IRRs to single age the same way. The differences between EPP-ASM and Spectrum come in later. First, your code that calculates infections_ts would need to be inside the timestep loop. This is because of the second difference: while calculating new infections by age, when Spectrum multiplies the age-IRRs by the HIV-negative population (i.e., when you calculate Xhivn_incagerr and infections_ts) the current-year population is used instead of the previous-year population.
I implemented and tested these changes in an eppasm fork (https://github.com/rlglaubius/eppasm/blob/NewHIVByAge/src/eppasm.cpp). I haven’t made a pull request since I’m not confident of any knock-on changes needed since this is done conditional on eppmod, and some quantities (Xhivn) could still be correctly calculated outside the timestep loop.
Here is what I see after implementing those changes:
> infections %>%
+ filter(year == 1980,
+ age %in% 24:50) %>%
+ mutate(ratio = spectrum / eppasm)
age sex year spectrum eppasm ratio
1 24 male 1980 0.00000 0.00000 NaN
2 25 male 1980 129.99210 129.99213 0.9999998
3 26 male 1980 272.51626 272.51632 0.9999998
4 27 male 1980 214.29991 214.29996 0.9999998
5 28 male 1980 0.00000 0.00000 NaN
6 29 male 1980 0.00000 0.00000 NaN
7 30 male 1980 0.00000 0.00000 NaN
8 31 male 1980 0.00000 0.00000 NaN
9 32 male 1980 0.00000 0.00000 NaN
10 33 male 1980 389.60890 389.60894 0.9999999
11 34 male 1980 1638.02303 1638.02301 1.0000000
12 35 male 1980 2595.00198 2595.00191 1.0000000
13 36 male 1980 3122.76515 3122.76556 0.9999999
14 37 male 1980 3286.30565 3286.30575 1.0000000
15 38 male 1980 3084.95122 3084.95130 1.0000000
16 39 male 1980 2518.35118 2518.35131 0.9999999
17 40 male 1980 1562.98778 1562.98788 0.9999999
18 41 male 1980 368.21554 368.21561 0.9999998
19 42 male 1980 0.00000 0.00000 NaN
20 43 male 1980 0.00000 0.00000 NaN
21 44 male 1980 0.00000 0.00000 NaN
22 45 male 1980 0.00000 0.00000 NaN
23 46 male 1980 0.00000 0.00000 NaN
24 47 male 1980 82.67258 82.67258 0.9999999
25 48 male 1980 92.38474 92.38475 0.9999998
26 49 male 1980 39.41035 39.41036 0.9999997
27 50 male 1980 0.00000 0.00000 NaN
28 24 female 1980 0.00000 0.00000 NaN
29 25 female 1980 87.05366 87.05366 1.0000000
30 26 female 1980 184.90939 184.90942 0.9999998
31 27 female 1980 146.58574 146.58576 0.9999998
32 28 female 1980 0.00000 0.00000 NaN
33 29 female 1980 0.00000 0.00000 NaN
34 30 female 1980 0.00000 0.00000 NaN
35 31 female 1980 0.00000 0.00000 NaN
36 32 female 1980 0.00000 0.00000 NaN
37 33 female 1980 288.04033 288.04036 0.9999999
38 34 female 1980 1307.30432 1307.30451 0.9999999
39 35 female 1980 2249.78667 2249.78636 1.0000001
40 36 female 1980 2877.93756 2877.93743 1.0000000
41 37 female 1980 3110.17114 3110.17045 1.0000002
42 38 female 1980 2879.29385 2879.29424 0.9999999
43 39 female 1980 2241.93076 2241.93075 1.0000000
44 40 female 1980 1298.94355 1298.94381 0.9999998
45 41 female 1980 287.24935 287.24935 1.0000000
46 42 female 1980 0.00000 0.00000 NaN
47 43 female 1980 0.00000 0.00000 NaN
48 44 female 1980 0.00000 0.00000 NaN
49 45 female 1980 0.00000 0.00000 NaN
50 46 female 1980 0.00000 0.00000 NaN
51 47 female 1980 68.22544 68.22544 1.0000000
52 48 female 1980 77.06732 77.06732 1.0000000
53 49 female 1980 32.98745 32.98745 1.0000000
54 50 female 1980 0.00000 0.00000 NaN
Thanks very much @rlglaubius. I have implemented this in the EPP-ASM code base: https://github.com/mrc-ide/eppasm/pull/32, and in the leapfrog code (#20).
In EPP-ASM, I did as you suggested and calculated Xhivn[g]
and incrate_g[g]
(incidence rate 15-49 by sex) outside of the per-time step loop.
It feels slightly incongruous to me that total infections and incidence by sex is calculated based on the population at t-1
and incidence by age is calculated at time t
. But not strongly enough to suggest changing now.
Later we might want to loop back to consider if a different handling might give better correspondence with EPP. In EPP-ASM when infections are calculated based on r(t), we took some care to adjust the 15-49 population each time step to avoid jagged incidence pattern.
There is a small difference in the allocation of new infections by age between Spectrum and EPP-ASM / leapfrog.
From interrogating, I believe the discrepancy arises from some difference about how age groups are handled where the Beers interpolation of the age IRRs goes <0.
In EPP-ASM, the IRRs for age groups where the Beers interpolation goes <0 are simply set = 0: https://github.com/mrc-ide/eppasm/blob/206139f764ffd787a24fcb3580ec5fc80cc0e6d3/R/spectrum.R#L135-L137
In Spectrum, these same age groups get 0 infections, but the number of infections allocated in surrounding age groups is quite different.
I have created a test example here: bwa_age-45to49-infections-in-1980_spectrum-v6.13_2022-08-05.PJNZ.zip
In this file:
The same number of infections in age 15-49 by sex, but the distribution over ages is different.
Code to reproduce example
Use file: bwa_age-45to49-infections-in-1980_spectrum-v6.13_2022-08-05.PJNZ.zip
Output
This difference is much more extreme than we see when there are a reasonable pattern of incidence rate ratios; but an extreme example is useful for debugging.