Closed jeffeaton closed 2 years ago
Feedback from @rlglaubius on DemProj discrepancies:
When Spectrum calculates births, it divides the PASFR(t,a) by the sum over all ages. I think Leapfrog does not include the normalization term. This is an issue because (e.g.) PASFRs in the example Botswana file sum to 100.0003 in 1971, so there are slightly more births in Leapfrog than Spectrum that year.
Next, I removed the PASFR normalization term from Spectrum. The graph is still spikey, but on a much more satisfying scale:
I believe the remaining difference is because births are truncated to five decimal places in the .DP file, whereas R has access to Leapfrog outputs without truncation.
Disaggregating ages 0-4: Below, I’ve copied the code Spectrum uses to disaggregate net migration and base-year population from five-year age groups to single ages. Inputs p1, p2, …, p5 correspond to net migration counts for ages 0-4, 5-9, …, 20-24; a0, a1, …, a4 are the resulting single-age values. If we were using Beers’ method, the code would be:
procedure SplitFirstGroupOld(p1, p2, p3, p4, p5 : double; var a0, a1, a2, a3, a4 : double);
begin
a0 := 0.3333*p1 - 0.1636*p2 - 0.0210*p3 + 0.0796*p4 - 0.0283*p5;
a1 := 0.2595*p1 - 0.0780*p2 + 0.0130*p3 + 0.0100*p4 - 0.0045*p5;
a2 := 0.1924*p1 + 0.0064*p2 + 0.0184*p3 - 0.0256*p4 + 0.0084*p5;
a3 := 0.1329*p1 + 0.0844*p2 + 0.0054*p3 - 0.0356*p4 + 0.0129*p5;
a4 := 0.0819*p1 + 0.1508*p2 - 0.0158*p3 - 0.0284*p4 + 0.0115*p5;
end;
Instead, we are using base-year Sx values for ages 0…4, as returned by DP.GetSurvRate (DP.GetSurvRate(t,s,0) is survival after birth, so DP.GetSurvRate(t,s,a) for a>0 corresponds year-to-year survival for people aged a-1 at start-of-year):
procedure SplitFirstGroup(GB : TGBVariables; DP:TDPData;p1, p2, p3, p4, p5 : double; sex : byte; var a0, a1, a2, a3, a4 : double);
Var
aSum : double;
begin
if GB.GetInCalcStateMode(DP.proj) then // RG: Jeff, you can ignore this block, since it’s for projections starting after 1985
begin
a0 := DP.GetSurvRate1970(sex,1) * 2 * 1000;
a1 := DP.GetSurvRate1970(sex,2) * a0;
a2 := DP.GetSurvRate1970(sex,3) * a1;
a3 := DP.GetSurvRate1970(sex,4) * a2;
a4 := DP.GetSurvRate1970(sex,5) * a3;
end
else // DP_FIRST_INDEX is the first year of the projection
begin
a0 := DP.GetSurvRate(DP_FIRST_INDEX,sex,1) * 2 * 1000;
a1 := DP.GetSurvRate(DP_FIRST_INDEX,sex,2) * a0;
a2 := DP.GetSurvRate(DP_FIRST_INDEX,sex,3) * a1;
a3 := DP.GetSurvRate(DP_FIRST_INDEX,sex,4) * a2;
a4 := DP.GetSurvRate(DP_FIRST_INDEX,sex,5) * a3;
end;
aSum := a0 + a1 + a2 + a3 + a4;
if aSum > 0 then
begin
a0 := a0 / aSum * p1;
a1 := a1 / aSum * p1;
a2 := a2 / aSum * p1;
a3 := a3 / aSum * p1;
a4 := a4 / aSum * p1;
end
else
begin
a0 := 0.3333*p1 - 0.1636*p2 - 0.0210*p3 + 0.0796*p4 - 0.0283*p5;
a1 := 0.2595*p1 - 0.0780*p2 + 0.0130*p3 + 0.0100*p4 - 0.0045*p5;
a2 := 0.1924*p1 + 0.0064*p2 + 0.0184*p3 - 0.0256*p4 + 0.0084*p5;
a3 := 0.1329*p1 + 0.0844*p2 + 0.0054*p3 - 0.0356*p4 + 0.0129*p5;
a4 := 0.0819*p1 + 0.1508*p2 - 0.0158*p3 - 0.0284*p4 + 0.0115*p5;
end;
end;
Now, this only comes up because Spectrum (1) reads 1x1 net migration from a UPD file, (2) aggregates migrants to five-year age groups, (3) factors migrants into overall net migrant numbers and age patterns for the DemProj input editor, then (4) disaggregates those inputs back to single ages during projection. Maybe now is the time to consider just using the WPP 1x1 estimates directly instead of going through this lossy aggregation/disaggregation process? Changing input editors and maintaining backwards compatibility with existing files seem like the two biggest hurdles.
Births discrepancy is resolved in 0ac9ea9 by normalizing ASFR input as described by @rlglaubius in previous comment. Test on DemProj births and population comparison made more stringent.
Under 5 net migration discrepancy is resolved in f510ebd and included in test.
In the DemProj only projection with net migration, the discrepancy in 80+ is slightly larger than other ages (up to about 0.5):
devtools::load_all()
pjnz1 <- "tests/testdata/spectrum/v6.13/bwa_demproj-only_spectrum-v6.13_2022-02-12.PJNZ"
demp1 <- prepare_leapfrog_demp(pjnz1)
hivp1 <- prepare_leapfrog_projp(pjnz1)
specres1 <- eppasm::read_hivproj_output(pjnz1)
lsim1 <- leapfrogR(demp1, hivp1)
diff80plus <- lsim1$totpop1[81,,] - specres1$totpop[81,,]
png("~/Downloads/diff80pl.png", h = 4.5, w = 6, units = "in", res = 180)
matplot(1970:2030, t(diff80plus), type = "l", lty = 1,
main = "Difference in 80+ population: leapfrog - spectrum")
abline(h = 0, lty = 2)
legend("topleft", c("male", "female"), col = 1:2, lty = 1)
dev.off()
This might come from subtle difference in how the adjustment to the age group where net migrations is calculated or how migrations and survivorship are done in the open age group.
I am inclined to call this 'good enough' though and close this. What do you think @rlglaubius?
The difference apparently has to do with the survival ratios applied to net migrants in the open-ended age group. Leapfrog applies the 80+ survival ratio, whereas Spectrum uses the average of 79 and 80+ survival ratios weighted by population sizes. To test, I made the following changes to leapfrog-raw.h:
--- a/inst/include/leapfrog-raw.h
+++ b/inst/include/leapfrog-raw.h
@@ -97,10 +97,15 @@ template <typename Type, int NG, int pAG, int pIDX_FERT, int pAG_FERT>
totpop1(pAG-1, g, t) += totpop1(pAG-1, g, t-1) - natdeaths_open_age;
// net migration
- for(int a = 1; a < pAG; a++) {
+ for(int a = 1; a < pAG - 1; a++) {
migrate_ag(a, g) = netmigr(a, g, t) * (1.0 + sx(a, g, t)) * 0.5 / totpop1(a, g, t);
totpop1(a, g, t) *= 1.0 + migrate_ag(a, g);
}
+
+ int a = pAG - 1;
+ Type numer = totpop1(a, g, t-1) * (1.0 + sx(a+1, g, t)) + totpop1(a-1, g, t-1) * (1.0 + sx(a, g, t));
+ Type denom = totpop1(a, g, t-1) + totpop1(a-1, g, t-1);
+ totpop1(a, g, t) += netmigr(a, g, t) * 0.5 * numer / denom;
}
// fertility
After making this change, the 80+ population difference is much smaller:
My code changes are more verbose than necessary, and could probably be optimized by using some products of the preceding calculations.
Hi @rlglaubius -- thanks, that's awesome.
And yes, you're right that it can be expressed in terms of the previous calculations for totpop1
and natdeaths
in the open age group:
The calculation could be unified without needing to handle the open age group as a special case if we used that same expression here: https://github.com/mrc-ide/leapfrog/blob/0c985624f87ee9f3691b2354398b6e8f85d621a2/inst/include/leapfrog-raw.h#L193
But I've just left it with the special case as maybe that is useful for clarity.
Closing this issue, and moving on to AIM components!
There are a couple of discrepancies with demographic projection only simulation:
The demographic projection in leapfrog simulation matches EPP-ASM up to age 79. I was surprised that there was a difference between EPP-ASM and DemProj because I recall checking this quite carefully; but nonetheless.
_Note, leapfrog does not match EPP-ASM for age 80+ because EPP-ASM did not implement the survivorship for the age 79 / age 80+ groups correctly. This is fixed in leapfrog._
Difference in births per year
Here's some code to reproduce the difference in births per year calculated from Leapfrog and Spectrum.
Difference in net migration age 0-5
In PJNZ file with net migration, but no AIM simulation, there is a difference in the population in age 0-5 years between the Spectrum simulation and Leapfrog simulation.
John and Rob mentioned that Spectrum disaggregates the age 0-4 net migration slightly differently.
Difference between Leapfrog and Spectrum simulation in year 2: