timriffe / DemoTools

Tools for the evaluation, adjustment, and standardization of demographic data
https://timriffe.github.io/DemoTools
Other
61 stars 31 forks source link

graduate_mono() returns negative value? #222

Closed sarahertog closed 3 years ago

sarahertog commented 3 years ago

Hi Tim, With the below example (Albania 1950 females, with 5-year age groups extended to 105+ with OPAG), graduate_mono() returns a negative value at age 99. But this should be impossible?

pop5 <- c(9.973927e+04, 7.532935e+04, 6.607973e+04, 5.719701e+04, 4.924090e+04, 4.262902e+04, 3.731060e+04, 3.372816e+04, 2.984707e+04, 2.582271e+04, 2.295682e+04, 2.033054e+04, 1.822981e+04, 1.540352e+04, 1.126717e+04, 7.524754e+03, 4.479483e+03, 3.653487e+03, 1.071177e+03, 1.641726e+02, 5.029706e+00, 3.877406e-01)

age5 <- seq(0,105,5)

pop1 <- graduate_mono(Value = pop5, Age = age5, AgeInt = rep(5,22), OAG = TRUE)

summary(pop1)

summary(pop1) Min. 1st Qu. Median Mean 3rd Qu. Max. -0.764 1277.839 4532.129 5868.020 8685.357 20728.971

peterdavjohnson commented 3 years ago

Hi Sara and Tim,

I was wondering about this. I think the problem is when there is a lot of fluctuation leading up to the end. Also I am not sure that polynomials can efficiently handle situations at the end of the age distribution where we assume it should be approaching zero as an asymptote. The last two points are basically 164 then 5, so the curve has to go down fast but then slow down.

I compare this to Beers and Sprague, and this definitely looks better:

[image: image.png]

For the moment should there be a post-graduation phase that looks for negatives and does something to remove them? Something like add the negative with the prior or following age or both if they are within the same 5-year group?

pj

On Tue, Mar 2, 2021 at 10:31 AM Sara Hertog notifications@github.com wrote:

Hi Tim, With the below example (Albania 1950 females, with 5-year age groups extended to 105+ with OPAG), graduate_mono() returns a negative value at age 99. But this should be impossible?

pop5 <- c(9.973927e+04, 7.532935e+04, 6.607973e+04, 5.719701e+04, 4.924090e+04, 4.262902e+04, 3.731060e+04, 3.372816e+04, 2.984707e+04, 2.582271e+04, 2.295682e+04, 2.033054e+04, 1.822981e+04, 1.540352e+04, 1.126717e+04, 7.524754e+03, 4.479483e+03, 3.653487e+03, 1.071177e+03, 1.641726e+02, 5.029706e+00, 3.877406e-01)

age5 <- seq(0,105,5)

pop1 <- graduate_mono(Value = pop5, Age = age5, AgeInt = rep(5,22), OAG = TRUE)

summary(pop1)

summary(pop1) Min. 1st Qu. Median Mean 3rd Qu. Max. -0.764 1277.839 4532.129 5868.020 8685.357 20728.971

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/timriffe/DemoTools/issues/222, or unsubscribe https://github.com/notifications/unsubscribe-auth/APAK5QTPGZE3WEWA6ZR3HTDTBUAD5ANCNFSM4YPGGLMQ .

timriffe commented 3 years ago

Wow, this one surprised me. I see that if we switch from "monoH.FC" to "hyman" then it stays positive:

graduate_mono2 <- function(
    Value,
    Age,
    AgeInt,
    OAG = TRUE) {

    if (missing(Age) & missing(AgeInt)) {
      Age                 <- names2age(Value)
    }
    if (missing(AgeInt)) {
      # give 1 to final interval to preserve
      AgeInt              <- age2int(Age, OAG = OAG, OAvalue = 1)
    }
    if (missing(Age)) {
      Age                 <- int2age(AgeInt)
    }

    # if age is single return as-is
    if (is_single(Age)) {
      names(Value) <- Age
      return(Value)
    }

    # if last age Open, we preserve it
    if (OAG) {
      N                   <- length(Value)
      OAvalue             <- Value[N]
      Value               <- Value[-N]
      Age                 <- Age[-N]
      AgeInt              <- AgeInt[-N]
    }
    # if the final age is Open, then we should remove it and then
    # stick it back on

    AgePred               <- c(min(Age), cumsum(AgeInt))
    y                     <- c(0, cumsum(Value))
    AgeS                  <- min(Age):sum(AgeInt)
    y1                    <- splinefun(y ~ AgePred, method = "hyman")(AgeS)
    out                   <- diff(y1)
    names(out)            <- AgeS[-length(AgeS)]

    # The open age group is maintained as-is.
    if (OAG) {
      out                 <- c(out, OAvalue)
    }
    age1 <- min(Age):(min(Age) + length(out) - 1)
    names(out) <- age1
    out
  }

  pop5 <- c(9.973927e+04, 7.532935e+04, 6.607973e+04, 5.719701e+04, 4.924090e+04, 4.262902e+04, 3.731060e+04, 3.372816e+04, 2.984707e+04,
            2.582271e+04, 2.295682e+04, 2.033054e+04, 1.822981e+04, 1.540352e+04, 1.126717e+04, 7.524754e+03, 4.479483e+03, 3.653487e+03,
            1.071177e+03, 1.641726e+02, 5.029706e+00, 3.877406e-01)

 age5 <- seq(0,105,5)

 pop1 <- graduate_mono2(Value = pop5, Age = age5, AgeInt = rep(5,22), OAG = TRUE)
range(pop1)
plot(age5,pop5/5,type='s')
 lines(0:105,pop1)

This R query hits the topic: https://stat.ethz.ch/pipermail/r-help/2010-February/228387.html

Maybe it's just as simple as switching to "hyman"?

peterdavjohnson commented 3 years ago

Tim made the change to the method="hyman" (version 01.13.32). Attached is a comparison at the older and younger ages that seems to indicate that this is much better.

pj

On Wed, Mar 3, 2021 at 9:09 AM Tim Riffe notifications@github.com wrote:

Closed #222 https://github.com/timriffe/DemoTools/issues/222 via 8efbc78 https://github.com/timriffe/DemoTools/commit/8efbc78c3131b3834f7bc305e5635e7e2d7f6ffb .

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/timriffe/DemoTools/issues/222#event-4402221487, or unsubscribe https://github.com/notifications/unsubscribe-auth/APAK5QXYISD5DC5PJGHQRRDTBY7KLANCNFSM4YPGGLMQ .

sarahertog commented 3 years ago

Thanks!!