robjhyndman / demography

demography package for R
https://pkg.robjhyndman.com/demography
73 stars 25 forks source link

LCA in-sample values are inconsistent with restype - if chooseperiod equals true #56

Open HoxD opened 9 months ago

HoxD commented 9 months ago

Hi Prof. Hyndman

I want to raise a concern regarding the fitted values (and subsequent residuals) calculated by demography::lca. If chooseperiod is set to TRUE in the original function call then the fitted values of the output will always be in terms of log death rates - regardless of the restype (response type) parameter specified.

A snippet from the relevant R code below illustrates the issue. If chooseperiod is set to TRUE then the function uses the specified break method to identify the appropriate fitting period - it returns a LCA model of the corresponding period but with chooseperiod set to FALSE. Under both break methods, the response type parameter is not inherited from the original function call and will therefore assume its default value - log death rates. This creates an inconsistency in the nature of the fitted values (and residuals) returned versus those specified in the model. This is particularly problematic if users of this package want to compare in-sample values between various models - not in terms of log death rates.

   if(chooseperiod)
    {
        if(breakmethod=="bai")
        {
            x <- 1:m
            # Find breakpoints
            bp <- strucchange::breakpoints(kt ~ x)$breakpoints
            # Omit breakpoints less than minperiod from end
            bp <- bp[bp <= (m-minperiod)]
            bestbreak <- max(bp)
            return(lca(data,series,year[(bestbreak+1):m],ages=ages,max.age=max.age,
                adjust=adjust,interpolate=interpolate,chooseperiod=FALSE,scale=scale))
        }
        else
        {
            RS <- devlin <- devadd <- numeric(m-2)
            for(i in 1:(m-2))
            {
                tmp <- lca(data,series,year[i:m],ages=ages,max.age=max.age,adjust=adjust,chooseperiod=FALSE,interpolate=interpolate,scale=scale)
                devlin[i] <- tmp$mdev[2]
                devadd[i] <- tmp$mdev[1]
                RS[i] <- (tmp$mdev[2]/tmp$mdev[1])
            }
            bestbreak <- order(RS[1:(m-minperiod)])[1]-1
            out <- lca(data,series,year[(bestbreak+1):m],ages=ages,max.age=max.age,
                adjust=adjust,chooseperiod=FALSE,interpolate=interpolate,scale=scale)
            out$mdevs <- ts(cbind(devlin,devadd,RS),start=startyear,deltat=deltat)
            dimnames(out$mdevs)[[2]] <- c("Mean deviance total","Mean deviance base","Mean deviance ratio")
            return(out)
        }
    }

Fortunately, this issue only affects in-sample values. All of the forecasting and comparison functions within the package will be based on out-sample data and ignore the in-sample residuals. Furthermore, the solution would appear to be as simple as adding the response type parameter to the models returned - so that it is inherited from the original function call.

This is the first instance of this issue that I encountered and hopefully the last. However, I would recommend revising the parameter specification of all recursive functions so that they are consistent with the original function call.

I hope I explained my concern well enough. Please let me know if I can provide any additional information to clarify my concern.

Regards

robjhyndman commented 9 months ago

Are you able to provide a PR to fix the problem?