kaskr / adcomp

AD computation with Template Model Builder (TMB)

Suggested option to use oneStepPredict(..., method="cdf" ) with delta-models #322

Closed. James-Thorson-NOAA closed this 4 years ago

James-Thorson-NOAA commented 4 years ago

Kasper and all,

As we briefly discussed by email, this pull request is my effort to provide a diff-file for a few changes that seem to extend oneStepPredict(.) with method="cdf" to a delta-model, or to other continuous distributions with a probability mass at a user-supplied set of locations (e.g., a zero-and-one inflated proportion for stomach content samples). In this case, the user supplies deltaSupport (which is NULL by default), and in the limit where deltaSupport = {all supported integers} it should perform identically to discrete=TRUE.

I very much do not understand the statistical theory underlying oneStepPredict(.), so please review this PR with caution! However, in following the coding logic of method="cdf", there does not appear to be any fundamental distinction between how the discrete=TRUE and discrete=FALSE options are handled on the R side. So it seems easy enough to simply provide the CDF appropriately in TMB, and then evaluate as if it's discrete=TRUE at the user-supplied probability-mass locations and discrete=FALSE at other locations where a continuous distribution applies. This only requires that the user correctly code a CDF for the distribution on the TMB side, which the method requires anyway.

I have done some limited testing of this modification for a delta-model without random effects, and in this case it appeared to behave as expected, i.e., give a uniform distribution for residuals for those observations of response = 0 under the correctly specified model. However, I again emphasize that I cannot vouch for the statistical basis for the suggested modification; it's just based on my reading of its implementation.
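The randomization at a point mass can be illustrated outside TMB. The following is a minimal base-R sketch under stated assumptions, not the oneStepPredict implementation: it simulates a delta-lognormal response with a point mass at zero, evaluates the model CDF by hand, and draws a uniform value between the left limit F(y-) and F(y) at the atom. The names p0, mu, sigma and delta_cdf are made up for this illustration.

```r
set.seed(1)

## Hypothetical delta-lognormal model: P(y = 0) = p0, otherwise lognormal
n     <- 2000
p0    <- 0.3
mu    <- 1
sigma <- 0.5
is_zero <- runif(n) < p0
y <- ifelse(is_zero, 0, rlnorm(n, meanlog = mu, sdlog = sigma))

## CDF of the mixture: F(y) = p0 + (1 - p0) * plnorm(y) for y >= 0
delta_cdf <- function(y) p0 + (1 - p0) * plnorm(y, meanlog = mu, sdlog = sigma)

## Left limit F(y-): 0 at the atom y = 0, equal to F(y) elsewhere
F_upper <- delta_cdf(y)
F_lower <- ifelse(y == 0, 0, F_upper)

## Randomize within [F(y-), F(y)]; this only has an effect at the atom
u <- F_lower + runif(n) * (F_upper - F_lower)

## Map to the normal scale, as quantile residuals usually are
resid <- qnorm(u)
```

Under the correctly specified model, hist(u) should look roughly flat and qqnorm(resid) close to a straight line. Without the randomization, every zero observation would pile up at u = p0.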

Thanks for your time in reviewing the suggestion.

kaskr commented 4 years ago

Some preliminary comments:

The proposed PR adds an option to apply the missing randomization step in the continuous case when atoms are present. I agree that this is useful and the implementation appears to be correct.

However, it's important to keep the oneStepPredict interface as simple as possible, and in particular to avoid adding options that target special cases. Delta distributions are important, but what about cases where the deltaSupport varies among observations?

There are already many options and I think the existing ones can be tweaked to provide the same effect as the PR. E.g. for a delta distribution one can do two passes:

## First pass. Residuals valid for 'delta support' only
res1 <- oneStepPredict(obj, method="cdf", discrete=TRUE)
## Second pass. Residuals valid for the rest
res2 <- oneStepPredict(obj, method="cdf", discrete=FALSE)
## Combine (obs = observation vector, deltaSupport = point-mass locations)
resid <- ifelse(obs %in% deltaSupport, res1$residual, res2$residual)
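
The combine step can also be written with a per-observation indicator, which would cover the case where the delta support differs between observations. This is only a sketch: combine_passes, atoms and is_atom are hypothetical names, and the two data frames are mock stand-ins for the outputs of the two passes, of which only a residual column is assumed.

```r
## Hypothetical helper: pick the discrete-pass residual where the observation
## sits on a point mass, and the continuous-pass residual everywhere else.
combine_passes <- function(res_discrete, res_continuous, is_atom) {
  stopifnot(length(is_atom) == nrow(res_discrete),
            nrow(res_discrete) == nrow(res_continuous))
  ifelse(is_atom, res_discrete$residual, res_continuous$residual)
}

## Mock inputs (made-up numbers, not real oneStepPredict output)
obs  <- c(0, 2.5, 1, 0.4, 0)
res1 <- data.frame(residual = c(-0.8, 0.1, 0.3, 0.2, 1.1))
res2 <- data.frame(residual = c(0.5, 0.6, 0.7, -0.2, 0.9))

## Atoms may differ by observation: here the third observation also has
## a point mass at 1 (e.g. a zero-and-one inflated proportion)
atoms   <- list(0, 0, c(0, 1), 0, 0)
is_atom <- mapply(function(y, a) y %in% a, obs, atoms)

resid <- combine_passes(res1, res2, is_atom)
```

With a single shared support, is_atom reduces to obs %in% deltaSupport and the helper matches the one-line ifelse above.
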
James-Thorson-NOAA commented 4 years ago

Good points Kasper, and thanks for your time in discussing!

What if I submit a separate PR that edits the oneStepPredict doxygen doc, adding text to the Description that explains this approach for delta models, and your example code to the Example section of that doc?

This would help me and others remember how to do this for delta models.


-- James Thorson

Program leader, Habitat and Ecological Processes Research (HEPR) Alaska Fisheries Science Center, NMFS Affiliate Faculty, University of Washington and Oregon State University

The contents of this message are mine personally and do not necessarily reflect any position of NOAA.

kaskr commented 4 years ago

Sounds great.

FWIW Roxygen is located around here:

https://github.com/kaskr/adcomp/blob/master/TMB/R/validation.R#L95

Alternatively it could fit in the book around here:

https://github.com/kaskr/adcomp/blob/master/dox/06-Validation.Rmd#L104