nmfs-ost / ss3-source-code

The source code for Stock Synthesis (SS3).
https://nmfs-ost.github.io/ss3-website/
Creative Commons Zero v1.0 Universal

Add log(Pval) term to lognormal prior #558

Closed quang-huynh closed 6 months ago

quang-huynh commented 6 months ago

Concisely describe what has been changed/addressed in the pull request.

Add log(pval) to negative log prior for lognormal distribution

What tests have been done?

Where are the relevant files?

What tests/review still need to be done?

Is there an input change for users to Stock Synthesis?

No input change.

Additional information (optional).

The current negative log prior is of the form 0.5 * square(log(x) - mu). Since x is estimated in natural (untransformed) space, an additional log(x) term from the lognormal density is needed.

Some R code to verify:

x <- seq(0.01, 1, 0.005)
logmu <- 0
sd <- 1

# True mode of lognormal distribution
f_x <- dlnorm(x, logmu, sd)
x[which.max(f_x)]

# Discrepancy in the mode of the distribution with SS3 lognormal prior
g_x <- -0.5 * (log(x) - logmu)^2/sd/sd
x[which.max(g_x)]

# Mode from proposed correction matches the answer
h_x <- -0.5 * (log(x) - logmu)^2/sd/sd - log(x)
x[which.max(h_x)]
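The same correction can be checked term-by-term against base R's dlnorm (a sketch with arbitrary test points): the current SS3 expression differs from the full negative log density by exactly log(x) plus a constant.

```r
x <- c(0.2, 0.5, 1.5)  # arbitrary test points
logmu <- 0
sd <- 1

nll_true <- -dlnorm(x, logmu, sd, log = TRUE)  # full negative log density
nll_ss3 <- 0.5 * (log(x) - logmu)^2 / sd^2     # current SS3 prior term

# Difference is log(x) plus the constant log(sd * sqrt(2 * pi))
all.equal(nll_true - nll_ss3, log(x) + log(sd * sqrt(2 * pi)))
```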
Cole-Monnahan-NOAA commented 6 months ago

My initial impression is to agree with this request from a statistical standpoint. When used as a prior, the pdf is a function of the state variable, which in this case is some parameter, say 'p'. The mu and sd terms are fixed (input by the user). The log(p) component thus varies as p does during optimization (or MCMC), so the prior density needs this term to be accurate.

@quang-huynh's code shows this, but another way to demonstrate it is to integrate the model with MCMC. With no other sources of information, the posterior will just be the prior. So I modified the example to get posteriors and compare them to the analytical prior. See the code below, which produces the following plot.

I think this could be wrapped into issue https://github.com/nmfs-ost/ss3-source-code/issues/553 which will overhaul the objective function calculations. This is a good example of why using dlnorm would be preferable. Probably the shortest path forward is to provide a new prior option that uses dlnorm and leave the old ones in with a note in the documentation.

Curious to hear if there are other considerations.

[Figure: posterior histograms for the SS3, dlnorm, and corrected priors in natural and log space, overlaid with the analytical lognormal density]

library(adnuts)

tpl <- '
DATA_SECTION
  init_number logmu;
  init_number sd;

PARAMETER_SECTION
  init_bounded_number p1(0,500);
  init_bounded_number p2(0,500);
  init_bounded_number p3(0,500);
  objective_function_value jnll;

PROCEDURE_SECTION
  jnll=  0.5*square(log(p1)-logmu)/square(sd);
  jnll += dlnorm(p2,logmu,sd);
  jnll += 0.5*square(log(p3)-logmu)/square(sd)+log(p3);
'

logmu <- 1.1
logsd <- 1.12
writeLines(as.character(c(logmu, logsd)), con='lnpriors.dat')
## prior mean and SD in log space
writeLines(tpl, con='lnpriors.tpl')
system("admb lnpriors.tpl")

## sample lots of draws for clarity
post <- sample_nuts('lnpriors', cores=1,
                   init=function() c(1,1,1)) |>
  as.data.frame()

par(mfcol=c(2,3))
labs <- c('SS3', 'dlnorm', 'SS3 corrected')
logp <- seq(qnorm(.001, logmu, logsd), qnorm(.999, logmu,logsd), len=1000)
p <- exp(logp)
for(i in 1:3){
  hist(post[,i], freq=FALSE, breaks=30,
       main=paste(labs[i], 'natural space'), xlab='p')
  lines(p, dlnorm(p,logmu,logsd), lwd=2)
  hist(log(post[,i]), freq=FALSE, breaks=30,
       main=paste(labs[i], 'log space'), xlab='log p')
  lines(log(p), dnorm(log(p),logmu,logsd), lwd=2)
}
Rick-Methot-NOAA commented 6 months ago

Closing this PR and redirecting discussion to #553.

quang-huynh commented 6 months ago

The explanation from @Cole-Monnahan-NOAA is much better than my terse code!

On the math side, if we take the log of the lognormal probability density function and drop constants, then 0.5*square(log(x)-logmu)/square(sd) is fine as a likelihood (x is a data point and logmu is a model prediction), but the log(x) term can't be dropped if the pdf is used as a prior (x is estimated, logmu is fixed).
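Writing out the lognormal density makes this explicit:

$$
f(x) = \frac{1}{x \sigma \sqrt{2\pi}} \exp\!\left(-\frac{(\log x - \mu)^2}{2\sigma^2}\right)
$$

$$
-\log f(x) = \frac{(\log x - \mu)^2}{2\sigma^2} + \log x + \log\!\left(\sigma\sqrt{2\pi}\right)
$$

The last term is a constant and can always be dropped; log x is constant only when x is a data point.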

Rick-Methot-NOAA commented 6 months ago

I understand, but SS3 is rarely used in a Bayesian mode, and the value of 0.5*square(log(parm)-logmu)/square(sd) is simply a component of the objective function. So, does the correct approach depend upon whether the model is operating in Bayesian mode vs. maximum posterior density (MPD) mode? SS3 was designed as an MPD model and has never been thoroughly examined for Bayesian use, as Cole (and Jim Thorson) have proposed for #553.

quang-huynh commented 6 months ago

No, I don't think the approach changes with MPD vs. MCMC.

The current SS3 equation generates a larger prior mode and SD than what the user specifies (both my figure and Cole's show this). This has been my experience with a couple of assessments, where the posterior mean and MPD for natural mortality were much higher than the prior mean for what I thought was a relatively uninformative data set.
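As a rough sketch of the size of this effect (purely hypothetical prior values for M, not from any particular assessment): the mode implied by the current SS3 term is exp(logmu), while the intended lognormal mode is exp(logmu - sd^2), so the current form centers the prior higher than the user specified.

```r
logmu <- log(0.2)  # hypothetical log-space prior mean for natural mortality M
sd <- 0.4          # hypothetical log-space prior SD

mode_ss3 <- exp(logmu)          # mode implied by the current SS3 term
mode_true <- exp(logmu - sd^2)  # mode of the intended lognormal prior

# Percent by which the current form shifts the prior mode upward
round(100 * (mode_ss3 / mode_true - 1))  # about 17% with these values
```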

[Figure: lognormal log-density (points) with the current SS3 prior (red) and corrected prior (green), each shifted so its maximum is zero]

x <- seq(0.01, 3, 0.01)
logmu <- 0
sd <- 1

# True mode of lognormal distribution
f_x <- dlnorm(x, logmu, sd, log = TRUE)
x[which.max(f_x)]

# Discrepancy in the mode of the distribution with SS3 lognormal prior
g_x <- -0.5 * (log(x) - logmu)^2/sd/sd
x[which.max(g_x)]

# Mode from proposed correction matches the answer
h_x <- -0.5 * (log(x) - logmu)^2/sd/sd - log(x)
x[which.max(h_x)]

plot(x, f_x - max(f_x), ylab = "Log prior density (relative to maximum)", xlab = "Parameter value", ylim = c(-2, 0))
lines(x, g_x - max(g_x), col = 2)
lines(x, h_x - max(h_x), col = 3)
legend('bottomright', c("Lognormal distribution", "Current SS3", "SS3 corrected"), col = 1:3, pch = c(1, NA, NA), lty = 1)
Cole-Monnahan-NOAA commented 6 months ago

I agree that it does not depend on how the objective function will be used. The fundamental problem is that the current form puts a normal prior on the parameter in log space, while the manual suggests it is a lognormal prior. So the user will get a different answer than they expect, even if they are only optimizing. You can see this from my example: for those independent parameters, the MLE is just the prior mode, which you can visualize, and the current form (on the left) gives a biased MLE for the parameter without any data. I turned on estimation in the above example and got

# Number of parameters = 3 Objective function value = 1.97786721851170  Maximum gradient component = 4.38275630568530e-06
# p1:
3.00416521397
# p2:
0.856929171346
# p3:
0.856929162964

The mode of this distribution (meanlog=1.1, sdlog=1.12) is exp(mu - sd^2) = 0.8569292. So you can see that the uncorrected estimate will be off. It is not conveying the information in the prior that the user expects.
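The analytic modes can be checked directly against those estimates (same logmu and logsd as in the earlier example):

```r
logmu <- 1.1
logsd <- 1.12

exp(logmu)            # mode under the current (uncorrected) prior, ~3.0042, matches p1
exp(logmu - logsd^2)  # mode of the intended lognormal prior, ~0.8569, matches p2 and p3
```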

That's my interpretation at least.