Closed by mbrubake 11 years ago.
@bob-carpenter this has implications for issue #110 since a user will want to differentiate between increments due to a parameter transform and increments expressing a part of the posterior.
On 6/8/13 12:15 PM, Marcus Brubaker wrote:
When doing optimization, the log determinant terms that result from transformed parameters are unnecessary and will actually move the mode. We need to be able to turn off these terms when computing log_prob() and grad_log_prob() for optimization methods.
I don't understand. They're part of the log probability functions. If you don't want uniform priors on parameters, that can be changed in the model.
Let me work through an example to see if I'm following. If I write
parameters {
  real<lower=0, upper=1> theta;
}
model {
  theta ~ beta(3.9, 17.2);
}
then the density over the unconstrained parameter defined by the log prob function theta_unc in (-inf,inf) is log p(theta_unc | 3.9, 17.2), where
p(theta_unc | 3.9, 17.2)
= inv_logit(theta_unc) * (1 - inv_logit(theta_unc)) // Jacobian
* beta(inv_logit(theta_unc) | 3.9, 17.2)
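(Not part of the original thread: a quick numeric sanity check of the density above. If the Jacobian term inv_logit(theta_unc) * (1 - inv_logit(theta_unc)) is included correctly, the transformed density should integrate to 1 over the unconstrained scale. A stdlib-only sketch:)

```python
import math

a, b = 3.9, 17.2
log_B = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def p_unc(u):
    """Density over the unconstrained parameter, Jacobian included."""
    t = 1 / (1 + math.exp(-u))          # inv_logit(u)
    jac = t * (1 - t)                   # d/du inv_logit(u)
    log_beta = (a - 1) * math.log(t) + (b - 1) * math.log(1 - t) - log_B
    return jac * math.exp(log_beta)

# Riemann-sum check that the transformed density integrates to ~1
h = 0.001
total = sum(p_unc(-20 + i * h) for i in range(int(40 / h))) * h
print(round(total, 4))   # ~1.0
```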
This seems right to me. If I optimize and set
theta_unc* = ARGMAX_theta_unc p(theta_unc | 3.9, 17.2)
isn't this the same as optimizing with respect to the model:
theta* = ARGMAX_theta beta(theta | 3.9, 17.2)
=?? inv_logit(theta_unc*)
The problem is that, in your example theta* != inv_logit(theta_unc*) (in general).
Consider it analytically. If you look for zeros of the gradient of the log probability, you get an extra term due to the Jacobian, i.e., if
lpt(x_unc) = lp(x(x_unc)) + log_det_J(x_unc)
is the log probability of the unconstrained parameter and lp(x) is the constrained log probability
dlpt/dx_unc = dlp/dx * dx/dx_unc + dlog_det_J/dx_unc
The point where dlpt/dx_unc = 0 is not (necessarily) the point where dlp/dx = 0.
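(Not from the thread: the beta example above can be worked numerically to show the two modes really differ. Below, lp is the constrained log density of Beta(3.9, 17.2) up to a constant, and lpt adds the log Jacobian of the inv_logit transform; a grid search over theta finds each argmax. Analytically the constrained mode is (a-1)/(a+b-2) while the Jacobian-adjusted argmax maps back to a/(a+b).)

```python
import math

a, b = 3.9, 17.2
logit = lambda t: math.log(t / (1 - t))
inv_logit = lambda u: 1 / (1 + math.exp(-u))

def lp(theta):
    """Constrained log density of Beta(a, b), up to a constant."""
    return (a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta)

def lpt(u):
    """Unconstrained log density: lp plus log Jacobian of inv_logit."""
    t = inv_logit(u)
    return lp(t) + math.log(t) + math.log(1 - t)

grid = [i / 100000 for i in range(1, 100000)]
theta_star = max(grid, key=lp)                         # argmax of lp
theta_jac = inv_logit(max(map(logit, grid), key=lpt))  # argmax of lpt, mapped back

print(round(theta_star, 4))   # 0.1518 == (a-1)/(a+b-2)
print(round(theta_jac, 4))    # 0.1848 == a/(a+b)
```

So optimizing on the unconstrained scale with the Jacobian term included lands at a/(a+b), not at the mode of the beta density, which is exactly the problem being reported.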
On 6/9/13 9:43 PM, Marcus Brubaker wrote:
The problem is that, in your example theta* != inv_logit(theta_unc*) (in general).
Consider it analytically. If you look for zeros of the gradient of the log probability, you get an extra term due to the Jacobian, i.e., if
lpt(x_unc) = lp(x(x_unc)) + log_det_J(x_unc)
OK, I see the problem now after working through it on my own in a concrete one-dimensional case.
It means our current approach to optimization is not optimizing the model the users are expecting.
The change is doable. The transform functions are already in place.
We don't want to incur runtime overhead or lock in decisions when the model is compiled. That leaves two options I can see:
I think option 1 would be more elegant. Also, relating to #110, once we have an increment_log_prob() function we could turn it off for calls that are in transformed parameters blocks so that people can express that some lp__ increments are part of the model and others are part of the parameterization.