stan-dev / stan

Stan development repository. The master branch contains the current release. The develop branch contains the latest stable development. See the Developer Process Wiki for details.
https://mc-stan.org
BSD 3-Clause "New" or "Revised" License

optimization fails for a constrained parameter #2511


UnkindPartition commented 6 years ago

Consider the following simple model, which should return the average of two positive numbers:

data {
  vector<lower=0>[2] X;
  real<lower=0> sigma;
}
parameters {
  real<lower=0> mu;
}
model {
  X ~ normal(mu, sigma);
}

R code:

avg_model <- stan_model("avg.stan")
optimizing(avg_model, list(X=c(0.3, 0.5), sigma=1e-4), init=0)$par

If sigma is not too low (1 or 0.1), it works fine. It also works fine if I remove the lower bound on mu. But if the lower bound is present and sigma is low (1e-3 or 1e-4), optimizing() almost always returns mu = 0.
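For reference, the answer the model should produce can be checked analytically rather than via Stan: with a normal likelihood and known sigma, the MLE of mu is the sample mean (a quick Python sketch, not Stan output):

```python
# With a normal likelihood and known sigma, the MLE of mu is the sample
# mean; the lower bound doesn't bind because the mean is positive.
xs = (0.3, 0.5)
mle = sum(xs) / len(xs)
print(mle)
```

So the correct answer here is 0.4, regardless of how small sigma is.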

The manually transformed model has the same behavior:

data {
  vector<lower=0>[2] X;
  real<lower=0> sigma;
}
parameters {
  real log_mu;
}
transformed parameters {
  real mu = exp(log_mu);
}
model {
  X ~ normal(mu, sigma);
  target += log_mu;
}

Yet the likelihood looks fine, and R's optim() has no problem maximizing it, despite using a finite-difference approximation to the gradient.

# log posterior on the unconstrained scale (includes the log-Jacobian term)
liky <- Vectorize(function(log_mu) {
  X <- c(0.3, 0.5)
  sum(dnorm(X, mean = exp(log_mu), sd = 1e-4, log = TRUE)) + log_mu
})
curve(liky(x), from = -10, to = -0.2)
optim(0, liky, control = list(fnscale = -1))$par


rstan version 2.16.2

bob-carpenter commented 6 years ago

For optimizing, the Jacobian is turned off; it's on for sampling.

Thus your two programs will give the same answer for sampling, but different answers for optimization.

As for the solution tolerances, they're applied on the unconstrained scale, so they behave differently for values near zero.
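One way to see why unconstrained-scale tolerances matter near zero (a Python sketch of the transform, not Stan internals): for a `<lower=0>` parameter Stan optimizes y with mu = exp(y), so a fixed-size step in y is a multiplicative change in mu, and near mu = 0 the constrained value barely moves even for a full-size unconstrained step.

```python
import math

# For a <lower=0> parameter, Stan optimizes y with mu = exp(y).
# A unit step in y rescales mu by a constant factor, so near mu = 0
# a change that is tiny on the constrained scale still counts as a
# full-size step on the unconstrained scale.
for y in (0.0, -5.0, -10.0, -20.0):
    mu = math.exp(y)
    mu_stepped = math.exp(y - 1.0)  # one unit step toward zero
    print(f"y={y:6.1f}  mu={mu:.3e}  |mu change|={mu - mu_stepped:.3e}")
```

At y = -20 a unit unconstrained step moves mu by less than 1e-9, so a parameter-change tolerance stated on the unconstrained scale says nothing about precision in mu near the bound.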

UnkindPartition commented 6 years ago

Thanks for the response, Bob; that's very useful to know.

However, the two programs do give the same (incorrect) answer for optimization.

Also, I just checked that R's optim finds the correct answer both with and without the Jacobian.

bob-carpenter commented 6 years ago

Right. I didn't mean to suggest that this explains what's going on; I just wanted to clarify what the answer is supposed to be.

bob-carpenter commented 6 years ago

I verified that it's indeed providing the wrong answer with:

data {
  vector<lower=0>[2] X;
  real<lower=0> sigma;
}
parameters {
  real<lower=0> mu;
}
model {
  X ~ normal(mu, sigma);
}

The output I get is this:

> mod <- stan_model("avg.stan")
> mle <- optimizing(mod, list(X=c(0.3, 0.5), sigma=1e-4), init=0)
Initial log joint probability = -3.7e+07
Optimization terminated normally: 
  Convergence detected: gradient norm is below tolerance
> mle$par
mu 
 0 

This is a very badly conditioned model for an init of 0 on the unconstrained scale, which corresponds to an init of 1 on the constrained scale. That puts it (1 - 0.4) / 1e-4 = 6000 standard deviations away from the posterior mean. So I'm not sure this is fixable in the sense of modifying the algorithm to produce the right answer from such a bad init.
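The false convergence can be sketched numerically (a Python sketch assuming the exp transform Stan uses for a `<lower=0>` parameter, with the Jacobian omitted as in optimizing()): the unconstrained gradient carries a chain-rule factor of exp(y), so once a step overshoots to very negative y the gradient norm collapses and the "gradient norm is below tolerance" check fires at mu ≈ 0.

```python
import math

# d/dy log p(y) for mu = exp(y), X ~ normal(mu, sigma), no Jacobian term:
# the chain-rule factor exp(y) drives the gradient to zero as mu -> 0.
def grad(y, xs=(0.3, 0.5), sigma=1e-4):
    mu = math.exp(y)
    return mu * sum(x - mu for x in xs) / sigma**2

print(grad(0.0))    # at the init (mu = 1): huge, about -1.2e8
print(grad(-50.0))  # after overshooting: roughly 1.5e-14, effectively zero
```

So the gradient is enormous at the init, the first step is correspondingly enormous in the negative direction, and at the resulting very negative y the gradient is far below any reasonable tolerance.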

Returning the wrong answer is still a bug. I would be happy if we could at least diagnose the error.

If you get closer to the right answer on init, say just 3000 standard deviations away instead of 6000 standard deviations, then everything converges very reliably, as in:

> mle <- optimizing(mod, list(X=c(0.3, 0.5), sigma=1e-4), init=list(mu = 0.1))

> mle$par
 mu 
0.4 

and same for

> mle <- optimizing(mod, list(X=c(0.3, 0.5), sigma=1e-4), init=list(mu = 0.7))

bob-carpenter commented 6 years ago

This works if you turn off all the tolerances other than the values themselves.

> mle <- optimizing(mod, list(X=c(0.3, 0.5), sigma=1e-4), tol_grad=-1, tol_rel_grad=-1, tol_obj=-1, tol_rel_obj=-1)
Initial log joint probability = -4.75677e+06
Exception: normal_lpdf: Location parameter is inf, but must be finite!  (in 'model6ae246d29e15_avg' at line 9)

Exception: normal_lpdf: Location parameter is inf, but must be finite!  (in 'model6ae246d29e15_avg' at line 9)

Exception: normal_lpdf: Location parameter is inf, but must be finite!  (in 'model6ae246d29e15_avg' at line 9)

Exception: normal_lpdf: Location parameter is inf, but must be finite!  (in 'model6ae246d29e15_avg' at line 9)

Error evaluating model log probability: Non-finite gradient.

Optimization terminated normally: 
  Convergence detected: absolute parameter change was below tolerance

> mle$par
 mu 
0.4 

You can see it struggling with arithmetic along the way.
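The exceptions are consistent with the line search overshooting in the positive direction on the unconstrained scale (a hypothetical illustration in Python, not a trace of Stan's line search): exp(y) overflows double precision once y exceeds roughly 709.78, which would hand normal_lpdf an infinite location parameter.

```python
import math

# exp overflows double precision just above y = 709.78, so an
# overshooting line-search step on the unconstrained scale can produce
# an infinite constrained value for a <lower=0> parameter.
def constrain(y):
    try:
        return math.exp(y)
    except OverflowError:
        return math.inf

print(constrain(709.0))   # still finite, around 8.2e307
print(constrain(710.0))   # inf
```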

With these tolerances set to 0 rather than -1, the algorithm detects convergence because you're so far out in the tails that the gradients must all be close to zero. I'm not quite sure why the relative change in the objective isn't OK.

UnkindPartition commented 6 years ago

Unfortunately, I can't seem to reproduce your results.

If I don't supply init as in your comment https://github.com/stan-dev/stan/issues/2511#issuecomment-378731571, then regardless of the tolerances it arrives at the right answer about half of the time, see https://ro-che.info/files/2018-04-03-stan-optimization-issue/2018-04-10-avg.html.

If I do supply init=0, as in my original comment, then it never gets it right, even with the tolerances turned off.