stan-dev / stan

Stan development repository. The master branch contains the current release. The develop branch contains the latest stable development. See the Developer Process Wiki for details.
https://mc-stan.org
BSD 3-Clause "New" or "Revised" License

Optimizing seems to get stuck at boundaries, dependent on seed #1928

Open steven-murray opened 8 years ago

steven-murray commented 8 years ago

Hi

I'm using pystan version 2.9.0. I have a model based on a truncated incomplete gamma distribution.

When running optimizing(), the results depend very heavily on which seed I choose, and it seems to have something to do with the boundary values. For one seed (121010), the optimization terminates after 2 iterations with all four parameters at their boundaries. At first I got "Convergence detected: gradient norm is below tolerance"; when I set tol_grad=0 I got "relative gradient magnitude is below tolerance", and when I also set tol_rel_grad=0 I got "absolute change in objective function was below tolerance". I fear that setting anything more to zero will leave nothing to terminate the iterations.

When I set the algorithm to "Newton" it works (it doesn't seem to hit the boundaries). Setting it to "BFGS" gives the same outcome as "LBFGS".

When using a seed of 1234 (without any of those options specified), the result comes out nicely.
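
Roughly, the calls look like this (a minimal sketch; sm is the compiled StanModel and stan_data is the data dict for the model listed further down):

import pystan

sm = pystan.StanModel(file="tggd.stan")   # the model listed below; the file name is arbitrary

# seed 121010: terminates after 2 iterations with all parameters at their bounds
fit_bad = sm.optimizing(data=stan_data, seed=121010, algorithm="LBFGS")

# tightening the tolerances only changes which termination message is printed
fit_bad2 = sm.optimizing(data=stan_data, seed=121010, algorithm="LBFGS",
                         tol_grad=0, tol_rel_grad=0)

# "Newton" works; "BFGS" behaves like "LBFGS"; seed 1234 with defaults is fine
fit_newton = sm.optimizing(data=stan_data, seed=121010, algorithm="Newton")
fit_good   = sm.optimizing(data=stan_data, seed=1234)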

I've verified that the lp__ values are correct at several of the iterations (modulo a constant).

The L-BFGS method from scipy, by contrast, seems to work every time.
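
The scipy comparison looks roughly like this (a sketch with a dummy objective standing in for the real negative log posterior; the point is only the call pattern: L-BFGS-B is handed the box constraints directly rather than working on an unconstrained transform the way Stan does):

import numpy as np
from scipy.optimize import minimize

# Dummy stand-in so the sketch runs; the real objective is the negative of
# the lp__ defined by the Stan model below, evaluated on the constrained scale.
def neg_log_post(theta):
    logHs, alpha, beta, lnA = theta
    return (logHs - 14.3)**2 + (alpha + 1.87)**2 + (beta - 0.7)**2 + (lnA + 42.6)**2

bounds = [(13.5, 15.5), (-1.99, -1.7), (0.4, 1.2), (-48.0, -37.0)]
x0 = np.array([14.0, -1.8, 0.7, -42.0])   # arbitrary start inside the bounds

res = minimize(neg_log_post, x0, method="L-BFGS-B", bounds=bounds)
print(res.x, res.message)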

Here is my Stan code:

functions {
    /**
    * gammainc() is the upper incomplete gamma function (not regularized)
    * @param real a, shape parameter
    * @param real x, position > 0
    * @returns the non-regularized incomplete gamma
    * NOTES: uses a recursion relation to calculate values for negative a.
    */
    real gammainc(real a, real x){
      int n;
      real ap1;
      real ssum;

      if(a>=0) return gamma_q(a,x) * tgamma(a);

      ap1 <- a+1;

      //Get floor(-a)
      n<-0;
      while(n<-a){
        n <- n+1;
      }

      //Get summed part
      {
        vector[n] sums;
        for(i in 0:n-1) sums[i+1] <- pow(x,i)/tgamma(ap1+i);
        ssum <- sum(sums);
      }
      return tgamma(a)*(gamma_q(a+n,x)-pow(x,a)*exp(-x)*ssum);
    }

    /**
    * truncated_logGGD_log gives the log PDF of a variate whose exponential has a
    *    lower-truncated generalised gamma distribution.
    * @param vector y, variate (should be log10(m/Hs) in the usual TGGD)
    * @param real ymin, truncation in y-space
    * @param real h, characteristic scale (logHs in the model below)
    * @param real alpha, power-law slope
    * @param real beta, cut-off parameter
    * @param real lnA, log of the normalisation (here raw_lnA = lnA + log(V))
    */
    real truncated_logGGD_log(vector y, real ymin, real h, real alpha, real beta, real lnA){
        vector[num_elements(y)] x;
        real xmin;
        real z;

        z <- (alpha+1)/beta;
        x <- exp(log10()*y*beta);
        xmin <- exp(log10()*ymin*beta);
        return sum(lnA + log(beta) + log(log10()) + log10()*h + log10()*y*(alpha+1) - x) - exp(lnA)*pow(10,h)*gammainc(z,xmin);
    }
}

data {
    int<lower=0> N;                // number of halos
    vector<lower=0>[N] log_m;      // measured halo masses
    real<lower=0> V;               // Volume of the survey

    // CONTROLS FOR PARAMETER BOUNDS
    real<lower=0> hs_min;             // Lower bound of logHs
    real<lower=0,upper=20> hs_max;    // Upper bound of logHs
    real<lower=-2,upper=0> alpha_min; // Lower bound of alpha
    real<lower=-2,upper=0> alpha_max; // Upper bound of alpha
    real<lower=0> beta_min;           // Lower bound of beta
    real<lower=0> beta_max;           // Upper bound of beta
    real  lnA_min;                    // lower bound of lnA
    real  lnA_max;                    // upper bound of lnA
}

transformed data {
    real<lower=0> log_mmin;
    log_mmin <- min(log_m);
}

parameters {
    real<lower=hs_min,upper=hs_max> logHs;               // Characteristic halo mass
    real<lower=alpha_min,upper=alpha_max> alpha;         // Power-law slope
    real<lower=beta_min,upper=beta_max> beta;            // Cut-off parameter
    real<lower=lnA_min,upper=lnA_max> lnA;               // Normalisation
}

transformed parameters {
    real raw_lnA;
    raw_lnA <- lnA + log(V);
}

model {
    vector[N] y;
    real ymin;
    y <- log_m-logHs;
    ymin <- log_mmin-logHs;

    y ~ truncated_logGGD(ymin, logHs, alpha, beta, raw_lnA);
}
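
As a side note on the gammainc() helper above: the recursion it uses for negative a can be checked outside Stan. A small sketch of that check (using mpmath as the reference is my choice, not something from this issue):

import math
import mpmath

def gammainc_neg_a(a, x):
    # Same recursion as the Stan helper, for a < 0 and non-integer a:
    # Gamma(a, x) = Gamma(a) * ( Q(a + n, x)
    #               - x**a * exp(-x) * sum_{i=0}^{n-1} x**i / Gamma(a + 1 + i) ),
    # with n = ceil(-a) and Q the regularized upper incomplete gamma (gamma_q in Stan).
    n = math.ceil(-a)
    ssum = sum(x**i / math.gamma(a + 1 + i) for i in range(n))
    q = mpmath.gammainc(a + n, x, regularized=True)
    return math.gamma(a) * (q - x**a * math.exp(-x) * ssum)

a, x = -1.24, 0.7                   # arbitrary test point with a < 0, x > 0
print(gammainc_neg_a(a, x))         # recursion, as in the Stan code
print(mpmath.gammainc(a, x))        # reference: non-regularized Gamma(a, x)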

I am using data with ~2e5 variates: samples.txt

from a distribution with expected values:
logHs = 14.294849118059398, alpha = -1.868803779752156, beta = 0.6976327799172357, lnA = -42.58691124225525

I set the bounds as logHs: (13.5, 15.5), alpha: (-1.99, -1.7), beta: (0.4, 1.2), lnA: (-48, -37),

and V is 45794539.9853329.

I was previously setting the bounds to be wider on each parameter, but contracting the support has not solved the problem.
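
Put together, the data dict for the model is roughly this (my reconstruction from the values above; reading samples.txt with loadtxt is an assumption about the file format):

import numpy as np

log_m = np.loadtxt("samples.txt")      # the ~2e5 measured log-masses
stan_data = {
    "N": len(log_m),
    "log_m": log_m,
    "V": 45794539.9853329,
    "hs_min": 13.5,     "hs_max": 15.5,
    "alpha_min": -1.99, "alpha_max": -1.7,
    "beta_min": 0.4,    "beta_max": 1.2,
    "lnA_min": -48.0,   "lnA_max": -37.0,
}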

I'm hoping I've provided enough information. Unfortunately I haven't tested whether this happens for simpler problems, so I don't know if it is problem-specific or a more general problem within Stan. Since it doesn't always happen even for my own problem, I would be hesitant to try to find it by randomly searching in a simpler problem.

bob-carpenter commented 8 years ago

Thanks for reporting. It doesn't look like anyone has responded. Did the comparison with the other L-BFGS implementation use the same transformations implied by the constrained parameters?

I marked this as both a feature and a bug, at least until someone does some more investigating.

steven-murray commented 8 years ago

For the other L-BFGS implementation, I believe no transformation is performed. It seems to first evaluate the p0 you give it, then explicitly evaluate the edges, then work its way in (sorry, I'm not too well acquainted with the exact procedure).

I would love to be more helpful with this (constructing a more minimal working example and doing a proper comparison), but I haven't really got the time at the moment, as is frustratingly often the case. I'm more than happy to trial things for you on my end, though.

bob-carpenter commented 8 years ago

You can try removing the bounds from the Stan program, initializing at a value in support, and seeing if that works better. Maybe not, if the problem is the boundary values.
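
In pystan terms that would be something like the following (a sketch; the init values are just the expected values quoted earlier, and sm/stan_data are as above):

# After deleting the <lower=...,upper=...> bounds from the parameters block,
# initialize inside the support instead of relying on random inits.
fit = sm.optimizing(
    data=stan_data,
    seed=121010,
    init={"logHs": 14.3, "alpha": -1.87, "beta": 0.70, "lnA": -42.6},
)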

It'd be interesting to see if the Stan model would be optimizable by the other program or if it's truly a problem with our L-BFGS implementation not being robust enough.
