mjhajharia / transforms


Include a test on log_simplex transforms? #65

Closed spinkney closed 10 months ago

spinkney commented 11 months ago

I was doing a bit of testing, and for high-dimensional simplexes the sampler often fails. A separate issue is that the Dirichlet distribution itself is problematic with these high-dimensional vectors.

For example, the probit product transform fails for dimensions > 1000 (maybe even > 800), but the log probit product does not.

data {
  int<lower=0> N;
  vector<lower=0>[N] alpha;
}
parameters {
  vector[N - 1] y;
}
transformed parameters {
  vector[N] log_x;
  real log_det_jacobian = 0;
  {
    real yi;
    real log_zi, log_zprod = 0, log_zprod_new;

    for (i in 1:N - 1) {
      yi = y[i];
      // log z_i = log Phi(y_i), kept on the log scale
      log_zi = std_normal_lcdf(yi |);
      // Jacobian factor dz_i/dy_i = phi(y_i)
      log_det_jacobian += std_normal_lpdf(yi |);
      log_zprod_new = log_zprod + log_zi;
      // Jacobian factor prod_{j<i} z_j from dx_i/dz_i
      log_det_jacobian += log_zprod;
      // log x_i = log(prod_{j<i} z_j - prod_{j<=i} z_j)
      log_x[i] = log_diff_exp(log_zprod, log_zprod_new);
      log_zprod = log_zprod_new;
    }
    log_x[N] = log_zprod;
  }
}
model {
  target += log_det_jacobian;
  // target += dirichlet_lpdf(exp(log_x) | alpha);
}

Do we want to include tests with the transforms written on the log scale?

bob-carpenter commented 11 months ago

I'm not sure what you mean by transforms written on the log scale. Is what you provided another way to generate a simplex?

But yes, if there's a way to make the arithmetic more stable we should both do it and mention it in the paper.

spinkney commented 11 months ago

It's the same as the probit transform, just written keeping everything on the log scale. If we want to output the simplex, we need to exponentiate log_x.
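
For instance, a minimal sketch of recovering the simplex (the generated quantities placement is just one option, not something specified in this thread):

generated quantities {
  // recover the simplex from its log-scale representation
  vector[N] x = exp(log_x);
}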

bob-carpenter commented 11 months ago

So is the idea to sample a log simplex rather than a simplex? That would be useful in practice because we take the log again right away when we use a simplex in a multinomial. We have a multinomial-logit, but that could be factored a different way if we take in a log simplex (we'd skip subtracting log-sum-exp to normalize).
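
As a sketch of that factoring, assuming counts declared as array[N] int<lower=0> n in the data block (my addition, not part of the model above): since log_x is already a normalized log simplex, the multinomial log likelihood reduces to a dot product with no log-sum-exp subtraction:

model {
  // multinomial log likelihood on the log scale, dropping the
  // log multinomial coefficient, which is constant in the parameters
  target += dot_product(to_vector(n), log_x);
}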

sethaxen commented 11 months ago

I think this makes sense. Whether we use simplex or log_simplex, the gradients on the unconstrained parameters will be mathematically identical; the only difference is that with log_simplex they have the chance to be more numerically stable. This also allows us to separate out numerical issues coming from that particular Dirichlet just being a bad thing to sample (e.g. Dirichlet(alpha = ones(10^3) * 1e-3)) from issues due to the transformation itself.

Perhaps we could do this easily by having each model compute both x and log_x for sampling, and then for evaluation swap out the target distribution between one that uses log_x and one that uses x.
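
A sketch of that swap, assuming a data flag int<lower=0, upper=1> use_log_scale is added to the data block (the flag is a hypothetical name of mine, not something defined in this thread):

model {
  target += log_det_jacobian;
  if (use_log_scale) {
    // Dirichlet written directly in terms of log_x; the log
    // normalizing constant is dropped since it is constant
    // in the parameters
    target += dot_product(alpha - 1, log_x);
  } else {
    target += dirichlet_lpdf(exp(log_x) | alpha);
  }
}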

sethaxen commented 10 months ago

@spinkney the probit product transform seems to be missing from the paper. Can you add it to at least the appendix?

spinkney commented 10 months ago

I'll add it to the appendix, as it's similar to the logit transform.