mjhajharia / transforms


Ordered simplex #38

Open spinkney opened 2 years ago

spinkney commented 2 years ago

See https://discourse.mc-stan.org/t/ordered-simplex-constraint-transform/24102/17?u=spinkney

I'll add the Stan code for this. Martin helped a lot and this paper made it clear for me https://arxiv.org/pdf/0708.0176.pdf

bob-carpenter commented 2 years ago

Thanks. It's really remarkable how the ordering constraint works to give you order statistics. I still don't quite have my head around why that's true, but I've verified it empirically. That is, we get the same distribution on alpha[1] from either:

ordered[N] alpha;
...
alpha ~ foo(theta);

and

vector[N] alpha_unordered;
...
alpha_unordered ~ foo(theta);
alpha = sort_asc(alpha_unordered);
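
A quick Monte Carlo sketch of that claim (my check, not part of the thread): sorting iid draws yields order statistics, so alpha[1] is the sample minimum. Assuming Uniform(0,1) draws, the minimum is Beta(1, N) with mean 1/(N + 1), so the average of the sorted first element should land there.

```python
# Monte Carlo sketch: the first element after sorting iid Uniform(0,1)
# draws is the sample minimum, distributed Beta(1, N) with mean 1/(N + 1).
import random

random.seed(1)
N, draws = 5, 200_000
mins = [min(random.random() for _ in range(N)) for _ in range(draws)]
mean_min = sum(mins) / draws
print(mean_min)  # should be close to 1/6 for N = 5
```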

Is the application to order statistics?

We're moving the language toward allowing composition of constraints. So rather than an ordered simplex, I think we can just compose the ordered transform with the simplex transform. I really want to get this into Stan to deal with the case of non-centered parameterizations of lognormal distributions.

spinkney commented 2 years ago

> Is the application to order statistics?

One application is adding an additional identification constraint on mixtures or latent factor models.

The order statistic stuff is really cool. It's useful for modeling information given as high/low values or quartiles. In my line of work, lots of data is aggregated before release because of privacy laws. Personally, I'm in favor, and it makes for an interesting modeling problem.

spinkney commented 2 years ago

I don't see how we can compose the ordered and simplex transforms to get this transform and have it remain uniform, because the constraints are coupled: the ordering affects the simplex at each iteration of the loop below. Whenever I tried applying the two transforms one after the other, the resulting ordered simplex transform didn't give a "uniform" ordered simplex (approximately uniform spacing between values); one of the end points was typically much too large.

I wonder how TFP manages to compose these two transforms, and whether it's also uniform on the ordered simplex.

data {
 int<lower=0> N;
 vector<lower=0>[N] alpha;
}
transformed data {
  int Nm1 = N - 1;
}
parameters {
 vector[Nm1] y;

}
transformed parameters {
 simplex[N] x ;
 real log_abs_det = 0;

 {
   real remaining = 1; // Remaining amount to be distributed
   real lb = 0; // The minimum for the next element
   real ub = 1;

    for(i in 1:Nm1) {
      int N_prime = Nm1 + 2 - i; // Number of remaining elements
      // First constrain to [0; 1 / N_prime]
      ub = inv(N_prime);
      real xcons = ub * inv_logit(y[i]);
      log_abs_det += log(ub) + log_inv_logit(y[i]) + log1m_inv_logit(y[i]);

      // Add the lowest element log density
      log_abs_det += log(N_prime - 1) +  log(N_prime) + (N_prime - 2) * log1m(N_prime * xcons);

      x[i] = lb + remaining * xcons;
      lb = x[i];
      // We added  remaining * x_cons to each of the N_prime elements yet to be processed
      remaining -= remaining * xcons * N_prime; 
    }
    x[N] = lb + remaining;
 }

}
model {
 target += log_abs_det;
// target += target_density_lp(x, alpha);
}
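For what it's worth, here is a direct Python port of the constraining loop above (my translation; the `log_abs_det` terms are dropped since this only exercises the forward map). It checks that an arbitrary unconstrained `y` maps to an increasing vector summing to 1:

```python
# Python port of the constraining loop above (forward map only; the
# log_abs_det accumulation is omitted). Maps unconstrained y (length N-1)
# to an ordered simplex x (length N).
import math

def inv_logit(u):
    return 1.0 / (1.0 + math.exp(-u))

def ordered_simplex_constrain(y):
    N = len(y) + 1
    x = [0.0] * N
    remaining = 1.0  # remaining amount to be distributed
    lb = 0.0         # the minimum for the next element
    for i in range(N - 1):
        N_prime = N - i                    # number of remaining elements
        xcons = inv_logit(y[i]) / N_prime  # constrained to [0, 1/N_prime]
        x[i] = lb + remaining * xcons
        lb = x[i]
        # we added remaining * xcons to each of the N_prime remaining elements
        remaining -= remaining * xcons * N_prime
    x[N - 1] = lb + remaining
    return x

x = ordered_simplex_constrain([0.3, -1.2, 0.7, 0.0])
print(x, sum(x))  # increasing, sums to 1
```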
bob-carpenter commented 2 years ago

> Whenever I attempted to compose these two transforms together via one after another the ordered simplex transform doesn't give out a "uniform" ordered simplex

Ah, that makes sense. We assume uniform on the unconstrained scale leads to uniform simplexes, but nothing says that uniform on the ordered scale leads to uniform ordered simplexes.

Also, it's not clear where we want a vector of 0 values to initialize (mean initialization value for Stan). If I put all 0s through the ordered transform, I get $\begin{bmatrix} 0 & 1 & \cdots & N-1 \end{bmatrix}$ in $N$ dimensions. That then leads to probabilities proportional to $\begin{bmatrix} \exp(0) & \exp(1) & \cdots & \exp(N - 1) \end{bmatrix}$

spinkney commented 2 years ago

> Also, it's not clear where we want a vector of 0 values to initialize (mean initialization value for Stan). If I put all 0s through the ordered transform, I get $\begin{bmatrix} 0 & 1 & \cdots & N-1 \end{bmatrix}$ in $N$ dimensions. That then leads to probabilities proportional to $\begin{bmatrix} \exp(0) & \exp(1) & \cdots & \exp(N - 1) \end{bmatrix}$
Something neat about this is that the average value of the minimum, which is supported on $[0, 1/N']$, is $1/N'^2$ for each $N' \in \{1, \ldots, N\}$. We can see that by noting that the cdf of the minimum is

$$ F(x) = \begin{cases} 1 - (1 - Nx)^{N-1} & 0 \le x \le 1/N \\ 1 & 1/N \le x \le 1 \end{cases} $$

The ccdf is just 1 minus this and we can use that to calculate the mean value.

$$ \begin{aligned} E(x) &= \int_{0}^{1/N} (1 - Nx)^{N-1} \, dx \\ &= -\frac{1}{N} \int_{1}^{0} u^{N-1} \, du \\ &= -\frac{u^N}{N^2} \bigg\vert_1^0 \\ &= \frac{1}{N^2} \end{aligned} $$

where I've substituted $u = 1 - Nx$ and $du = -N \, dx$.
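
The expectation is easy to sanity-check by simulation (my sketch, not part of the thread): a flat Dirichlet draw is N iid Exponential(1) variables normalized to sum to 1, and the mean of its smallest element should come out near $1/N^2$.

```python
# Monte Carlo check of E(min) = 1/N^2 on the uniform simplex.
# A flat Dirichlet draw = N iid Exponential(1) draws normalized to sum 1.
import math
import random

random.seed(7)
N, draws = 4, 100_000
total = 0.0
for _ in range(draws):
    e = [-math.log(random.random()) for _ in range(N)]
    s = sum(e)
    total += min(e) / s  # smallest element of the simplex point
print(total / draws)  # should be close to 1/16 = 0.0625 for N = 4
```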

I can calculate an offset at each iteration s.t. the offset is equal to this average value. Is that what you mean?

data {
 int<lower=0> N;
 vector<lower=0>[N] alpha;
}
transformed data {
  real half_logN = 0.5 * log(N);
  int Nm1 = N - 1;
}
parameters {
 vector[Nm1] y;

}
transformed parameters {
 simplex[N] x ;
 real log_abs_det = 0;

 {
   real remaining = 1; // Remaining amount to be distributed
   real lb = 0; // The minimum for the next element
   real ub = 1;

    for(i in 1:Nm1) {
      int N_prime = Nm1 + 2 - i; // Number of remaining elements
      real off_set = logit(1.0 / N_prime^2); // 1.0 avoids integer division
      //First constrain to [0; 1 / N_prime]
      ub = inv(N_prime);
      real xcons = ub * inv_logit(off_set + y[i]);
      log_abs_det += log(ub) + log_inv_logit(off_set + y[i]) + log1m_inv_logit(off_set + y[i]);

      // Add the lowest element log density
      log_abs_det += log(N_prime - 1) +  log(N_prime) + (N_prime - 2) * log1m(N_prime * xcons);

      x[i] = lb + remaining * xcons;
      lb = x[i];
      //We added  remaining * x_cons to each of the N_prime elements yet to be processed
      remaining -= remaining * xcons * N_prime; 
    }
    x[N] = lb + remaining;
 }

}
model {
 target += log_abs_det;
// target += target_density_lp(x, alpha);
}
spinkney commented 2 years ago

Another way to achieve the same thing is the "augmented approach". The proof is at https://mathoverflow.net/a/312594.

data {
 int<lower=0> N;
 // vector<lower=0>[N] alpha;
}
transformed data {
  vector[N] lsv = 1 ./ reverse(linspaced_vector(N, 1, N));
}
parameters {
 vector<lower=0>[N] a;
}
transformed parameters {
 vector[N] x;
 real a_sum = sum(a);

 for (i in 1:N) {
   x[N - i + 1] =  dot_product(a[1:N - i + 1], lsv[1:N - i + 1]);
 }
 x /= a_sum;
}
model {
 target += -a_sum; // a ~ exponential(1)
}
generated quantities {
  real x_sum = sum(x);
}
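Here is a small Python sketch of the same augmented construction (my translation of the Stan block above), checking that the normalized cumulative weighted sums do give an increasing vector that sums to 1:

```python
# Python sketch of the augmented approach: positive draws a_j, cumulative
# sums weighted by 1/(N - j) for 0-based j (matching lsv above), then
# normalization by sum(a).
import math
import random

random.seed(3)

def ordered_simplex_augmented(a):
    N = len(a)
    x, acc = [], 0.0
    for j in range(N):
        acc += a[j] / (N - j)  # lsv[j] = 1 / (N - j) with 0-based j
        x.append(acc)
    a_sum = sum(a)
    return [v / a_sum for v in x]

a = [-math.log(random.random()) for _ in range(6)]  # Exponential(1) draws
x = ordered_simplex_augmented(a)
print(x, sum(x))  # increasing, sums to 1
```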
spinkney commented 1 year ago

My notes: The spacings of a uniform ordered simplex follow

$$ s = \begin{bmatrix} \frac{1}{N \cdot N} & \frac{1}{N(N - 1)} & \cdots & \frac{1}{N \cdot 1} \end{bmatrix} $$

the expected values of the uniform simplex are then

$$ \operatorname{E}(x) = \begin{bmatrix} s_1 & \sum\limits_{i = 1}^{2} s_i & \cdots & \sum\limits_{i = 1}^{N} s_i \end{bmatrix} $$

Let's see if it works. In R

spacings <- function(N) {
  nums <- 0:(N - 1)
  s <- 1/(N * (N - nums))
  data.frame(vals = cumsum(s), spacings = s) 
 }
> spacings(5)
       vals   spacings
1 0.0400000 0.04000000
2 0.0900000 0.05000000
3 0.1566667 0.06666667
4 0.2566667 0.10000000
5 0.4566667 0.20000000
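
The same check can be run by brute force in Python (my sketch): sort flat Dirichlet draws and compare the per-component means against the cumulative spacings.

```python
# Monte Carlo check of the spacings formula: component means of a sorted
# flat Dirichlet draw should match cumsum(1 / (N * (N - k))).
import math
import random

random.seed(11)
N, draws = 5, 100_000
sums = [0.0] * N
for _ in range(draws):
    e = sorted(-math.log(random.random()) for _ in range(N))
    s = sum(e)
    for k in range(N):
        sums[k] += e[k] / s
means = [v / draws for v in sums]

expected, acc = [], 0.0
for k in range(N):
    acc += 1.0 / (N * (N - k))
    expected.append(acc)

print(means)
print(expected)  # cumulative spacings, e.g. 0.04, 0.09, ... for N = 5
```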

In Stan

# A tibble: 5 × 10
  variable   mean median     sd    mad      q5   q95  rhat ess_bulk ess_tail
  <chr>     <dbl>  <dbl>  <dbl>  <dbl>   <dbl> <dbl> <dbl>    <dbl>    <dbl>
1 x[1]     0.0399 0.0315 0.0327 0.0306 0.00248 0.106  1.00   45093.   28310.
2 x[2]     0.0900 0.0867 0.0454 0.0502 0.0217  0.170  1.00   45450.   32464.
3 x[3]     0.157  0.159  0.0550 0.0589 0.0640  0.245  1.00   42800.   31539.
4 x[4]     0.256  0.257  0.0670 0.0647 0.143   0.367  1.00   40390.   32386.
5 x[5]     0.457  0.437  0.118  0.115  0.298   0.685  1.00   43653.   32776.