mjhajharia / transforms


Inverse isometric log ratio simplex method #39

Open · spinkney opened this issue 2 years ago

spinkney commented 2 years ago

I can't find where @bob-carpenter mentioned that he was still trying to understand this transform, but it piqued my curiosity because I hadn't heard of it.

After grokking the idea I was able to put together an R script that generates a simplex using it. The idea connects the unit-vector and simplex transforms. What follows is the reverse of the ILR, so it's a bit weird: we need to invert the Helmert matrix, which enforces the sum-to-zero constraint in the forward mapping.
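For reference, the forward ILR that this reverses looks roughly like the following (standard compositional-data notation, so treat the details as a sketch). With $V$ the $D \times (D-1)$ matrix whose columns are the Helmert contrasts scaled to unit length (mutually orthogonal, each summing to zero), as built by make_V below,

$$
\text{clr}(x)_i = \log x_i - \frac{1}{D}\sum_{j=1}^{D} \log x_j,
\qquad
\text{ilr}(x) = V^{\top} \text{clr}(x).
$$

The script draws the ILR coordinates directly and maps them back to the simplex.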

make_V <- function(D) {
  apply(t(contr.helmert(D)), 1, function(x) x /norm(x))
}

clr_trans <- function(x) {
  D <- length(x)
  log(x) - sum(log(x)) / D
}

random_simplex <- function(D) {

  x_test_ilr <- runif(D - 1, -5, 5)
  # To reverse transform 
  # append a row with all 0s except the last as 1 to make V matrix invertible
  V <- make_V(D)
  V_fullrank <- rbind(t(V), c(rep(0, D - 1), 1))
  V_inv_fullrank <- solve(V_fullrank)

  # append 0 on the ILR values
  x_test_ilr_zero <- c(x_test_ilr, 0)

  exp_s <- exp(V_inv_fullrank %*% x_test_ilr_zero)

  # last element of exp_s is exp(0) = 1, so normalizing by 1 + alpha gives a simplex
  alpha <- sum(exp_s[1:(D - 1)])
  Z <- -log1p(alpha)

  return(exp_s * exp(Z))

}
random_simplex(5)
         [,1]
1 0.020302134
2 0.039406348
3 0.704896030
4 0.005134803
5 0.230260686

I've written a Stan program without the log-det-jac adjustment because I'm tired and I think @sethaxen may be able to whip it up much quicker than me. I put a normal prior just to get sampling going.

data {
 int<lower=0> N;
 matrix[N - 1 , N - 1] Vinv;
}
parameters {
 vector[N - 1] y;
}
transformed parameters {
  simplex[N] z;
  {
    vector[N - 1] s = Vinv * y;
    real alpha = log_sum_exp(append_row(s, 0));
    z = append_row(exp(s - alpha), 1 / exp(alpha));
  }

}
model {
 y ~ normal(0, 10);
}
generated quantities {
  real z_sum = sum(z);
}

To run this you need the following R code, where mod is the compiled cmdstanr model for the program above and N is the simplex dimension:

make_V <- function(D) {
  apply(t(contr.helmert(D)), 1, function(x) x /norm(x))
}

V <- make_V(N)
V_fullrank <- rbind(t(V), c(rep(0, N - 1), 1))
V_inv_fullrank <- solve(V_fullrank)

mod_out <- mod$sample(
  data = list(N = N,
              Vinv = matrix(V_inv_fullrank[1:(N-1), 1:(N-1)], N-1, N-1)),
  parallel_chains = 4,
  iter_sampling = 2000
)
mod_out$summary("z")

with example output of:


 mod_out$summary("z")
# A tibble: 5 × 10
  variable  mean    median    sd       mad       q5   q95  rhat ess_bulk ess_tail
  <chr>    <dbl>     <dbl> <dbl>     <dbl>    <dbl> <dbl> <dbl>    <dbl>    <dbl>
1 z[1]     0.201 0.0000368 0.371 0.0000546 6.36e-14  1.00  1.00    8449.    6023.
2 z[2]     0.206 0.0000382 0.374 0.0000567 3.85e-14  1.00  1.00    9083.    6334.
3 z[3]     0.199 0.0000351 0.369 0.0000521 4.16e-14  1.00  1.00    8053.    6443.
4 z[4]     0.194 0.0000241 0.365 0.0000357 4.58e-14  1.00  1.00    7887.    6253.
5 z[5]     0.201 0.0000500 0.371 0.0000742 3.62e-14  1.00  1.00    7808.    6525.
spinkney commented 2 years ago

Now that I look at it again, it looks like the softmax formulation but using s instead of y.
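Spelled out (my reading of the transformed parameters block): with $s = V^{-1} y$,

$$
z = \text{softmax}\big((s_1, \ldots, s_{N-1}, 0)\big),
$$

i.e. the usual zero-appended softmax applied to $s$ instead of directly to $y$. So the program reduces to: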

data {
 int<lower=0> N;
 matrix[N - 1 , N - 1] Vinv;
}
parameters {
 vector[N - 1] y;
}
transformed parameters {
  simplex[N] z;
  vector[N - 1] s = Vinv * y;

  {
    real alpha = log_sum_exp(append_row(s, 0));
    z = append_row(exp(s - alpha), 1 / exp(alpha));
  }
}
model {
 target += -N * log1p_exp(log_sum_exp(s)) + sum(s);
}
generated quantities {
  real z_sum = sum(z);
}
sethaxen commented 2 years ago

Is there an explanation of the reasoning behind this transformation somewhere?

I've written a Stan program without the log-det-jac adjustment because I'm tired and I think @sethaxen may be able to whip it up much quicker than me.

Happy to! 🙂

spinkney commented 2 years ago

I was following this https://www.researchgate.net/publication/266864217_AN_ALGEBRAIC_METHOD_TO_COMPUTE_ISOMETRIC_LOGRATIO_TRANSFORMATION_AND_BACK_TRANSFORMATION_OF_COMPOSITIONAL_DATA.

It isn't a great paper but the calculations were confirmed in the compositions R package.

Here's my noob Julia code to get the Jacobian. Its result is not equal to what I have in the Stan program above.

using Test, LinearAlgebra, ForwardDiff, StatsFuns, StatsModels, Distributions  # Distributions provides Uniform

D = 5
helmert_mat = transpose(StatsModels.ContrastsMatrix(HelmertCoding(), 1:D).matrix)

V_raw = eachrow(helmert_mat) ./ norm.(eachrow(helmert_mat))
V_raw = transpose(reduce(hcat, V_raw))

V_fullrank = vcat(V_raw, vcat(fill(0, D - 1), 1)')
V_inv_fullrank = inv(V_fullrank)

y = rand(Uniform(-10, 10), D - 1)

function euclid_to_simplex!(y, Vinv)
    s = Vinv * y
    alpha = log(sum(exp.(vcat(s, 0))))
    return vcat(exp.(s .- alpha), 1 / exp(alpha))
end

J = ForwardDiff.jacobian(y -> euclid_to_simplex!(y, V_inv_fullrank[1:D-1,1:D-1]), y)
logabsdet(J'J)[1]/2

# does not equal this
s_test = V_inv_fullrank[1:D-1,1:D-1]  * y
-D * log1pexp(logsumexp(s_test)) + sum(s_test)
spinkney commented 2 years ago

I did get the Jacobian by plugging exp(V*y - vector(log(sum(exp(V*y))))) into matrixcalc.

There's gotta be a simpler formula, though.

data {
 int<lower=0> N;
 matrix[N - 1 , N - 1] Vinv;
 matrix[N, N] Vinv_full;
}
parameters {
 vector[N - 1] y;
}
transformed parameters {
  simplex[N] z;
  vector[N - 1] s = Vinv * y;
  real logabsdet = 0;

  {
    real alpha = log_sum_exp(append_row(s, 0));
    z = append_row(exp(s - alpha), 1 / exp(alpha));

    vector[N] t0 = exp(Vinv_full * append_row(y, 0));
    real t1 = sum(t0);
    vector[N] t2 = t0 / t1;

  //  matrix[N, N] J = diag_pre_multiply(t2, Vinv_full) - (1/t1) * t2 * (t0' * Vinv_full);
 // a bit faster
   matrix[N, N - 1] J = add_diag( -(1/t1) * t2 * t0', t2) * Vinv_full[:, 1:N - 1];
    logabsdet += log_determinant_spd(crossprod(J[,1:N-1])) * 0.5;
  }

}
model {
 target += logabsdet;
}
generated quantities {
  real z_sum = sum(z);
}

Test of the Jacobian in Julia:

```julia
using Test, LinearAlgebra, ForwardDiff, StatsFuns, StatsModels, Distributions

D = 5
helmert_mat = transpose(StatsModels.ContrastsMatrix(HelmertCoding(), 1:D).matrix)

V_raw = eachrow(helmert_mat) ./ norm.(eachrow(helmert_mat))
V_raw = transpose(reduce(hcat, V_raw))

V_fullrank = vcat(V_raw, vcat(fill(0, D - 1), 1)')
V_inv_fullrank = inv(V_fullrank)

y = rand(Uniform(-10, 10), D - 1)

function euclide_to_simplex!(y, Vinv)
    s = Vinv * y
    alpha = log(sum(exp.(vcat(s, 0))))
    return vcat(exp.(s .- alpha), 1 / exp(alpha))
end

# Jacobian by autodiff
J = ForwardDiff.jacobian(y -> euclide_to_simplex!(y, V_inv_fullrank[1:D-1, 1:D-1]), y)

# Jacobian from the matrix-calculus formula
t0 = V_inv_fullrank * vcat(y, 0)
t1 = exp.(t0)
t2 = sum(t1)
t3 = exp.(t0 .- log(t2))
J_test = diagm(t3) * V_inv_fullrank - (1 / t2) * t3 * (transpose(t1) * V_inv_fullrank)

@test logabsdet(J_test[:, 1:D-1]'J_test[:, 1:D-1])[1] * 0.5 ≈ logabsdet(J'J)[1] / 2
```

Test Passed
  Expression: (logabsdet((J_test[:, 1:D - 1])' * J_test[:, 1:D - 1]))[1] * 0.5 ≈ (logabsdet(J' * J))[1] / 2
   Evaluated: -24.100049227510453 ≈ -24.100049227559065
bob-carpenter commented 2 years ago

Two things should help.

$$ \det(A B) = \det(A) \, \det(B) \qquad \det(A^{\top}) = \det(A) $$

That means

$$ \log \det J J^{\top} = \log \det J + \log \det J^{\top} = 2 \log \det J $$

so we can avoid the cross-product. And then $J$ itself is a product that can be decomposed the same way, with a component that can be precomputed as transformed data (Vinv_full), so we are left needing the determinant of

$$ \textrm{diag}(t_2) + \frac{-t_2}{t_1} \ t_0^{\top} $$

Now we can apply the matrix determinant lemma

$$ \det (A + uv^{\top}) = (1 + v^{\top} A^{-1} u) \ \det(A) $$

with $A = \textrm{diag}(t_2)$, $u = \frac{-t_2}{t_1}$, and $v = t_0$.
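A quick numerical sanity check of the lemma itself, on generic made-up inputs rather than the specific $t$ vectors above (R sketch):

```r
set.seed(1)
D <- 5
a <- runif(D, 0.5, 2)  # diagonal of A
u <- rnorm(D)
v <- rnorm(D)

lhs <- det(diag(a) + u %*% t(v))       # det(A + u v')
rhs <- (1 + sum(v * u / a)) * prod(a)  # (1 + v' A^{-1} u) * det(A)
all.equal(lhs, rhs)                    # TRUE up to floating point
```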

I just learned this matrix determinant lemma last week and it's cropping up everywhere! Of course, I could've botched the linear algebra, so please double check!

spinkney commented 2 years ago

I got it!!

By chance I happened to find an ILR cheatsheet (http://www.sediment.uni-goettingen.de/staff/tolosana/extra/CoDaNutshell.pdf). Wow! So easy to take the derivative. I confirmed it with the Julia code.

$$ \log |J| = \sum_{i=1}^{D} \log \text{softmax}(V y)_i + \log(D) $$

data {
 int<lower=0> N;
 matrix[N - 1 , N - 1] Vinv;
 matrix[N, N] Vinv_full;
}
parameters {
 vector[N - 1] y;
}
transformed parameters {
  simplex[N] z;
  vector[N - 1] s = Vinv * y;
  real logabsdet = sum(log_softmax( Vinv_full[:, 1:N-1] * y)) + log(N);

  {
    real alpha = log_sum_exp(append_row(s, 0));
    z = append_row(exp(s - alpha), 1 / exp(alpha));
  }

}
model {
 target += logabsdet;
}
generated quantities {
  real z_sum = sum(z);
}
spinkney commented 2 years ago

The code can be simplified quite a bit. Here I've updated Vinv to be N x (N - 1). The adjustment is just the sum of the log simplex plus the log of the dimension of the simplex.

data {
 int<lower=0> N;
 matrix[N, N - 1] Vinv;
}
transformed data {
  real logN = log(N);
}
parameters {
 vector[N - 1] y;
}
transformed parameters {
  vector[N] s = Vinv * y;
  real alpha = log_sum_exp(s);
  simplex[N] z = exp(s - alpha);
}
model {
target += sum(s - alpha) + logN;
}
generated quantities {
  real z_sum = sum(z);
}
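For completeness, here is one way to build the N x (N - 1) Vinv in R for this version. This is a sketch that reuses the earlier make_V construction; I'm assuming the intended matrix is the first N - 1 columns of the padded inverse, matching the Vinv_full usage above.

```r
make_V <- function(D) {
  apply(t(contr.helmert(D)), 1, function(x) x / norm(x))
}

N <- 5
V <- make_V(N)
V_fullrank <- rbind(t(V), c(rep(0, N - 1), 1))

# N x (N - 1): all rows, first N - 1 columns of the padded inverse
Vinv <- solve(V_fullrank)[, 1:(N - 1)]

stan_data <- list(N = N, Vinv = Vinv)
```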
mjhajharia commented 2 years ago

Yes, that cheatsheet was super useful. I came across it just last week!


spinkney commented 2 years ago

@sethaxen I'm seeing discrepancies between ForwardDiff and this log-abs-det that I'm not sure are due to numerical error or to something I'm still missing about this transform. If I make the dimension size 10000, the difference is about 2, which seems much too high. Is that typical, or does it indicate I'm still missing something? The differences are very small for low dimension sizes.

spinkney commented 2 years ago

Ok, it's floating-point error. If I restrict the domain of y to (0, 1) or (-1, 1) or something, the differences disappear. I'm guessing it's the final exp.

spinkney commented 2 years ago

I haven't had this much fun doing math in a long time. What I have is correct, and it follows from the matrix determinant lemma that @bob-carpenter gave (thanks a ton for that hint)! It's quite elegant that the log determinant of the matrix-calculus formula boils down to this simple update. I'll update the paper with the math and make a pull request for the Stan code over the next few days.

bob-carpenter commented 2 years ago

I haven't had this much fun doing math in a long time.

Algebra is a lot of fun when a big pile of math like a log Jacobian determinant works out to something simple. I'm hoping we can convey that to the reader while making all this understandable.

spinkney commented 2 years ago

After going through all the math, this is just the softmax transform with a linear scaling applied.
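To make that concrete (my summary, writing $V$ for the N x (N - 1) matrix called Vinv in the last Stan program, so double-check the bookkeeping): the zero-pinned softmax and this transform are

$$
z = \text{softmax}\big((y, 0)\big): \quad \log |J| = \sum_{i=1}^{N} \log z_i,
\qquad
z = \text{softmax}(V y): \quad \log |J| = \sum_{i=1}^{N} \log z_i + \log N,
$$

so the only differences are the linear map $V$ and the constant $\log N$.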

spinkney commented 2 years ago

I'm opening this back up because, although I don't understand why, the ESS is 10x higher for the last parameter in this transform vs. the softmax.

bob-carpenter commented 2 years ago

the ESS is 10x higher for the last parameter in this transform vs the softmax.

Is that softmax with the last element pinned to zero?

spinkney commented 2 years ago

Yes, softmax with the last value pinned to zero.

bob-carpenter commented 2 years ago

Perhaps that's not surprising when you look at the Jacobian, which is all about how much probability is left over for the last element, raised to the power of the dimension.