mjhajharia / transforms


parametrization of softmax-augmented #43

Open mjhajharia opened 2 years ago

mjhajharia commented 2 years ago

@sethaxen if I remember correctly, you suggested using $p=1/N$ for the augmented softmax. The RMSE plots for that version come out as near-straight lines or odd curves in some parametrizations and look fine in others; the error itself isn't high, but the shapes are off. They come out similar to the rest when I take $p=0.5$ or so.

[three images: RMSE plots for $p=1/N$]

In contrast, here is what I get for $p=0.5$. Do you have any thoughts about which value of $p$ we should go with in the actual paper?

[image: RMSE plots for $p=0.5$]
sethaxen commented 2 years ago

How is RMSE computed here?

The reason behind the choice of $p=1/N$ is that it empirically decorrelates the $y_i$ values. What I didn't look at, however, is the effect it has on the position and variance of the marginals. The choice of $p$ doesn't seem to impact the marginal variance, but it shifts the mean by a lot, which is probably what makes adaptation hard:

[image: softmax_aug_pcomp_n100]

Given this, I'm not surprised it's failing for large $N$.

The choice of $p=1$ seems to always center the draws around the origin regardless of $N$. In fact, increasing $N$ leaves the marginal distribution of $y_i$ completely unchanged:

[image: softmax_aug_p1_ncomp]

I'm trying to work out a more principled choice of $p$ using some of the ideas in https://github.com/mjhajharia/transforms/issues/9#issuecomment-1191272380.

sethaxen commented 2 years ago

I also plan to look into @spinkney's observation in #37, which is interesting.

spinkney commented 2 years ago

In fact, the augmented simplex and the ILR are very, very similar. If I remove the Helmert matrix, I recover exactly this transform, except that it's parameterized nicely for HMC.

Here's the code. I'll make a PR for both and we can discuss what we want to do. Since the ILR is just a linear rescaling of the input vector, I don't see how it is any different.
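For context, here is the standard compositional-data definition of the ILR (my addition, not spelled out in the thread): with $V$ an $N \times (N-1)$ matrix whose orthonormal columns span the subspace orthogonal to $\mathbf{1}$ (commonly built from the Helmert matrix),

$$y = \mathrm{ilr}(x) = V^\top \log x, \qquad x = \operatorname{softmax}(V y).$$

Dropping $V$ and instead pinning the last coordinate to zero gives the zero-anchored softmax used below.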

The Stan model below seems to work for all $N > 1$. The main thing is to set the base (appended) element to 0 and update the log-abs-determinant accordingly.

data {
  int<lower=0> N;
  vector<lower=0>[N] alpha;
}
transformed data {
  real half_logN = 0.5 * log(N);  // constant offset; does not affect sampling
}
parameters {
  vector[N - 1] y;  // free coordinates; a 0 is appended as the Nth
}
transformed parameters {
  // log normalizer r = 1 + sum(exp(y)) >= 1, hence lower=0
  real<lower=0> logr = log_sum_exp(append_row(y, 0));
  simplex[N] x = exp(append_row(y, 0) - logr);  // zero-anchored softmax
}
model {
  // log-abs-determinant of y -> x (derivation below)
  target += sum(y) - N * logr + half_logN;
  // target += target_density_lp(x, alpha);
}
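For reference, a short derivation of the log-abs-determinant used above (my own sketch, following the code's notation with $r = \exp(\texttt{logr}) = 1 + \sum_{i=1}^{N-1} e^{y_i}$): the map $y \mapsto (x_1, \dots, x_{N-1})$ with $x_i = e^{y_i}/r$ and $x_N = 1/r$ is the inverse ALR transform, whose Jacobian determinant is $\prod_{i=1}^{N} x_i$, so

$$\log\lvert\det J\rvert = \sum_{i=1}^{N} \log x_i = \sum_{i=1}^{N-1} (y_i - \log r) - \log r = \sum_{i=1}^{N-1} y_i - N \log r,$$

which is exactly the `sum(y) - N * logr` term; the `half_logN` offset is constant in $y$ and does not change the posterior.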
mjhajharia commented 2 years ago

Thanks! Makes sense. We could actually write it as "augmented ILR" or something like that, following the naming convention of ALR, ILR, CLR, etc.

spinkney commented 2 years ago

Actually, this is pretty funny: what I just did is the softmax parameterization, just with a more efficient log-abs-det calculation.

It is equivalent to the following. Let me close that PR and make a new one that updates the softmax code.

data {
  int<lower=0> N;
  vector<lower=0>[N] alpha;
}
transformed data {
  real half_logN = 0.5 * log(N);  // constant offset; does not affect sampling
}
parameters {
  vector[N - 1] y;
}
transformed parameters {
  // zero-anchored softmax: append a fixed 0 as the Nth element
  simplex[N] x = softmax(append_row(y, 0));
}
model {
  // log-abs-determinant, as derived above
  target += sum(y) - N * log_sum_exp(append_row(y, 0)) + half_logN;
  // target += target_density_lp(x, alpha);
}