bob-carpenter opened 10 years ago
The current workaround requires too much index fiddling.
vector[K-1] alpha_dofm1;
...
{
  vector[K] alpha;
  for (k in 1:(K-1))
    alpha[k] <- alpha_dofm1[k];
  alpha[K] <- 0;
  ... use softmax(alpha) as before --- it is a K-simplex ...
}
Note that alpha is declared in a new block to make it a local variable.
Should we perhaps add this workaround to the manual? Apart from the verbosity is there a reason that those interested in using the K-1 parameterization shouldn't start using this?
IMHO, anyone who can write a Stan model that uses softmax or categorical_logit can handle this without any problems.
This gets kind of ugly if one has matrix[K-1, P] alpha instead of just vector[K-1] alpha.
This will be much easier with the equivalent of the new append operations (see pull request #790). If K is the number of dimensions for the simplexes, then
parameters {
  matrix[N, K-1] alpha_raw;
}
model {
  matrix[N, K] alpha;
  alpha <- append_col(alpha_raw, rep_vector(0, N));
  ...
  y[n] ~ categorical_logit(alpha[n]');  // transpose: a matrix row is a row_vector
}
There shouldn't be any need to actually use the softmax operation directly. Of course, users might want to see the probabilities, in which case the generated quantities block could be used (if alpha is a transformed parameter---otherwise it would need to be redefined).
generated quantities {
  simplex[K] theta[N];
  for (n in 1:N)
    theta[n] <- softmax(alpha[n]');  // transpose the matrix row to a vector
}
This would all be easier if we had the append operations built in.
I am working on a multi-level varying-intercept, varying-slope categorical logit model right now. I tried the K-1 parameterization but it was painful. I take it append_col is slated for 2.4++?
Right now I'm inclined to try identifying the model using redundant parameters. I'd much prefer the K-1 method.
I think the commit went in after 2.4. So it's in the current dev branch.
yep.
Ok. I'll be a tester for the append_col approach. It looks good. And I certainly prefer raw to dofm1.
I don't think "raw" is descriptive enough. I agree that "dof" is confusing, though, and don't have a good suggestion about what to call a new function.
This seems to be working well in 2.5pre. Thanks for the help.
model_code = """
data {
  int<lower=1> N;                  // number of observations
  int<lower=1> P;                  // number of features
  int<lower=1> K;                  // number of target categories
  int<lower=1> J;                  // number of groups
  int<lower=1, upper=K> target[N];
  int<lower=1, upper=J> group[N];
  vector[P] x[N];
}
parameters {
  vector[K-1] alpha_raw[J];        // group intercepts
  matrix[K-1, P] beta_raw[J];      // group slopes
}
transformed parameters {
  vector[K] alpha[J];
  matrix[K, P] beta[J];
  // alpha K-1 handling
  for (j in 1:J) {
    for (k in 1:(K-1))
      alpha[j][k] <- alpha_raw[j][k];
    alpha[j][K] <- 0;
  }
  // beta K-1 handling
  for (j in 1:J)
    beta[j] <- append_row(beta_raw[j], rep_row_vector(0, P)); // append_row is in Stan version 2.5pre
}
model {
  for (j in 1:J)
    alpha_raw[j] ~ normal(0, 5);
  for (j in 1:J)
    for (k in 1:(K-1))
      beta_raw[j][k] ~ normal(0, 5); // beta_raw[j][k] is a 1xP row vector
  for (i in 1:N)
    target[i] ~ categorical_logit(alpha[group[i]] + beta[group[i]] * x[i]);
}
generated quantities {
  simplex[K] theta[J];
  for (j in 1:J)
    theta[j] <- softmax(alpha[j]);
}
"""
Add new versions of all these functions for the (K-1)-vector parameterization of softmax, where c(x,0) is the R function to append 0 to the end of x. The advantage is that the degrees of freedom match those of the simplex target, so the model's identified without priors. The disadvantage is that the priors have to change relative to the K-value parameterization and are no longer symmetric.
Open questions: what to call the new functions --- dofm1? I suppose we could say identified. And c(x,0) or c(0,x), i.e., should the 0 value be the first or the last element?
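The identification point in this thread can be checked numerically: softmax is invariant to adding the same constant to every logit, and that constant shift is exactly the redundant degree of freedom the c(x,0) construction removes. A small plain-Python check (values hypothetical):

```python
import math

def softmax(x):
    # numerically stable: subtract the max before exponentiating
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

a = [1.0, -0.5, 2.0]              # hypothetical logits
shifted = [v + 3.7 for v in a]    # add an arbitrary constant to each

p1 = softmax(a)
p2 = softmax(shifted)
# identical probabilities: the shift is unidentified without a pin or prior,
# so fixing one component to 0 costs nothing
same = all(abs(x - y) < 1e-12 for x, y in zip(p1, p2))
```

Whether the 0 is first or last changes nothing here; either pin removes the same redundant direction.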