Open cartazio opened 6 years ago
relatedly, what's the correct / recommended way to rewrite the sum over the gammas?
as written, it's `increment_log_prob(log_sum_exp(gamma))`
should it be
a) `target += gamma`
b) `target += something something gamma`
c) something else?
The code is way out of date. It's in
The current marginalization over the topic (`k`) for a given word (`n`) is this:
```
for (n in 1:N) {
  real gamma[K];
  for (k in 1:K)
    gamma[k] <- log(theta[doc[n], k]) + log(phi[k, w[n]]);
  increment_log_prob(log_sum_exp(gamma));  // likelihood
}
```
That can be reduced to

```
for (n in 1:N)
  target += log_sum_exp(log(theta[doc[n]]) + to_vector(log(phi[ , w[n]])));
```
It'd be even better to define `log_phi` in vector form and reuse it for each `n`. It would also be worth doing this for `log_theta` if the number of words per document is greater than the total number of topics.
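A quick NumPy sketch (an illustration of the idea, not Stan code; all the sizes and the random data here are made up) of why precomputing the log tables gives the same per-word `log_sum_exp` marginalization while avoiding repeated `log` calls inside the loop:

```python
import numpy as np

rng = np.random.default_rng(0)
K, V, D, N = 3, 5, 2, 8                      # topics, vocab size, docs, words
theta = rng.dirichlet(np.ones(K), size=D)    # doc-topic probs, D x K
phi = rng.dirichlet(np.ones(V), size=K)      # topic-word probs, K x V
doc = rng.integers(0, D, size=N)             # document index of word n
w = rng.integers(0, V, size=N)               # vocab index of word n

def log_sum_exp(a):
    m = a.max()
    return m + np.log(np.exp(a - m).sum())

# Naive version: recompute the logs for every word (like the inner-loop code above).
lp_naive = sum(log_sum_exp(np.log(theta[doc[n]]) + np.log(phi[:, w[n]]))
               for n in range(N))

# Precompute the log tables once and reuse them for each n.
log_theta = np.log(theta)
log_phi = np.log(phi)
lp_fast = sum(log_sum_exp(log_theta[doc[n]] + log_phi[:, w[n]])
              for n in range(N))

assert np.isclose(lp_naive, lp_fast)
```

The precomputation pays off because each `log(theta)` and `log(phi)` entry is otherwise recomputed once per word rather than once per table entry.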
@cartazio: Feel free to submit a pull request.
And a warning---you can't really do Bayesian inference for LDA because of the multimodality. You'll see that you won't satisfy convergence diagnostics running in multiple chains, and not just because of label switching.
@bob-carpenter thanks! That's super helpful.
by multi-mode you mean: there are different local optima when viewed as an optimization problem / things are nonconvex? (i.e., vary the priors and there will be different local optima in the posterior?) I had to google around to figure out what you meant; https://scholar.harvard.edu/files/dtingley/files/multimod.pdf seemed the most clearly expositional despite the double-spaced formatting :)
is there any good reading/references on how the "variational" formulations such as Mallet/VowpalWabbit etc. deal with that issue? or is it just one of those things that tends to stay hidden in folklore common knowledge?
Yes. I meant local optima by "mode".
Nobody can deal with the issue. It's computationally intractable (at least unless P = NP). Run multiple times with different randomizer, get different answers. Usually it's only used for exploratory data analysis or to generate features for something else, so the multiple answers aren't a big deal---you just choose one either randomly or with human guidance.
Some of the later literature tries to add more informative priors to guide solutions. Some of the early work by Griffiths and Steyvers tried to measure just how different the different modes were that the algorithms found with random inits.
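A toy sketch (my own illustration, nothing to do with LDA specifically) of the "run multiple times with different randomization, get different answers" point: hill-climbing a simple bimodal objective from random starts lands in different local optima depending on the initialization.

```python
import numpy as np

def objective(x):
    # A simple bimodal objective with two equal optima, near x = -2 and x = +2.
    return -((x - 2) ** 2) * ((x + 2) ** 2)

def hill_climb(x, step=0.01, iters=5000):
    # Greedy local search: take any small step that improves the objective.
    for _ in range(iters):
        for dx in (step, -step):
            if objective(x + dx) > objective(x):
                x += dx
                break
    return x

rng = np.random.default_rng(3)
starts = rng.uniform(-4, 4, size=50)          # 50 random initializations
modes = {round(hill_climb(x)) for x in starts}

# Different random inits converge to different modes of the same objective.
assert modes == {-2, 2}
```

This is why the multiple answers are tolerable for exploratory use: each run finds *a* mode, and a human (or a heuristic) picks one.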
thanks! i'll have to do a bit of digging into this :) intractability is no surprise; i was slightly imagining it might be interesting to look at the topology of how the different inits / regions of answers connect
also, what does the term label switching mean here?
I don't know of any work characterizing this, even for simpler mixtures than LDA.
Considering that would veer into topological data analysis / computational topology and likely be #P-hard, I'm not surprised. :)
What's the relabeling stuff you mentioned?
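(For context on label switching: a mixture's likelihood is invariant under permuting the component labels, so two chains can converge to permuted copies of the same solution. A small NumPy sketch, with made-up numbers:)

```python
import numpy as np

# Two-topic toy example: mixture likelihood of one word under topics A and B.
theta = np.array([0.3, 0.7])     # topic weights for one document
phi_w = np.array([0.05, 0.20])   # P(word | topic), one column of phi

lik = float(theta @ phi_w)       # marginal likelihood of the word

# Swap the topic labels: permute the weights and the phi column together.
perm = [1, 0]
lik_swapped = float(theta[perm] @ phi_w[perm])

# The likelihood is identical, so "topic 1" in one chain may correspond to
# "topic 2" in another -- that's label switching.
assert np.isclose(lik, lik_swapped)
```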
For the Griffiths and Steyvers experiment on relating topics across initializations of LDA:
• Steyvers, Mark and Tom Griffiths. 2007. Probabilistic topic models. In Thomas K. Landauer, Danielle S. McNamara, Simon Dennis and Walter Kintsch (eds.), Handbook of Latent Semantic Analysis. Laurence Erlbaum.
They use a greedy empirical KL-divergence for alignment, which is crude, but useful.
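A minimal sketch of that kind of greedy KL alignment (my own illustration under assumed conventions, not their code): repeatedly match the remaining pair of topic-word distributions with the smallest empirical KL divergence.

```python
import numpy as np

def greedy_align(phi_a, phi_b, eps=1e-12):
    """Greedily pair rows (topics) of phi_a with rows of phi_b by smallest KL."""
    K = phi_a.shape[0]
    # KL(a_i || b_j) for every pair of topic-word distributions.
    kl = np.array([[np.sum(phi_a[i] * (np.log(phi_a[i] + eps)
                                       - np.log(phi_b[j] + eps)))
                    for j in range(K)] for i in range(K)])
    pairs, used_a, used_b = [], set(), set()
    # Take the globally smallest remaining KL entry, skipping used topics.
    for i, j in zip(*np.unravel_index(np.argsort(kl, axis=None), kl.shape)):
        if i not in used_a and j not in used_b:
            pairs.append((int(i), int(j)))
            used_a.add(i)
            used_b.add(j)
    return sorted(pairs)

# Toy check: run B is run A with its topics permuted; alignment recovers it.
rng = np.random.default_rng(1)
phi_a = rng.dirichlet(np.ones(6), size=3)    # 3 topics over a 6-word vocab
phi_b = phi_a[[2, 0, 1]]                     # same topics, relabeled
assert greedy_align(phi_a, phi_b) == [(0, 1), (1, 2), (2, 0)]
```

Greedy matching like this is crude (it isn't an optimal assignment), which is presumably what the "crude, but useful" remark is about.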
hey @bob-carpenter, @lizzyagibson and I have been looking at the LDA example code (it's nice how closely it maps to the generative description in the LDA journal paper), and there are a few deprecation warnings related to `<-` along with the `increment_log_prob(log_sum_exp(...))` expression. you may wanna update them :) thanks for the lovely examples!