Implementations of various fast parallelized samplers for LDA, including Partially Collapsed LDA, Light LDA, Partially Collapsed Light LDA and a very efficient Polya-Urn LDA
I think there may be an error regarding the vocabulary in the sampler.
If I use this code in preIteration():
    boolean DEBUG = true;
    if (DEBUG) {
        // Count the vocabulary types that have no tokens assigned to them
        int numOfZeroes = 0;
        for (int type = 0; type < numTypes; type++) {
            if (tokensPerType[type] == 0) {
                numOfZeroes += 1;
            }
        }
        System.out.println("Types: " + tokensPerType.length + " ZeroTypes: " + numOfZeroes);
        System.out.println("Tokens for type 0: " + tokensPerType[0] + " and BetaSum: " + betaSum);
    }
I get the following result:
Types: 11515 ZeroTypes: 6840
So it seems that although 6840 of the 11515 types have no tokens at all (only 4675 types actually occur), we still have a Phi matrix of size 11515. Is this correct? If so, it will hurt the performance of the partially collapsed samplers.
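To make the check easier to reproduce outside the sampler, here is a minimal standalone sketch of the same count on toy data. The class name `VocabularyCheck` and the helper `compactTypeIndex` are hypothetical and not part of the repository. The dense remapping is only one possible way to avoid rows for unused types, not necessarily how the sampler should handle it.

```java
import java.util.Arrays;

public class VocabularyCheck {
    // Count the types that never occur in the corpus (same logic as the debug snippet).
    public static int countZeroTypes(int[] tokensPerType) {
        int zeros = 0;
        for (int count : tokensPerType) {
            if (count == 0) {
                zeros++;
            }
        }
        return zeros;
    }

    // Hypothetical helper: map each original type index to a dense index over
    // the types that actually occur, or -1 for types with no tokens.
    public static int[] compactTypeIndex(int[] tokensPerType) {
        int[] mapping = new int[tokensPerType.length];
        int next = 0;
        for (int type = 0; type < tokensPerType.length; type++) {
            mapping[type] = tokensPerType[type] > 0 ? next++ : -1;
        }
        return mapping;
    }

    public static void main(String[] args) {
        int[] tokensPerType = {3, 0, 5, 0, 0, 1}; // toy counts, not real data
        int zeros = countZeroTypes(tokensPerType);
        System.out.println("Types: " + tokensPerType.length
                + " ZeroTypes: " + zeros
                + " NonZeroTypes: " + (tokensPerType.length - zeros));
        System.out.println("Dense index: " + Arrays.toString(compactTypeIndex(tokensPerType)));
    }
}
```

With a mapping like this, a Phi matrix would only need one row per occurring type, assuming the vocabulary is fixed before sampling starts.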