lejon / PartiallyCollapsedLDA

Implementations of various fast parallelized samplers for LDA, including Partially Collapsed LDA, Light LDA, Partially Collapsed Light LDA and a very efficient Polya-Urn LDA

Bug in types? #2

Closed MansMeg closed 8 years ago

MansMeg commented 8 years ago

Hi!

I think there may be an error regarding the vocabulary in the sampler.

If I use this code in preIteration():

        boolean DEBUG = true;
        if (DEBUG) {
            // Count the types (vocabulary entries) that have no tokens assigned.
            int numOfZeroes = 0;
            for (int type = 0; type < numTypes; type++) {
                if (tokensPerType[type] == 0) {
                    numOfZeroes++;
                }
            }
            System.out.println("Types: " + tokensPerType.length + " ZeroTypes: " + numOfZeroes);
            System.out.println("Tokens for type : " + tokensPerType[0] + " and BetaSums: " + betaSum);
        }

I get the following result:

Types: 11515 ZeroTypes: 6840

So it seems that 6840 of the 11515 types have no tokens, leaving only 4675 types that actually have tokens, yet we still have a Phi matrix of size 11515. Is this correct? If so, it will affect the performance of the partially collapsed samplers.

MansMeg commented 8 years ago

Sorry, it was a bug in the initialization of tokensPerType (=+ instead of +=).
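
For anyone who hits the same symptom, here is a minimal sketch of how `=+` can silently break a count like this. The actual initialization code is not shown in this thread, so the class `TokensPerTypeDemo`, its methods, and the toy corpus are all hypothetical. Java parses `a =+ b` as `a = (+b)`, which is a plain assignment, so each document overwrites the running total instead of adding to it.

```java
public class TokensPerTypeDemo {

    // Hypothetical buggy version: "=+" assigns instead of accumulating.
    static int[] countBuggy(int[][] docs, int numTypes) {
        int[] tokensPerType = new int[numTypes];
        for (int[] doc : docs) {
            int[] docCounts = new int[numTypes];
            for (int type : doc) {
                docCounts[type]++;
            }
            for (int type = 0; type < numTypes; type++) {
                // Parsed as: tokensPerType[type] = (+docCounts[type]);
                tokensPerType[type] =+ docCounts[type];
            }
        }
        return tokensPerType;
    }

    // Fixed version: "+=" adds each document's counts to the corpus total.
    static int[] countFixed(int[][] docs, int numTypes) {
        int[] tokensPerType = new int[numTypes];
        for (int[] doc : docs) {
            for (int type : doc) {
                tokensPerType[type] += 1;
            }
        }
        return tokensPerType;
    }

    // Same check as the debug snippet in the issue: types with no tokens.
    static int countZeroTypes(int[] tokensPerType) {
        int numOfZeroes = 0;
        for (int count : tokensPerType) {
            if (count == 0) {
                numOfZeroes++;
            }
        }
        return numOfZeroes;
    }

    public static void main(String[] args) {
        // Toy corpus (assumption): each document is an array of type indices.
        int[][] docs = { {0, 1, 1}, {2, 0} };
        int numTypes = 4;

        System.out.println("buggy zero types: " + countZeroTypes(countBuggy(docs, numTypes)));
        System.out.println("fixed zero types: " + countZeroTypes(countFixed(docs, numTypes)));
    }
}
```

In this toy corpus type 1 occurs only in the first document, so the buggy version resets it to zero when the second document is processed and reports two empty types instead of one. The code compiles without warnings either way, which is why the debug check on zero counts is a useful way to catch it.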