Bug in types?

Hi!

I think there may be an error regarding the vocabulary in the sampler.

If I use this code in preIteration():

        boolean DEBUG = true;
        if (DEBUG) {
            // Count vocabulary types that have no tokens in the corpus
            int numOfZeroes = 0;
            for (int type = 0; type < numTypes; type++) {
                if (tokensPerType[type] == 0) {
                    numOfZeroes += 1;
                }
            }
            System.out.println("Types: " + tokensPerType.length + " ZeroTypes: " + numOfZeroes);
            System.out.println("Tokens for type 0: " + tokensPerType[0] + " and BetaSums: " + betaSum);
        }
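To reproduce the same check outside the sampler, here is a minimal, self-contained sketch. It assumes the corpus is available as an `int[][]` of type indices per document, which is a hypothetical layout chosen for illustration; the sampler fills its own `tokensPerType` from its own data structures.

```java
import java.util.Arrays;

public class ZeroTypeCheck {

    // Count how many tokens of each type occur in the corpus.
    // docs[d][i] is the type index of the i-th token in document d (hypothetical layout).
    static int[] countTokensPerType(int[][] docs, int numTypes) {
        int[] tokensPerType = new int[numTypes];
        for (int[] doc : docs) {
            for (int type : doc) {
                tokensPerType[type]++;
            }
        }
        return tokensPerType;
    }

    // Number of vocabulary types that never occur in the corpus.
    static int countZeroTypes(int[] tokensPerType) {
        int numOfZeroes = 0;
        for (int count : tokensPerType) {
            if (count == 0) {
                numOfZeroes++;
            }
        }
        return numOfZeroes;
    }

    public static void main(String[] args) {
        // Toy vocabulary of 6 types; types 2 and 5 never appear in the documents.
        int numTypes = 6;
        int[][] docs = { {0, 1, 1, 3}, {4, 0, 3} };
        int[] tokensPerType = countTokensPerType(docs, numTypes);
        int zeroTypes = countZeroTypes(tokensPerType);
        System.out.println("Counts: " + Arrays.toString(tokensPerType));
        System.out.println("Types: " + numTypes + " ZeroTypes: " + zeroTypes
                + " NonZeroTypes: " + (numTypes - zeroTypes));
    }
}
```

Running it prints `Counts: [2, 2, 0, 2, 1, 0]` followed by `Types: 6 ZeroTypes: 2 NonZeroTypes: 4`. Note that `ZeroTypes` counts the types *without* tokens, so the number of types that do have tokens is the difference.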

I get the following result:

Types: 11515 ZeroTypes: 6840

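So it seems that although only 4675 types (11515 - 6840) actually have tokens, we still have a Phi matrix sized for all 11515 types. Is this correct? If so, this will affect the performance of the partially collapsed samplers, since Phi is presumably sampled over every type, including the 6840 that have no tokens.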
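One possible workaround, sketched here as an assumption and not as the project's actual fix, is to compact the vocabulary before allocating Phi: give every type with at least one token a dense new index, then size Phi as `numTopics x numKeptTypes`. The class and method names below are hypothetical.

```java
import java.util.Arrays;

public class CompactVocabulary {

    // Map each original type index to a dense index over the types that
    // actually occur; types with zero tokens map to -1.
    static int[] buildTypeMap(int[] tokensPerType) {
        int[] oldToNew = new int[tokensPerType.length];
        int next = 0;
        for (int type = 0; type < tokensPerType.length; type++) {
            oldToNew[type] = tokensPerType[type] > 0 ? next++ : -1;
        }
        return oldToNew;
    }

    // Number of types that survive compaction.
    static int countKept(int[] oldToNew) {
        int kept = 0;
        for (int idx : oldToNew) {
            if (idx >= 0) {
                kept++;
            }
        }
        return kept;
    }

    // Rewrite every token in place to its compacted type index.
    // Safe because every type that occurs in docs has a non-negative mapping.
    static void remapCorpus(int[][] docs, int[] oldToNew) {
        for (int[] doc : docs) {
            for (int i = 0; i < doc.length; i++) {
                doc[i] = oldToNew[doc[i]];
            }
        }
    }

    public static void main(String[] args) {
        int[] tokensPerType = {2, 2, 0, 2, 1, 0};
        int[] oldToNew = buildTypeMap(tokensPerType);
        int numKept = countKept(oldToNew);
        int numTopics = 10;
        // Phi only needs one column per type that actually occurs.
        double[][] phi = new double[numTopics][numKept];
        int[][] docs = { {0, 1, 1, 3}, {4, 0, 3} };
        remapCorpus(docs, oldToNew);
        System.out.println("Map: " + Arrays.toString(oldToNew));
        System.out.println("Phi: " + phi.length + " x " + phi[0].length);
        System.out.println("Docs: " + Arrays.deepToString(docs));
    }
}
```

This prints `Map: [0, 1, -1, 2, 3, -1]`, `Phi: 10 x 4` and `Docs: [[0, 1, 1, 2], [3, 0, 2]]`. With the numbers reported above, Phi would shrink from 11515 to 4675 columns. Two caveats under this assumption: anything that reports words per topic needs the inverse mapping to translate back to the original alphabet, and if `betaSum` is computed as beta times the vocabulary size, compacting the vocabulary changes it as well, which is a modeling choice rather than a pure optimization.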

lejon / PartiallyCollapsedLDA

Bug in types? #2