nateraw / Lda2vec-Tensorflow

Tensorflow 1.5 implementation of Chris Moody's Lda2vec, adapted from @meereeum
MIT License

Improvement in Learning Rate and Topics Learned? #54

Open dbl001 opened 5 years ago

dbl001 commented 5 years ago

I have experimented with adjustments to the 'lda_loss' function in Lda2vec.py, e.g.:

            # L2-normalize the topic vectors, then penalize the summed pairwise cosine
            # similarities ("topic_matrix") so the topic embeddings are pushed apart.
            normalized = tf.nn.l2_normalize(self.mixture.topic_embedding, axis=1)
            similarity = tf.matmul(normalized, normalized, adjoint_b=True, name="topic_matrix")
            loss_lda = self.lmbda * fraction * self.prior() + self.learning_rate * tf.reduce_sum(similarity)

This extra term in the lda_loss reduces the correlation between topics in the topic_embedding matrix.
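For reference, here is a minimal NumPy sketch of the penalty term being added (the function name and the random example data are illustrative assumptions; in the repo the same computation runs on the TF graph using self.mixture.topic_embedding, as shown above):

    import numpy as np

    def topic_decorrelation_penalty(topic_embedding):
        # L2-normalize each topic vector so the dot products below are cosine similarities.
        normalized = topic_embedding / np.linalg.norm(topic_embedding, axis=1, keepdims=True)
        # Sum of all pairwise cosine similarities (diagonal of ones included); adding this
        # to the loss encourages topic vectors to point in different directions.
        return (normalized @ normalized.T).sum()

    # Example: 20 topics in a 128-dimensional embedding space.
    rng = np.random.default_rng(0)
    print(topic_decorrelation_penalty(rng.standard_normal((20, 128))))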

Also, this NIPS paper (Chang et al., "Reading Tea Leaves", NIPS 2009) discusses a methodology for quantifying LDA performance by measuring word intrusion and topic intrusion:

http://users.umiacs.umd.edu/~jbg/docs/nips2009-rtl.pdf
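As a rough sketch of how a word-intrusion item could be assembled from a fitted topic-word distribution (all names here, including make_word_intrusion_item, topic_word_probs, and vocab, are hypothetical and not part of this repo):

    import numpy as np

    def make_word_intrusion_item(topic_word_probs, vocab, topic_id, n_top=5, seed=0):
        # topic_word_probs: [n_topics, vocab_size] word probabilities per topic (assumed input).
        rng = np.random.default_rng(seed)
        probs = topic_word_probs[topic_id]
        # Top words of the topic being evaluated.
        top_words = [vocab[i] for i in np.argsort(probs)[::-1][:n_top]]
        # Intruder: a word that is improbable in this topic but probable in another topic.
        other = (topic_id + 1) % topic_word_probs.shape[0]
        intruder = vocab[int(np.argmax(topic_word_probs[other] - probs))]
        item = top_words + [intruder]
        rng.shuffle(item)
        # A human rater is then asked to spot the intruder; how often raters succeed
        # is the paper's measure of topic coherence.
        return item, intruder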

Please experiment and let me know what you find.

Topic similarity matrix after 33 epochs: [image attachment]