gregversteeg / corex_topic

Hierarchical unsupervised and semi-supervised topic models for sparse count data with CorEx
Apache License 2.0

How to do word cloud or frequency distribution on each topic? #22

Closed JeevaGanesan closed 5 years ago

JeevaGanesan commented 5 years ago

First of all, thanks for the wonderful work. It works perfectly; I got my topics with the right anchor words. Everything is working fine, but I want to see a word cloud or frequency distribution for each topic. How can I do that? Thanks in advance.

ryanjgallagher commented 5 years ago

Hi @JeevaGanesan, glad that everything has been working well for you.

If you want to see which words have the highest mutual information with each topic, then you can use the get_topics() method to get each topic's words and their mutual information scores. Those could then be plotted with matplotlib as a bar graph of the top words in each topic. Is that what you are looking for?
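A minimal sketch of that bar graph idea, not code from the library itself. It assumes each topic returned by get_topics() is a list of tuples whose first two fields are the word and its mutual information score; `top_words`, `plot_topic_bars`, and the sample scores are hypothetical names and made-up data used only for illustration.

```python
def top_words(topic, k=10):
    """Split one topic into parallel lists of words and scores.

    `topic` is assumed to be a list of tuples whose first two fields are
    (word, mutual_information); any extra fields are ignored.
    """
    ranked = sorted(topic, key=lambda entry: entry[1], reverse=True)[:k]
    words = [entry[0] for entry in ranked]
    scores = [entry[1] for entry in ranked]
    return words, scores


def plot_topic_bars(topic, title, k=10):
    """Draw a horizontal bar chart of the top-k words in one topic."""
    # Imported lazily so top_words() stays usable without matplotlib.
    import matplotlib.pyplot as plt

    words, scores = top_words(topic, k)
    fig, ax = plt.subplots()
    # Reverse so the highest-scoring word ends up at the top of the chart.
    ax.barh(words[::-1], scores[::-1])
    ax.set_xlabel("Mutual information with topic")
    ax.set_title(title)
    fig.tight_layout()
    return fig


# Made-up scores standing in for one entry of topic_model.get_topics().
sample_topic = [("engine", 0.12), ("car", 0.31), ("wheel", 0.07)]
```

With a fitted model, something like `plot_topic_bars(topic_model.get_topics()[0], "Topic #0")` followed by `plt.show()` would display the chart for the first topic. The bars show mutual information rather than raw word counts, so they rank words by how informative they are about the topic, not by how often they occur.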

JeevaGanesan commented 5 years ago

Yup, I tried this:

import matplotlib.pyplot as plt
from wordcloud import WordCloud
topics = topic_model.get_topics()
for n,topic in enumerate(topics):
    plt.figure()
    plt.imshow(WordCloud().fit_words(topics[n]))
    plt.axis("off")
    plt.title("Topic #" + str(t))
    plt.show()

But I get this error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-41-b1e5dc997973> in <module>()
      3 for n,topic in enumerate(topics):
      4     plt.figure()
----> 5     plt.imshow(WordCloud().fit_words(topics[n]))
      6     plt.axis("off")
      7     plt.title("Topic #" + str(t))

C:\ProgramData\Anaconda3\lib\site-packages\wordcloud\wordcloud.py in fit_words(self, frequencies)
    359         self
    360         """
--> 361         return self.generate_from_frequencies(frequencies)
    362 
    363     def generate_from_frequencies(self, frequencies, max_font_size=None):  # noqa: C901

C:\ProgramData\Anaconda3\lib\site-packages\wordcloud\wordcloud.py in generate_from_frequencies(self, frequencies, max_font_size)
    378         """
    379         # make sure frequencies are sorted and normalized
--> 380         frequencies = sorted(frequencies.items(), key=itemgetter(1), reverse=True)
    381         if len(frequencies) <= 0:
    382             raise ValueError("We need at least 1 word to plot a word cloud, "

AttributeError: 'list' object has no attribute 'items'

<Figure size 432x288 with 0 Axes>

And I am not sure how to use mutual information to make the bar graph. I usually create a frequency distribution graph from words and their counts. Can you please help with that?
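The traceback above stops at `frequencies.items()`, which suggests that the word cloud code expects a dict mapping each word to a weight, while a topic here is a list. A minimal sketch of the conversion, assuming each topic entry is a tuple starting with (word, score); `to_frequencies` and the sample values are hypothetical and only for illustration.

```python
from operator import itemgetter

# Made-up (word, score) entries in the list shape that triggered the error.
topic = [("engine", 0.12), ("car", 0.31), ("wheel", 0.07)]


def to_frequencies(topic):
    """Map each word to its score, using only the first two tuple fields."""
    return {entry[0]: entry[1] for entry in topic}


# A plain list has no .items() method, which is what the traceback reports.
list_has_items = hasattr(topic, "items")

frequencies = to_frequencies(topic)

# The same sort shown in the traceback now works, because a dict has .items().
ranked = sorted(frequencies.items(), key=itemgetter(1), reverse=True)
```

The resulting dict can then be passed to `WordCloud().generate_from_frequencies(...)` in place of the raw topic list.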

JeevaGanesan commented 5 years ago

Sorry for making a mess here. I managed to make word clouds; here is the code if you want to incorporate it.

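import matplotlib.pyplot as plt
from wordcloud import WordCloud

# topic_model is the fitted CorEx model from earlier in the thread.
topics = topic_model.get_topics()

def make_wordcloud(topics):
    for n, topic in enumerate(topics):
        # generate_from_frequencies needs a dict, so convert the list of
        # (word, score) pairs returned by get_topics().
        terms = dict(topic)
        wordcloud = WordCloud().generate_from_frequencies(terms)
        plt.figure()
        plt.imshow(wordcloud)
        plt.axis("off")
        plt.title("Topic #" + str(n))
        plt.show()

make_wordcloud(topics)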

Thanks for your suggestion. You can close this issue. Have a good day.

ryanjgallagher commented 5 years ago

Glad you got it figured out!