piskvorky / gensim

Topic Modelling for Humans
https://radimrehurek.com/gensim
GNU Lesser General Public License v2.1

Coherence Score Nan's Gensim LDA #3132

Open T0admomo opened 3 years ago

T0admomo commented 3 years ago

Hello, I am working on my first topic modeling project with the gensim library. I am having an issue where the coherence score only returns NaN.

```python
import gensim
from gensim.models import CoherenceModel

# Model
# (corpus, id2word and lemmatized_posts are built earlier and are not shown here)
lda_model = gensim.models.ldamodel.LdaModel(
    corpus=corpus,
    id2word=id2word,
    num_topics=3,
    random_state=100,
    update_every=1,
    chunksize=500,  # num of texts used per train
    alpha='auto',
    per_word_topics=True,
    passes=10,
)

# Compute Coherence Score
coherence_model_lda = CoherenceModel(
    model=lda_model, texts=lemmatized_posts, dictionary=id2word, coherence='c_v'
)
coherence_lda = coherence_model_lda.get_coherence()
print('\nCoherence Score: ', coherence_lda)
```

I've been struggling with this for a while and still have a lot to do on my project, and topic modeling is only step 1! I really appreciate the work you all have put into this wonderful toolkit and hope that I can get some help with this issue!

The warnings I get read:

/home/t0ad/anaconda3/envs/work/lib/python3.8/site-packages/gensim/topic_coherence/direct_confirmation_measure.py:202: RuntimeWarning: invalid value encountered in true_divide
  numerator = (co_occur_count / num_docs) + EPSILON

/home/t0ad/anaconda3/envs/work/lib/python3.8/site-packages/gensim/topic_coherence/direct_confirmation_measure.py:203: RuntimeWarning: invalid value encountered in true_divide
  denominator = (w_prime_count / num_docs) * (w_star_count / num_docs)

/home/t0ad/anaconda3/envs/work/lib/python3.8/site-packages/gensim/topic_coherence/direct_confirmation_measure.py:198: RuntimeWarning: invalid value encountered in true_divide
  co_doc_prob = co_occur_count / num_docs

I am unable to find a previous issue with this set of errors. Thank you for your time.
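All three warnings come from divisions by `num_docs`. NumPy reports "invalid value encountered" for operations such as 0/0, so the output is at least consistent with `num_docs` being zero. One thing worth ruling out (a guess, not a diagnosis) is that the `texts` argument is empty, contains empty documents, or holds raw strings instead of lists of tokens. Below is a minimal sketch of such a check; `summarize_texts` and `sample_texts` are hypothetical names introduced only for illustration, and the sample data is made up.

```python
def summarize_texts(texts):
    """Return basic counts for a tokenized corpus (hypothetical helper)."""
    texts = list(texts)
    n_docs = len(texts)
    # Documents with no tokens at all
    n_empty = sum(1 for doc in texts if len(doc) == 0)
    # Documents passed as a plain string rather than a list of tokens
    n_not_tokenized = sum(1 for doc in texts if isinstance(doc, str))
    return {"docs": n_docs, "empty": n_empty, "not_tokenized": n_not_tokenized}


# Made-up stand-in for lemmatized_posts from the post above
sample_texts = [["topic", "model"], [], "raw untokenized string"]
report = summarize_texts(sample_texts)
print(report)
```

In the real project the call would be `summarize_texts(lemmatized_posts)`. Any non-zero `empty` or `not_tokenized` count, or a `docs` count of zero, would be worth fixing before recomputing coherence.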

steven-solomon commented 1 year ago

@T0admomo, can you share the output of `get_coherence_per_topic()`? It is likely that the score for one of the individual topics is NaN.
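A rough sketch of how the per-topic scores could be inspected follows. `CoherenceModel.get_coherence_per_topic()` returns one score per topic, and an arithmetic mean over a list that contains NaN is itself NaN, which would explain the overall result. The helper `find_nan_topics` and the scores below are hypothetical, used only so the snippet runs without the original model.

```python
import math


def find_nan_topics(per_topic_scores):
    """Return the indices of topics whose coherence score is NaN (hypothetical helper)."""
    return [i for i, score in enumerate(per_topic_scores) if math.isnan(score)]


# With the model from the original post this would be:
#   per_topic = coherence_model_lda.get_coherence_per_topic()
# Made-up scores are used here so the example is self-contained.
per_topic = [0.41, float("nan"), 0.37]

nan_topics = find_nan_topics(per_topic)
print("Topics with NaN coherence:", nan_topics)

# A single NaN is enough to make the plain average NaN as well.
average = sum(per_topic) / len(per_topic)
print("Average is NaN:", math.isnan(average))
```

If a topic index is reported, printing that topic's top words (for example with `lda_model.show_topic(index)`) and checking whether they actually occur in `lemmatized_posts` would be a reasonable next step.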