piskvorky / gensim

Topic Modelling for Humans
https://radimrehurek.com/gensim
GNU Lesser General Public License v2.1

Uninitialized dictionary.id2token used in CoherenceModel #2919

Open UnfinishedArchitect opened 4 years ago

UnfinishedArchitect commented 4 years ago

Problem description

I have created multiple LdaModels and a CoherenceModel. Calling coherence_model.compare_models([lda_model_1, lda_model_2]) raises a KeyError. The error comes from the following line, which reads dictionary.id2token before it has been initialized: https://github.com/RaRe-Technologies/gensim/blob/817cac99422a255001034203dc0720f7d0df0ce6/gensim/models/coherencemodel.py#L447

Initializing dictionary.id2token beforehand works around the problem (e.g. call dictionary[0] once before compare_models).

The problem could be fixed by simply replacing the line with topic = (self.dictionary[_id] for _id in topic).
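
To illustrate why the lookup fails and why indexing the dictionary avoids it, here is a minimal sketch. LazyDictionary is a hypothetical stand-in written for this example, not gensim's Dictionary class; it only assumes that id2token is built lazily on the first dictionary[...] access, which is the behaviour described above.

```python
class LazyDictionary:
    """Hypothetical stand-in that mimics a lazily built id2token mapping."""

    def __init__(self, tokens):
        self.token2id = {token: i for i, token in enumerate(tokens)}
        self.id2token = {}  # reverse mapping, left empty until the first lookup

    def __getitem__(self, token_id):
        # Build the reverse mapping on demand, then look the id up.
        if len(self.id2token) != len(self.token2id):
            self.id2token = {i: t for t, i in self.token2id.items()}
        return self.id2token[token_id]


dictionary = LazyDictionary(["human", "computer", "graph"])
topic = [0, 2]  # a topic given as token ids

# Reading the attribute directly fails while id2token is still empty.
try:
    tokens_direct = [dictionary.id2token[_id] for _id in topic]
except KeyError:
    tokens_direct = None

# Going through __getitem__ initializes the mapping before the lookup.
tokens_fixed = [dictionary[_id] for _id in topic]
```

In this sketch the direct attribute access raises KeyError, while dictionary[_id] succeeds and also leaves id2token populated, which is why calling dictionary[0] once works as a workaround.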

mpenkov commented 3 years ago

@UnfinishedArchitect Thank you for reporting this! Could you make a PR?

zephyrzilla commented 3 years ago

@mpenkov I have submitted a PR addressing this issue. Can you have a look?

mpenkov commented 3 years ago

@surajit-techie Thank you for your contribution and your patience! I'm stretched a little thin at the moment, but I'll have a look at your PR as soon as I can.