Average Topic Coherence over several values of topn

piskvorky / gensim

Topic Modelling for Humans

GNU Lesser General Public License v2.1

With the recent improvements to coherence evaluation introduced by #1349, averaging topic coherence over several values of topn should be straightforward to implement. Start with the largest topn (e.g. 20 when using 20, 15, 10, and 5, as the paper did), then work down to the smallest. The set of relevant ids for the largest topn is guaranteed to be a superset of the sets for all smaller values, so the counts accumulated for it can be reused in the other calculations.
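A minimal sketch of the reuse idea, in plain Python rather than gensim's actual accumulator code. The function names (`accumulate_counts`, `umass_coherence`, `average_coherence`) are hypothetical, and the score is a simplified UMass-style measure chosen only for illustration. It assumes every topic word occurs in at least one document.

```python
import math
from itertools import combinations


def accumulate_counts(docs, relevant_words):
    """Count, in one pass, the documents containing each relevant word and each pair."""
    occurrences, co_occurrences = {}, {}
    for doc in docs:
        present = sorted(set(doc) & relevant_words)
        for word in present:
            occurrences[word] = occurrences.get(word, 0) + 1
        for pair in combinations(present, 2):  # pairs are in sorted order
            co_occurrences[pair] = co_occurrences.get(pair, 0) + 1
    return occurrences, co_occurrences


def umass_coherence(topic_words, occurrences, co_occurrences):
    """Simplified UMass-style score: mean of log((D(w_i, w_j) + 1) / D(w_j)), j < i."""
    terms = []
    for i in range(1, len(topic_words)):
        for j in range(i):
            w_i, w_j = topic_words[i], topic_words[j]
            pair = tuple(sorted((w_i, w_j)))
            # Assumes occurrences[w_j] > 0, i.e. w_j appears in the corpus.
            terms.append(math.log((co_occurrences.get(pair, 0) + 1) / occurrences[w_j]))
    return sum(terms) / len(terms) if terms else 0.0


def average_coherence(topics, docs, topn_values=(20, 15, 10, 5)):
    """Average coherence over several topn values, scanning the corpus only once."""
    largest = max(topn_values)
    # The relevant words for the largest topn are a superset of those for smaller ones.
    relevant = {word for topic in topics for word in topic[:largest]}
    occurrences, co_occurrences = accumulate_counts(docs, relevant)

    per_topn = {}
    for topn in sorted(topn_values, reverse=True):
        scores = [umass_coherence(t[:topn], occurrences, co_occurrences) for t in topics]
        per_topn[topn] = sum(scores) / len(scores)
    return sum(per_topn.values()) / len(per_topn), per_topn
```

Here each topic is a list of words ordered from most to least probable, and `docs` is a list of tokenized documents. The corpus is traversed once, for the largest topn; every smaller topn only slices the topic lists and looks up counts that already exist.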

To make this an option for CoherenceModel, one could allow topn to be an iterable of values. Alternatively, it might make sense to offer this only through the topic models' top_topics method, as part of combining its calculations with those of CoherenceModel (#1128).
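One way the argument handling for an iterable topn could look, as a sketch only. `normalize_topn` is a hypothetical helper, not part of gensim. It accepts either a single integer or an iterable of integers and returns the values ordered from largest to smallest, which is the order the calculation would use.

```python
from collections.abc import Iterable


def normalize_topn(topn):
    """Return the requested topn values as a tuple, largest first, without duplicates."""
    if isinstance(topn, int):
        values = (topn,)
    elif isinstance(topn, Iterable):
        values = tuple(topn)
    else:
        raise TypeError("topn must be an int or an iterable of ints")
    if not values or any(not isinstance(v, int) or v < 1 for v in values):
        raise ValueError("topn values must be positive integers")
    return tuple(sorted(set(values), reverse=True))
```

With this shape, existing callers that pass a single integer keep working, and `normalize_topn([5, 20, 10, 15])` yields `(20, 15, 10, 5)`.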

Average Topic Coherence over several values of topn #1281