juanrloaiza / latinamerican-philosophy-mining

Text mining philosophy journals in Latin America.

Implement printing topic words by probability mass #6

Closed by juanrloaiza 1 year ago

juanrloaiza commented 2 years ago

Following Allen & Murdock (2020):

> Second, by showing only the ten or so highest-weight words for each topic, such presentations neglect most of the words that contribute to the topics’ roles in representing the corpus documents. For example, in the 200-topic model that we constructed from 665 non-fiction English-language books read by Charles Darwin between 1837 and 1860 (Murdock, Allen, and DeDeo 2017), typically 500-600 words are required to account for 50% of the probability mass for any given topic. Looking only at the first ten or twenty words may provide little understanding of why that topic has been assigned a high weight for a given document.
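
A minimal sketch of what this could look like, not tied to the repository's actual model code. It assumes each topic is available as a plain mapping from word to weight; the function name `top_words_by_mass`, the `mass` parameter, and the example words are hypothetical. The function ranks words by weight and keeps adding them until their cumulative probability reaches the requested share (50% by default, as in the quoted passage).

```python
def top_words_by_mass(word_weights, mass=0.5):
    """Return (word, probability) pairs, highest first, until the
    cumulative probability reaches `mass`.

    `word_weights` is assumed to be a dict mapping each word to a
    non-negative weight for a single topic (hypothetical input format).
    """
    total = sum(word_weights.values())
    if total <= 0:
        raise ValueError("topic weights must sum to a positive number")

    # Rank words from highest to lowest weight.
    ranked = sorted(word_weights.items(), key=lambda kv: kv[1], reverse=True)

    selected = []
    cumulative = 0.0
    for word, weight in ranked:
        if cumulative >= mass:
            break
        prob = weight / total  # normalise so probabilities sum to 1
        selected.append((word, prob))
        cumulative += prob
    return selected


# Toy topic with made-up weights, for illustration only.
topic = {"especie": 4, "variación": 3, "selección": 2, "origen": 1}
for word, prob in top_words_by_mass(topic, mass=0.5):
    print(f"{word}\t{prob:.2f}")
```

Printing `len(top_words_by_mass(topic))` for every topic would also show how many words each topic needs to reach the threshold, which is the quantity the quoted passage reports.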