juanrloaiza / latinamerican-philosophy-mining

Text mining philosophy journals in Latin America.

Implement dynamic topic model instead of regular LDA #7

Closed juanrloaiza closed 1 year ago

juanrloaiza commented 2 years ago

Gensim has its own implementation of DTM, LdaSeqModel, but it is extremely slow. This has been reported as an issue, and no solution has been found yet. Even changing LdaSeqModel to use LdaMulticore does not help.
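For reference, here is a minimal sketch of how the native implementation is called. It assumes each document is a list of tokens tagged with a publication year; `count_time_slices` and `fit_ldaseq` are hypothetical helper names, not code from this repo. Gensim is imported lazily, so the slice-counting helper can be checked without it.

```python
from collections import Counter


def count_time_slices(years):
    """Return (sorted unique years, number of documents per year).

    LdaSeqModel expects the corpus ordered by time, plus a list with the
    number of documents in each consecutive time slice.
    """
    counts = Counter(years)
    periods = sorted(counts)
    return periods, [counts[p] for p in periods]


def fit_ldaseq(tokenized_docs, years, num_topics=10):
    """Fit Gensim's native DTM. Hypothetical helper; expect long run times."""
    # Imported lazily so count_time_slices works without gensim installed.
    from gensim.corpora import Dictionary
    from gensim.models.ldaseqmodel import LdaSeqModel

    # Sort documents chronologically so they line up with the slice counts.
    order = sorted(range(len(years)), key=lambda i: years[i])
    docs = [tokenized_docs[i] for i in order]
    _, time_slice = count_time_slices(years)

    dictionary = Dictionary(docs)
    corpus = [dictionary.doc2bow(doc) for doc in docs]
    return LdaSeqModel(
        corpus=corpus,
        id2word=dictionary,
        time_slice=time_slice,
        num_topics=num_topics,
    )
```

The slice counts are the only DTM-specific input here; everything else is the usual dictionary and bag-of-words setup used for regular LDA.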

There is a pull request that improves this implementation, but it hasn't been merged yet:

For now, I therefore recommend sticking with the old DTM wrapper from Gensim 3.8.3, which runs the binary from Blei-lab. This requires two files:

The Gensim 3.8.3 wrapper is included in notebooks/utils, but the binary must be downloaded for each OS.
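A rough sketch of how the per-OS binary could be selected and handed to the wrapper. The binary file names, the `bin` directory, and the helpers `dtm_binary_path` and `fit_dtm` are assumptions for illustration; adjust them to wherever the downloaded binary actually lives. The import path below is the one Gensim 3.8.3 itself uses, and the copy in notebooks/utils may need a different import.

```python
import platform
from pathlib import Path

# Assumed file names; rename to match the binary you downloaded.
BINARY_NAMES = {
    "Linux": "dtm-linux64",
    "Darwin": "dtm-darwin64",
    "Windows": "dtm-win64.exe",
}


def dtm_binary_path(bin_dir, system=None):
    """Return the path to the DTM binary for the given (or current) OS."""
    system = system or platform.system()
    if system not in BINARY_NAMES:
        raise RuntimeError(f"No DTM binary configured for {system!r}")
    return Path(bin_dir) / BINARY_NAMES[system]


def fit_dtm(corpus, id2word, time_slices, num_topics, bin_dir="bin"):
    """Fit DTM through the Gensim 3.8.3 wrapper around the Blei-lab binary."""
    # Import path as shipped in Gensim 3.8.3; swap in the local copy
    # of the wrapper if Gensim 4.x is installed.
    from gensim.models.wrappers import DtmModel

    return DtmModel(
        str(dtm_binary_path(bin_dir)),
        corpus=corpus,
        id2word=id2word,
        time_slices=time_slices,
        num_topics=num_topics,
    )
```

Resolving the path in one place keeps the notebooks free of OS-specific branches, and an unsupported platform fails early with a clear error instead of a missing-file error from the wrapper.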

Finally, I commented out the LdaSeqModel code rather than deleting it, in case the PR above gets merged soon.

https://github.com/juanrloaiza/latinamerican-philosophy-mining/blob/0c868ad03fbc7512145aa7633a1869545c62c3cb/notebooks/utils/model.py#L40-L59

juanrloaiza commented 1 year ago

We used Blei's binary. Closing.