Open LjessonS opened 7 years ago
Not exactly right. A biterm is defined as a pair of words co-occurring in the same text window. For example, take the doc "A B C B" and suppose the window size is 3; then there are two text windows which can generate biterms as follows:
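For readers who find the C++ hard to follow, here is a minimal Python sketch of the windowing idea described above. It is not taken from the repository's code. The function names `text_windows` and `extract_biterms` are made up for illustration. It assumes that biterms are unordered and that each pair of word positions lying within one window is emitted once, even when two overlapping windows both contain it; whether the actual implementation counts overlaps this way is not stated in this thread.

```python
from collections import Counter


def text_windows(tokens, window):
    """Return the sliding text windows of `window` consecutive tokens."""
    if len(tokens) <= window:
        # A doc no longer than the window forms a single window.
        return [list(tokens)]
    return [list(tokens[i:i + window]) for i in range(len(tokens) - window + 1)]


def extract_biterms(tokens, window):
    """Emit one unordered word pair for every two positions that share a window.

    Assumption: a position pair is emitted once, even if it falls inside
    more than one overlapping window.
    """
    pairs = []
    for i in range(len(tokens) - 1):
        # Positions i and j share a window when j - i < window.
        for j in range(i + 1, min(i + window, len(tokens))):
            # Sort so that (C, B) and (B, C) are the same biterm.
            pairs.append(tuple(sorted((tokens[i], tokens[j]))))
    return pairs


doc = "A B C B".split()
print(text_windows(doc, 3))              # the two windows of the example doc
print(Counter(extract_biterms(doc, 3)))  # biterm counts for this one doc
```

Under these assumptions a word pair can occur more than once within a single document (here `B C` appears twice), so the per-document count of a pair is not necessarily one. Corpus-level counts could then be accumulated by adding the per-document `Counter` objects together.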
PS: Thanks to other contributors, you can find implementations of BTM in other languages (e.g., Python, Julia, Scala) on GitHub :)
Hi, could you please provide the link to the Python implementation of BTM?
Recently, I've become interested in your idea of modeling word pairs in a document for short texts, but I'm a bit confused about how you count the biterm sets in BTM. You did a nice job implementing it in C++, but I'm not good at C++ and find the code hard to read. I wonder whether the count of every word pair within a document is one, and whether the biterm vector for the whole biterm set is updated by accumulating the word pairs document by document. I hope you can clear up my confusion. Thank you very much!