kmpoon / hlta

Provides functions for hierarchical latent tree analysis on text data for hierarchical topic detection
GNU General Public License v3.0

using both N-gram and BOW #6

Closed un-lock-me closed 6 years ago

un-lock-me commented 6 years ago

Hi again,

Sorry for opening so many issues. This is not really an issue, but I do not know where else to ask, and it may also help other people reading these issues to understand the model better.

So my question is: according to the output you provided in the paper, you used 1-grams (all the topic words are single words). So why did you use the n-gram model rather than just BOW? Did you add n-gram support in case someone wants to use it, without actually using bigrams or trigrams yourself?

Thanks in advance for the information.

kmpoon commented 6 years ago

We have extended our work a little after the AAAI paper by also considering n-grams. One limitation of latent tree models is that a word variable can appear under only one branch (topic). We mitigate this limitation by also considering n-grams, so that a word can appear in multiple branches if it is a component of multiple n-grams.
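To illustrate the point (this is a hypothetical sketch, not code from HLTA): under a pure BOW representation, "learning" is a single word variable and can sit under only one branch, but once bigrams are added as tokens, "learning" is a component of several bigram variables, which can end up under different topic branches. The joining with `-` and the example sentence below are my own assumptions for illustration.

```python
# Hypothetical sketch of the BOW-vs-n-gram distinction (not the HLTA code).

def ngrams(tokens, n):
    """Return the n-grams of a token sequence, joined with '-'."""
    return ["-".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

doc = "deep learning and machine learning".split()

unigrams = ngrams(doc, 1)  # BOW: 'learning' is one variable
bigrams = ngrams(doc, 2)   # n-grams: 'learning' is part of several tokens

# Bigram variables that contain the word 'learning':
containing = [b for b in bigrams if "learning" in b.split("-")]
print(containing)  # ['deep-learning', 'learning-and', 'machine-learning']
```

In a real pipeline one would typically keep only frequent n-grams (so noise like `learning-and` is filtered out), leaving meaningful compounds such as `deep-learning` and `machine-learning` as separate variables that may appear under different branches of the tree.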

You may find a brief discussion on using n-grams in the AIJ paper (section 8.2.2):

https://arxiv.org/abs/1605.06650

and in the IJCAI tutorial slides (pp. 117-122):

http://www.cse.ust.hk/%7Elzhang/topic/ijcai2016/

un-lock-me commented 6 years ago

Yeah, thank you for the links :)