Following the provided notebook, I have been trying to use hlda to infer topics on a large set of short text documents (~100,000 docs, vocabulary size of 15,000). The sampling is very slow: it took about 11 hours for 10 iterations (n_samples = 10).
From my results, as well as your demo, it seems level 0 only has one topic, which contains all docs. That makes sense, since level 0 is the top of the hierarchy. But I still want to confirm: if I want 4 levels of topics, with each level containing its own topic/cluster assignments, should I set num_levels = 5?
Finally, may I ask how to choose values for alpha and gamma (or whether there is any intuition I can use), especially when inferring topics from a large set of short text docs?
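For concreteness, here is roughly how I am calling the sampler, following the notebook. The HierarchicalLDA import, the keyword names, and the values below are just my reading of the notebook (please correct me if I have any of them wrong), and the toy corpus/vocab only stand in for my real 100,000-document data:

```python
from hlda.sampler import HierarchicalLDA

# Toy stand-ins for my real data: each document is a list of indices into vocab.
vocab = ["topic", "model", "word", "doc", "level"]
corpus = [[0, 1, 2], [2, 3, 4], [0, 3, 4]]

n_samples = 10    # sampler iterations (this took ~11 hours on my real corpus)
alpha = 10.0      # smoothing over per-document level distributions (notebook value, I think)
gamma = 1.0       # nested CRP concentration, controls how readily new nodes are created
eta = 0.1         # smoothing over topic-word distributions
num_levels = 3    # tree depth including the root at level 0
                  # (hence my question: do 4 non-root levels mean num_levels = 5?)

hlda = HierarchicalLDA(corpus, vocab, alpha=alpha, gamma=gamma, eta=eta,
                       num_levels=num_levels)
hlda.estimate(n_samples)
```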
Thanks for your great work Joe!