Open: rtrad89 opened this issue 4 years ago
During topic learning, one needs to supply `W: int, size of vocabulary`. I tried to fathom the meaning of `W` by reading Algorithm 1 (the Gibbs sampling algorithm for BTM) in the paper "BTM: Topic Modeling over Short Texts", but `W` is not an input there. However, it looks data-dependent to me, so am I correct in assuming `W` means the number of unique terms in the cleaned and preprocessed corpus? If so, is there any reason `W` is not calculated from the corpus `docs_pt` automatically? I'm afraid I am missing something, hence my question. Thank you.
`W` denotes the vocab size:

```sh
W=`wc -l < $voca_pt` # vocabulary size
```
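For reference, here is a minimal Python sketch of the same computation: counting the lines of the vocabulary file, assuming `$voca_pt` holds one vocabulary entry per line (the file path below is illustrative, not fixed by the repo).

```python
# Minimal sketch: compute W by counting the lines of the vocab file,
# equivalent to the shell line W=`wc -l < $voca_pt` above.
def vocab_size(voca_pt: str) -> int:
    with open(voca_pt, encoding="utf-8") as f:
        return sum(1 for _ in f)  # one vocabulary entry per line

W = vocab_size("output/voca.txt")  # illustrative path, not fixed by the repo
print(W)
```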
Can you clarify this then, please?

> If so, is there any reason `W` is not calculated from the corpus `docs_pt` automatically? I'm afraid I am missing something, hence my question.
`$voca_pt` is the vocab file, automatically calculated from `$doc_pt`. See:

```sh
python indexDocs.py $doc_pt $dwid_pt $voca_pt
```
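For illustration, here is a hypothetical sketch of what such an indexing step does; the actual `indexDocs.py` may differ in details such as the vocab-file format. The point is that the vocabulary, and hence `W`, falls out of indexing the corpus:

```python
# Hypothetical sketch of the indexing step (not the repo's actual code):
# map each word to an integer id, then write the id-encoded docs and
# the vocabulary file.
def index_docs(doc_pt: str, dwid_pt: str, voca_pt: str) -> None:
    w2id: dict[str, int] = {}
    with open(doc_pt, encoding="utf-8") as fin, \
         open(dwid_pt, "w", encoding="utf-8") as fout:
        for line in fin:
            # assign the next free id to each previously unseen word
            ids = [str(w2id.setdefault(w, len(w2id))) for w in line.split()]
            fout.write(" ".join(ids) + "\n")
    with open(voca_pt, "w", encoding="utf-8") as f:
        for w, i in sorted(w2id.items(), key=lambda kv: kv[1]):
            f.write(f"{i}\t{w}\n")  # one entry per line, so W == len(w2id)
```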