fvalle1 closed this 4 years ago.
Filippo, you are completely right. This is an error when using `counts=True`. I wrote the initial code without that feature and didn't realize it would affect this part. Thanks for catching this one. Do you want to open a pull request? Otherwise I can add the fix later.
Thank you for your prompt reply.
I opened the PR that fixes this.
Have a nice day!
Hi.
I'm trying to reconstruct P(w) by multiplying P(word|topic)P(topic|sample)P(sample), that is, by applying matrix multiplication to `p_w_tw` and `p_tw_d`.
If I understood everything correctly, P(w) should equal the frequency of the word in the corpus. However, with the corpus from your paper this doesn't happen:
p_w_topsbm_original.pdf
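A short derivation of why the expectation should hold. It assumes each word belongs to exactly one word group t(w) (a hard partition, as on the word side of one hSBM level) and that every probability is estimated from the same edge weights. The symbols below are introduced only for illustration: n_{wd} is the number of occurrences of word w in document d, n_w = Σ_d n_{wd}, n_d = Σ_w n_{wd}, n_{td} = Σ_{w: t(w)=t} n_{wd}, n_t = Σ_d n_{td}, and N = Σ_{w,d} n_{wd}.

$$
P(w \mid t) = \frac{n_w}{n_t}\,\mathbb{1}[t = t(w)], \qquad
P(t \mid d) = \frac{n_{td}}{n_d}, \qquad
P(d) = \frac{n_d}{N}
$$

$$
P(w) = \sum_d \sum_t P(w \mid t)\,P(t \mid d)\,P(d)
= \frac{n_w}{n_{t(w)}} \sum_d \frac{n_{t(w)d}}{n_d}\cdot\frac{n_d}{N}
= \frac{n_w}{n_{t(w)}}\cdot\frac{n_{t(w)}}{N}
= \frac{n_w}{N}
$$

The cancellation only works if every n above is computed with the same weights; if the group counts ignore the edge weights while the word frequency counts occurrences, P(w) no longer reduces to n_w / N.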
I noticed that in the `get_groups` method the rows don't take the edge weights into account. This is an issue if the graph was built with `counts=True`.
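A minimal NumPy sketch of why ignoring the weights matters. It does not use topsbm or graph-tool: the count matrix `counts`, the word-to-group assignment `word_topic` and the helper `p_w_from_groups` are hypothetical stand-ins for the edge `count` weights created with `counts=True` and for one level of the word partition. Counting each edge once effectively uses a binary word-document matrix, so the reconstructed P(w) reflects in how many documents a word appears rather than how often it occurs.

```python
import numpy as np


def p_w_from_groups(adj, word_topic):
    """Hypothetical helper: P(w) = sum_d sum_t P(w|t) P(t|d) P(d).

    adj[w, d] is what is counted per word-document edge: the `count` weight
    when weights are used, or 1 per distinct pair when they are ignored.
    word_topic[w] is the (hard) word group of word w.
    """
    n_topics = word_topic.max() + 1
    member = np.eye(n_topics)[word_topic]   # member[w, t] = 1 if w is in group t
    n_w = adj.sum(axis=1)                   # (weighted) degree of each word
    n_d = adj.sum(axis=0)                   # (weighted) degree of each document
    n_td = member.T @ adj                   # edges from document d into group t
    n_t = n_td.sum(axis=1)                  # total edges attached to group t
    p_w_tw = member * (n_w / n_t[word_topic])[:, None]  # P(w|t), shape (W, T)
    p_tw_d = n_td / n_d                                   # P(t|d), shape (T, D)
    p_d = n_d / n_d.sum()                                 # P(d), shape (D,)
    return p_w_tw @ p_tw_d @ p_d


# Hypothetical toy corpus: counts[w, d] = occurrences of word w in document d.
counts = np.array([
    [3, 0, 1],
    [1, 2, 0],
    [0, 4, 2],
    [2, 1, 1],
], dtype=float)
word_topic = np.array([0, 0, 1, 1])  # hypothetical word groups
freq = counts.sum(axis=1) / counts.sum()  # word frequency in the corpus

weighted = p_w_from_groups(counts, word_topic)
unweighted = p_w_from_groups((counts > 0).astype(float), word_topic)

print(np.allclose(weighted, freq))    # True
print(np.allclose(unweighted, freq))  # False
```

With the weights, P(w) equals the word frequency exactly; with one count per edge it comes out as [2, 2, 2, 3] / 9 instead of [4, 3, 6, 4] / 17.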
Modifying those rows to
The P(w) obtained from P(w|tw)P(tw|d)P(d) actually matches the word frequency:
p_w_topsbm.pdf
Is there something wrong with my assumptions? I mean, multiplying P(w|tw) by P(tw|d) and by P(d) should give the frequency of word w, right?
The probabilities should take the edge weights into account, or is there some other factor I'm missing?
Thank you
Filippo