Closed by miguelgondu 1 year ago.
One way to do it is to sort documents by likelihood. If we assume a uniform prior over documents, we can normalize the likelihoods into a distribution over documents and keep the top-ranked documents that together account for 50% of the "likelihood mass".
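A minimal sketch of the likelihood-mass idea, assuming a hypothetical NumPy matrix `doc_topic` where `doc_topic[d, k]` holds p(topic k | document d). Under a uniform prior over documents, Bayes' rule makes p(document d | topic k) proportional to p(topic k | document d), so normalizing a topic's column over documents gives a distribution we can rank and truncate. The function name and the `mass` parameter are illustrative, not part of any existing code.

```python
import numpy as np


def top_documents_by_mass(doc_topic: np.ndarray, topic: int, mass: float = 0.5) -> np.ndarray:
    """Return indices of the documents that together cover `mass` of a topic's
    normalized likelihood, most likely first.

    Assumes (hypothetically) that doc_topic[d, k] = p(topic k | document d).
    """
    scores = doc_topic[:, topic]
    # With a uniform document prior, p(d | k) is proportional to p(k | d).
    probs = scores / scores.sum()
    order = np.argsort(probs)[::-1]  # most likely documents first
    cumulative = np.cumsum(probs[order])
    # Smallest prefix of the ranking whose cumulative mass reaches the threshold.
    cutoff = int(np.searchsorted(cumulative, mass)) + 1
    return order[:cutoff]
```

For example, with `doc_topic = np.array([[0.7, 0.3], [0.2, 0.8], [0.5, 0.5], [0.1, 0.9]])`, `top_documents_by_mass(doc_topic, 0)` keeps documents 0 and 2, since they alone already exceed half of topic 0's normalized mass. The threshold adapts to how concentrated each topic is, rather than fixing a number of documents per topic.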
TODO: Depending on the results of the grid search, we can decide whether this is a good criterion or not.
We decided on the likelihood-mass criterion (or something along those lines, which Miguel knows about). Closing.
How do we define whether a document belongs to a certain topic? One natural way is to consider only the documents for which that topic is the most probable one, but in practice this leaves us with few "top 10 references".
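A minimal sketch of the "most probable topic" rule, assuming the same hypothetical layout `doc_topic[d, k]` = p(topic k | document d). It shows why this rule can starve some topics: a topic that is never any document's single best topic gets no documents at all, even if it carries substantial probability. The function name is illustrative only.

```python
import numpy as np


def documents_by_argmax(doc_topic: np.ndarray) -> dict[int, list[int]]:
    """Assign each document only to its single most probable topic.

    Assumes (hypothetically) that doc_topic[d, k] = p(topic k | document d).
    Topics that are never the argmax for any document get an empty list.
    """
    best_topic = doc_topic.argmax(axis=1)
    n_topics = doc_topic.shape[1]
    return {k: np.flatnonzero(best_topic == k).tolist() for k in range(n_topics)}
```

With `doc_topic = np.array([[0.6, 0.3, 0.1], [0.5, 0.4, 0.1], [0.7, 0.2, 0.1]])`, every document goes to topic 0, so topic 1 ends up with no references despite probabilities of 0.2 to 0.4 in every document.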
Perhaps we could, for each topic, sort the documents by their probability under that topic and select from that ranking in some way?