juanrloaiza / latinamerican-philosophy-mining

Text mining philosophy journals in Latin America.

Better way to get "top 10 references" #11

Closed miguelgondu closed 1 year ago

miguelgondu commented 2 years ago

How do we decide whether a document belongs to a certain topic? One organic way is to consider only the documents for which that topic has the highest probability, but in practice this leaves us with few "top 10 references".

Perhaps we could, for each topic, sort the probability over documents in some way?

miguelgondu commented 2 years ago

One way to do it is to sort by likelihood. If we assume a uniform prior over documents, we can essentially normalize the likelihoods for each topic and keep the top-ranked documents that account for 50% of the "likelihood mass".
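
A minimal sketch of that idea, not the code that ended up in the repo. It assumes a hypothetical `doc_topic` matrix with one row per document and one column per topic, holding the topic probabilities for each document. The function name `documents_covering_mass` and the toy numbers are made up for illustration.

```python
def documents_covering_mass(doc_topic, topic, mass=0.5):
    """Return indices of the top-ranked documents for `topic` whose
    normalized weights add up to at least `mass`."""
    column = [row[topic] for row in doc_topic]
    total = sum(column)
    if total <= 0:
        return []
    # Under a uniform prior over documents, P(doc | topic) is
    # proportional to P(topic | doc), so normalizing the column is enough.
    weights = [value / total for value in column]
    # Rank documents from most to least likely for this topic.
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    selected, cumulative = [], 0.0
    for i in ranked:
        selected.append(i)
        cumulative += weights[i]
        if cumulative >= mass:
            break
    return selected


# Toy example (hypothetical values): 4 documents, 2 topics.
doc_topic = [
    [0.70, 0.30],
    [0.20, 0.80],
    [0.55, 0.45],
    [0.10, 0.90],
]
print(documents_covering_mass(doc_topic, topic=0))
```

The "top 10 references" for a topic could then be computed over the selected documents only, and the `mass` threshold can be tuned if 50% turns out to be too strict or too loose.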

miguelgondu commented 2 years ago

TODO: Depending on the results of the grid search, we can finally decide whether this is a good criterion or not.

juanrloaiza commented 1 year ago

We decided on likelihood mass (or something along those lines which Miguel knows about). Closing.