nasa-petal / PeTaL-labeller

The PeTaL labeler labels journal articles with biomimicry functions.
https://petal-labeller.readthedocs.io/en/latest/
The Unlicense

Look into using a relevancy threshold vs. top k for labelling #55

Closed bruffridge closed 3 years ago

bruffridge commented 3 years ago

Brandon: I'm not sure what the distribution of labels looks like in our dataset, but suppose we have papers with only 2 labels, and some with 10 or more. Instead of labelling using "top k", would it be better to label using "relevancy > some threshold" to account for the variation in number of labels?

Eric: This idea has occurred to me too, because I'm not sure how else to get away from a fixed short-ranking list length. I haven't looked into the raw logits that MATCH produces for each label, but I hope to get to this soon!

So in effect we'd be trading away one hyperparameter (k) for another (the threshold).
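The two labelling strategies being compared can be sketched in a few lines. This is a generic illustration (not MATCH's actual code), assuming the model emits one relevance score per label:

```python
import numpy as np

def top_k_labels(scores, k=3):
    """Fixed-length labelling: indices of the k highest-scoring labels."""
    return np.argsort(scores)[::-1][:k]

def threshold_labels(scores, threshold=0.5):
    """Variable-length labelling: indices of all labels scoring >= threshold."""
    return np.flatnonzero(scores >= threshold)

# toy scores for one paper over 5 candidate labels
scores = np.array([0.91, 0.62, 0.40, 0.08, 0.77])
print(top_k_labels(scores, k=3))            # → [0 4 1] (always exactly k labels)
print(threshold_labels(scores, threshold=0.5))  # → [0 1 4] (however many clear 0.5)
```

Top-k always returns exactly k labels per paper, while the threshold variant lets the label count vary with the paper, which is the motivation for the trade described above.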

bruffridge commented 3 years ago

Here's MATCH's precision-recall curve on cleaned_lens_output.json:

[Figure: match_prc — MATCH's precision-recall curve on cleaned_lens_output.json]

David Smith — 06/30/2021
Is that saying P@1 could be around 80%? Or is it extrapolated?

Eric Kong — 06/30/2021
It's been hard to optimize the threshold (because I'm not sure what to optimize it over), but at threshold = 0.5, for example, we get an average of ~3 labels, P@3 is around 0.5, and R@3 is around 0.37.

Eric Kong — 06/30/2021
RE: "Is that saying P@1 could be around 80%?" At that extreme, I think the threshold is really high (0.9999), so it seldom predicts anything (but when it does, those labels are targets 80% of the time).
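Metrics like the P@3 and R@3 quoted above can be computed generically from a score matrix. The sketch below is an illustration with made-up toy data, not MATCH's evaluation code:

```python
import numpy as np

def precision_recall_at_k(scores, targets, k):
    """Micro-averaged P@k and R@k over a batch of documents.
    scores:  (n_docs, n_labels) predicted label scores
    targets: (n_docs, n_labels) binary ground-truth labels
    """
    # indices of the k highest-scoring labels for each document
    top_k = np.argsort(scores, axis=1)[:, ::-1][:, :k]
    hits = np.take_along_axis(targets, top_k, axis=1).sum()
    precision = hits / (k * scores.shape[0])  # hits out of k slots per document
    recall = hits / targets.sum()             # hits out of all true labels
    return precision, recall

# toy batch: 2 documents x 4 candidate labels
scores = np.array([[0.9, 0.2, 0.8, 0.1],
                   [0.3, 0.7, 0.6, 0.4]])
targets = np.array([[1, 0, 0, 1],
                    [0, 1, 1, 0]])
p, r = precision_recall_at_k(scores, targets, k=2)
print(p, r)  # → 0.75 0.75
```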

This is how precision, recall, and F1 score vary with the threshold (all scores were from 0 to 1, so the threshold sweeps across that range).

[Figure: match_threshold_vs_precision_recall_and_f1 — precision, recall, and F1 vs. threshold]
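A sweep like the one plotted can be produced with a short helper. This is a generic micro-averaged sketch under the assumption of one score per label in [0, 1], not the script that generated the figure:

```python
import numpy as np

def sweep_thresholds(scores, targets, thresholds):
    """Micro-averaged precision, recall, and F1 at each threshold."""
    rows = []
    for t in thresholds:
        pred = scores >= t                      # predict every label scoring >= t
        tp = np.logical_and(pred, targets == 1).sum()
        precision = tp / max(pred.sum(), 1)     # guard against zero predictions
        recall = tp / targets.sum()
        f1 = (2 * precision * recall / (precision + recall)
              if (precision + recall) else 0.0)
        rows.append((t, precision, recall, f1))
    return rows

# toy batch: 2 documents x 4 candidate labels
scores = np.array([[0.9, 0.2, 0.8, 0.1],
                   [0.3, 0.7, 0.6, 0.4]])
targets = np.array([[1, 0, 0, 1],
                    [0, 1, 1, 0]])
for t, p, r, f1 in sweep_thresholds(scores, targets, np.arange(0.1, 1.0, 0.2)):
    print(f"t={t:.1f}  P={p:.2f}  R={r:.2f}  F1={f1:.2f}")
```

Raising the threshold trades recall for precision, which is exactly the shape of the curves in the figure: F1 peaks somewhere in the middle, which is one principled way to pick the threshold hyperparameter.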