nasa-petal / PeTaL-labeller

The PeTaL labeler labels journal articles with biomimicry functions.
https://petal-labeller.readthedocs.io/en/latest/
The Unlicense

Add a description of metrics to MATCH's README #77

Closed · bruffridge closed this issue 3 years ago

bruffridge commented 3 years ago

You can use the description you gave in Discord. Although, if you can, please describe nDCG better in the context of our dataset and multi-label classification, and what information it gives us, as I'm still not sure I understand it.

So MATCH produces a ranking of labels (biomimicry functions) by their relevance. There are a lot of labels, but usually only a few are relevant to each document. Precision at top k (P@k) asks: "Of the top k labels predicted by MATCH, how many is the document actually (ground-truth) tagged with?"
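As a rough illustration (a minimal sketch, not MATCH's actual implementation, and with made-up names), P@k could be computed like this, assuming `predicted` is MATCH's ranked list of labels and `ground_truth` is the set of labels the document is actually tagged with:

```python
# Illustrative sketch of P@k (not MATCH's actual code).
# `predicted` is the ranked list of labels (most relevant first),
# `ground_truth` is the set of labels the document is actually tagged with.
def precision_at_k(predicted, ground_truth, k):
    top_k = predicted[:k]
    hits = sum(1 for label in top_k if label in ground_truth)
    return hits / k
```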

P@k has a shortcoming in that it is not ranking-aware -- it just checks, one by one, whether each predicted label is also a ground-truth label (whereas in reality, some labels can be more relevant than others!). Normalized Discounted Cumulative Gain at top k (nDCG@k) is one way to address this issue, by computing the similarity of MATCH's generated ranking to an ideal ranking.
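Here is a similarly minimal sketch of nDCG@k with binary relevance (again an illustration with hypothetical helper names, not MATCH's code): each hit is discounted by its rank, and the total is normalized by the score of an ideal ranking that puts all relevant labels first.

```python
import math

# Illustrative sketch of nDCG@k with binary relevance (not MATCH's actual code).
def dcg_at_k(relevances, k):
    # relevances[i] = 1 if the label at rank i is a ground-truth label, else 0;
    # each gain is discounted by log2(rank + 1), so top-ranked hits count more.
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(predicted, ground_truth, k):
    gains = [1 if label in ground_truth else 0 for label in predicted]
    # The ideal ranking puts every relevant label first.
    ideal = dcg_at_k([1] * min(k, len(ground_truth)), k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0
```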

Both P@k and nDCG@k range from 0.0 (completely off the mark) to 1.0 (picture-perfect).

My understanding is that the relevance score for each predicted label is binary: 1 if it is one of the document's ground-truth labels, 0 if not.

So if one top-3 prediction is (NR, R, R) and another is (R, NR, R) (where R = a relevant label and NR = a non-relevant label), the two would have different nDCG scores, even though their P@3 is the same (2/3). In computing the nDCG score, both would be compared to the ideal partial ordering (R, R, NR).
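Plugging those two rankings into the sketch above (with two relevant labels in the ground truth, so the ideal top 3 is (R, R, NR)) works out roughly like this:

```python
import math

# Worked example under the same assumptions (binary relevance, log2 discounting).
# Two relevant labels in the ground truth, so the ideal top 3 is (R, R, NR).
idcg = 1 / math.log2(2) + 1 / math.log2(3)     # ideal DCG ≈ 1.63

dcg_a = 1 / math.log2(3) + 1 / math.log2(4)    # (NR, R, R) ≈ 1.13
dcg_b = 1 / math.log2(2) + 1 / math.log2(4)    # (R, NR, R) ≈ 1.50

print(dcg_a / idcg)  # ≈ 0.69
print(dcg_b / idcg)  # ≈ 0.92 -- the ranking that puts a relevant label first scores higher
```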

elkong commented 3 years ago

Added in the most recent commit to the match-with-petal branch, under FAQ.