binxuan closed 7 years ago
This experimental setting is in line with a previous paper on graph embeddings: L. Tang and H. Liu. Relational learning via latent social dimensions. In KDD’09, pages 817–826, 2009. According to their paper:
In our experiments, actors might have more than one label. Since most methods yield a ranking of labels rather than an exact assignment, a thresholding process is normally required. It has been shown that different thresholding strategies lead to quite different performance [7, 31]. To avoid the affection of thresholding, we assume the number of labels on the test data are already known and check how the top-ranking predictions match with the true labels.
Under this setting, micro-F1 and macro-F1 scores should still be proper evaluation metrics.
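For anyone reading later, here is a minimal sketch of that protocol in plain Python: for each test node, take as many top-ranked labels as the node truly has, then compute micro-F1 and macro-F1 on the result. The function names (`top_k_predictions`, `micro_macro_f1`) and the toy scores are hypothetical, made up for illustration; they are not taken from the paper or from this repository.

```python
def top_k_predictions(scores, true_labels):
    """Predict, per sample, its k highest-scoring labels, where k is the
    sample's true label count (assumed known, as in the quoted setting)."""
    predictions = []
    for row_scores, row_true in zip(scores, true_labels):
        k = sum(row_true)  # number of true labels for this sample
        ranked = sorted(range(len(row_scores)),
                        key=lambda j: row_scores[j], reverse=True)
        chosen = set(ranked[:k])
        predictions.append([1 if j in chosen else 0
                            for j in range(len(row_scores))])
    return predictions


def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there is nothing to count."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(true_labels, predictions):
    """Return (micro-F1, macro-F1) for binary indicator matrices."""
    n_labels = len(true_labels[0])
    tp, fp, fn = [0] * n_labels, [0] * n_labels, [0] * n_labels
    for row_true, row_pred in zip(true_labels, predictions):
        for j in range(n_labels):
            if row_pred[j] and row_true[j]:
                tp[j] += 1
            elif row_pred[j]:
                fp[j] += 1
            elif row_true[j]:
                fn[j] += 1
    micro = f1(sum(tp), sum(fp), sum(fn))  # pool counts over all labels
    macro = sum(f1(tp[j], fp[j], fn[j]) for j in range(n_labels)) / n_labels
    return micro, macro


# Toy data (made up): 3 samples, 3 labels.
scores = [[0.9, 0.2, 0.6], [0.1, 0.8, 0.3], [0.4, 0.5, 0.7]]
true_labels = [[1, 0, 1], [0, 0, 1], [0, 1, 1]]

predictions = top_k_predictions(scores, true_labels)
micro, macro = micro_macro_f1(true_labels, predictions)
print(predictions, micro, macro)
```

Note that every sample gets exactly as many predicted labels as it has true labels, so the total number of false positives equals the total number of false negatives, and micro-averaged precision and recall coincide under this protocol.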
OK, thanks for the reply.
This gives the classifier additional information, namely that exactly k labels are expected. Hence the F1 score is no longer a fair evaluation.