Why are you only using top-k predictions in scoring.py?

binxuan commented 7 years ago

This gives the classifier extra information: it is told that each test node has exactly k labels. Hence the F1 score is no longer a fair evaluation.

GTmac commented 7 years ago

This experimental setting is in line with a previous paper on graph embeddings: L. Tang and H. Liu. Relational learning via latent social dimensions. In KDD’09, pages 817–826, 2009. According to their paper:

In our experiments, actors might have more than one label. Since most methods yield a ranking of labels rather than an exact assignment, a thresholding process is normally required. It has been shown that different thresholding strategies lead to quite different performance [7, 31]. To avoid the affection of thresholding, we assume the number of labels on the test data are already known and check how the top-ranking predictions match with the true labels.

Under this setting, micro-F1 and macro-F1 scores should still be proper evaluation metrics.
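To make the setting concrete, here is a minimal sketch of top-k evaluation as the quote describes it. It is not the code in scoring.py: the helper names (`top_k_predictions`, `micro_macro_f1`) and the toy scores are made up for illustration, and it assumes each sample's true label count is known at test time.

```python
def top_k_predictions(scores, true_label_sets):
    """Keep the k highest-scoring labels per sample, where k is the
    number of true labels of that sample (assumed known)."""
    predictions = []
    for row, truth in zip(scores, true_label_sets):
        k = len(truth)
        # Rank label indices by descending score.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        predictions.append(set(ranked[:k]))
    return predictions


def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there is nothing to count."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(true_label_sets, predicted_sets, n_labels):
    """Micro-F1 pools counts over all labels; macro-F1 averages per-label F1."""
    per_label = []
    total_tp = total_fp = total_fn = 0
    for label in range(n_labels):
        tp = fp = fn = 0
        for truth, pred in zip(true_label_sets, predicted_sets):
            if label in pred and label in truth:
                tp += 1
            elif label in pred:
                fp += 1
            elif label in truth:
                fn += 1
        per_label.append(f1(tp, fp, fn))
        total_tp += tp
        total_fp += fp
        total_fn += fn
    micro = f1(total_tp, total_fp, total_fn)
    macro = sum(per_label) / n_labels
    return micro, macro


# Toy data (hypothetical): 3 samples, 3 labels, one score per label.
scores = [[0.9, 0.2, 0.6], [0.1, 0.8, 0.3], [0.4, 0.5, 0.7]]
truth = [{0, 2}, {1}, {0, 2}]

preds = top_k_predictions(scores, truth)
micro, macro = micro_macro_f1(truth, preds, n_labels=3)
```

On this toy data the third sample gets labels 1 and 2 instead of 0 and 2, so micro-F1 comes out at 0.8 and macro-F1 at about 0.78. No threshold is tuned anywhere, which is the point of the setting; the cost is that k is taken from the ground truth.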

binxuan commented 7 years ago

OK, thanks for the reply.

phanein / deepwalk, issue #35