Hi,
As mentioned in your paper, you used Macro-averaged scores, and the reported present-keyphrase prediction result of the catSeqD model on the KP20k dataset is 0.285 (F1@5 metric).
When I ran the catSeqD model with your code, I got a Macro-averaged F1@5 of 0.286, similar to yours, but a Micro-averaged score of 0.270.
However, according to the results reported in the paper "One Size Does Not Fit All: Generating and Evaluating Variable Number of Keyphrases" (https://arxiv.org/abs/1810.05241),
it seems that they used the Micro-averaged score,
and they report an F1@5 of 0.348 for the catSeqD model.
I am confused about the different results for the same model on the same dataset.
Is there anything wrong with this comparison?
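To make the gap concrete, here is a minimal sketch of how the two averaging schemes differ. This is illustrative only, not the repo's actual evaluation code; the function names and toy data are my own, and note that implementations also differ in whether precision@5 divides by a fixed 5 or by the number of predictions actually made.

```python
def f1(prec, rec):
    # Harmonic mean of precision and recall, guarding against 0/0.
    return 0.0 if prec + rec == 0 else 2 * prec * rec / (prec + rec)

def macro_f1_at_5(preds, golds):
    # Macro: compute F1@5 per document, then average over documents.
    scores = []
    for pred, gold in zip(preds, golds):
        top5 = pred[:5]
        matched = len(set(top5) & set(gold))
        prec = matched / len(top5)  # some papers divide by a fixed 5 instead
        rec = matched / len(gold)
        scores.append(f1(prec, rec))
    return sum(scores) / len(scores)

def micro_f1_at_5(preds, golds):
    # Micro: pool matches over the whole corpus, then compute a single F1.
    total_matched = total_pred = total_gold = 0
    for pred, gold in zip(preds, golds):
        top5 = pred[:5]
        total_matched += len(set(top5) & set(gold))
        total_pred += len(top5)
        total_gold += len(gold)
    return f1(total_matched / total_pred, total_matched / total_gold)

# Toy corpus: two documents with predicted and gold keyphrases.
preds = [["a", "b", "c", "d", "e"], ["x"]]
golds = [["a", "b"], ["x", "y"]]
print(macro_f1_at_5(preds, golds))  # per-document F1s averaged: ~0.619
print(micro_f1_at_5(preds, golds))  # corpus-pooled counts: 0.6
```

On the same predictions the two schemes give different numbers, so results computed under one scheme are not directly comparable to results computed under the other.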
As mentioned in Section 6.2 of our paper, our implementation of F1@5 differs from that of Yuan et al. 2018. See the screenshot below for the reason.