[Closed] thinkwee closed this issue 3 years ago
Thank you for the question, and sorry for the confusion. The stats in the README are more up-to-date and reliable, since we further cleaned the data after the first paper. We know that the way present/absent phrases are separated may affect the final scores, so we provide updated statistics in the README and in the latest papers.
For determining present/absent phrases, the method remains the same (tokenization, digit replacement, word matching, etc.); you can check out this notebook, which provides the complete pipeline for this purpose.
Replies to your specific questions: 1. the numbers reported in the README are on the test set only; 2. phrases are matched only against abstracts (though NUS and SemEval have full text); 3. matching is done after stemming and lowercasing.
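To make those matching rules concrete, here is a minimal sketch of present/absent determination. This is only an illustration of the steps named above (tokenize, replace digits, lowercase, stem, match); the linked notebook is the authoritative pipeline, and the trailing-'s' stemmer here is a crude stand-in for the Porter stemmer.

```python
import re

def tokenize(text):
    # lowercase, replace digit runs with a placeholder token, then split
    text = re.sub(r"\d+", " <digit> ", text.lower())
    return re.findall(r"[a-z<>]+", text)

def stem(word):
    # crude stand-in for the Porter stemmer: strip a trailing 's'
    return word[:-1] if word.endswith("s") and len(word) > 3 else word

def is_present(phrase, abstract):
    # a phrase is "present" if its stemmed tokens occur as a
    # contiguous subsequence of the stemmed abstract tokens
    p = [stem(t) for t in tokenize(phrase)]
    a = [stem(t) for t in tokenize(abstract)]
    return any(a[i:i + len(p)] == p for i in range(len(a) - len(p) + 1))
```

Note that because matching happens on stemmed, lowercased tokens, `"keyphrase generation"` counts as present in an abstract containing "keyphrase generations".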
Thanks, Rui
@memray
Best regards! I think stemming just inflates the F1 score rather than reflecting the real keyphrases, and stemmed sentences have incorrect syntax. What do you think?
Thanks.
Hi @qute012 ,
Yes, that's true. But there are many trivial surface-form variants that cause phrases to fail to match, so I think the benefits outweigh the downsides. Also, generation models are usually capable of predicting well-formed phrases (far better than extractors based on POS tags), so stemming is pretty useful in evaluation, for now.
Best, Rui
@memray
Thanks for the kind words!
In my case, I'm using an extractive model with a pretrained language model. I guess it's important to feed syntactically correct sentences to a pretrained language model such as BERT, though I agree stemming is a better fit for a generative model. Is this a valid concern?
@qute012 Oh, I think I got your point. Do you mean to stem the input sentences? No, the input to the model is not stemmed; stemming is only applied during evaluation. For example, given the ground-truth keyphrases ['computers', 'calculations'], it is okay for the model to generate 'computer', 'compute', 'comput', 'calculat', etc.; they will all be treated as correct predictions since they match the ground-truth phrases after stemming. But duplicate predictions are ignored in the evaluation, see here.
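As a rough illustration of that evaluation rule, here is a sketch. `simple_stem` is a crude suffix-stripping stand-in for the Porter stemmer actually used, and `evaluate` is a hypothetical helper, not the repo's code; it shows stemmed matching and that duplicate predictions are counted only once.

```python
def simple_stem(word):
    # crude suffix stripping, standing in for the Porter stemmer
    word = word.lower()
    for suf in ("ations", "ation", "ers", "er", "es", "s", "e"):
        if word.endswith(suf) and len(word) - len(suf) >= 3:
            return word[: -len(suf)]
    return word

def evaluate(preds, golds):
    # match predictions to ground truth after stemming each token;
    # duplicate (post-stemming) predictions are ignored
    gold_set = {tuple(simple_stem(t) for t in g.split()) for g in golds}
    seen, correct = set(), 0
    for p in preds:
        key = tuple(simple_stem(t) for t in p.split())
        if key in seen:
            continue
        seen.add(key)
        if key in gold_set:
            correct += 1
    precision = correct / len(seen) if seen else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    return precision, recall
```

Here 'computer' and 'compute' both stem to the same key, so the second one is treated as a duplicate rather than a second correct hit.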
Oh, @memray, I got it! It's just for evaluation. Finally, I understand why this is helpful.
Thanks 👍
Hi~ In your paper Deep Keyphrase Generation you give the proportions of present and absent keyphrases in four public datasets, but this result differs from the table in the README. I also calculated the proportions based on the data you provided and got yet another number. I want to know how to get the correct result and how you define present/absent keyphrases (only on the test set? only present in the abstract? after stemming?). Thank you!