Closed: johncs999 closed this issue 4 years ago
KP20k was collected by me and my colleagues from different sources, e.g. ACM, Wiley, ScienceDirect, Elsevier. All keywords were provided by the original authors.
As for the test datasets such as semeval/nus/inspec/krapivin, they come from previous studies (see this repo), and most of them (except for krapivin, which also uses author-assigned keywords) include additional annotator-provided keyphrases. That is why models perform almost the same on KP20k and krapivin, but not on the others.
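One quick way to check this distribution difference empirically is to measure how many gold keyphrases actually appear verbatim in the abstract, since author-assigned and annotator-assigned keywords tend to differ on this. Below is a minimal sketch (not the code used in this repo), assuming hypothetical JSON-lines files with "abstract" and "keywords" fields:

```python
# Minimal sketch: compare the fraction of gold keyphrases that appear
# verbatim in the abstract for two datasets. Author-assigned keywords
# (KP20k, Krapivin) may behave differently from additionally annotated
# sets (Inspec, SemEval, NUS) on this measure.
# Assumes hypothetical JSON-lines files with "abstract" and "keywords" fields.
import json

def present_ratio(path):
    present, total = 0, 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            doc = json.loads(line)
            abstract = doc["abstract"].lower()
            for kp in doc["keywords"]:
                total += 1
                if kp.lower() in abstract:
                    present += 1
    return present / total if total else 0.0

# Hypothetical filenames, just for illustration.
for name in ("kp20k_test.jsonl", "inspec_test.jsonl"):
    print(name, f"{present_ratio(name):.2%} of keyphrases appear in the abstract")
```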
Hope this helps.
Hi memray, can you give more details about the data sources? (e.g., which websites do the abstracts in kp20k/semeval/.. come from?) I find that the results on some test datasets (e.g., semeval, inspec) are relatively worse than on others, so I suspect there may be differences in data distribution. What do you think about this?