memray / seq2seq-keyphrase

MIT License

Data Gathering #27

Closed ygorg closed 5 years ago

ygorg commented 5 years ago

Hi, thank you for sharing your code and the data you gathered. I was wondering if you were planning on publishing an article describing the kp20k dataset, as it is, to my knowledge, the biggest available dataset for keyphrase extraction. Also, how, where, and when did you collect the data? Are the gold keyphrases created by authors or users?

memray commented 5 years ago

Hi @ra1nbowpill,

Thank you for your interest. Yes, the gold keyphrases are from authors. I just collected/crawled metadata of scientific papers from different digital libraries (ACM, Wiley, etc.) a few years ago. I think it might be hard to do from outside a campus (I guess only universities and research institutes buy access to these digital resources).

Thanks, Rui