memray / OpenNMT-kpg-release

Keyphrase Generation
MIT License

MAGKP dataset #55

Closed ahadda5 closed 2 years ago

ahadda5 commented 2 years ago

I do not see the MagKP dataset anywhere. It is mentioned in the NAACL 2021 paper "An Empirical Study on Neural Keyphrase Generation".

It is not in the data folder, which only has the kp20 and stackex training sets plus the remaining validation sets.

Should one ignore it?!

memray commented 2 years ago

Hi @ahadda5. Sorry for the slow updates recently. The paper is basically done, and I will upload the remaining resources shortly, including the MagKP dataset. In the meantime, you may consider processing the data yourself (download the MAG/OAG dump here and process it with this script).

I used the v1 version of MAG Papers (9 zip files, e.g. mag_papers_0.zip), which is smaller but more accessible.
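For readers who want a rough idea of what processing one of these archives could look like, here is a minimal sketch. It is not the script linked above. It assumes each file inside an archive such as mag_papers_0.zip holds one JSON object per line, and that records carry `title`, `abstract`, and `keywords` fields; those field names, and the helper names `iter_records`, `to_example`, and `convert`, are assumptions made for illustration, so check them against the dump you actually download.

```python
import io
import json
import zipfile
from typing import Dict, Iterator, Optional


def iter_records(zip_path: str) -> Iterator[dict]:
    """Yield one JSON object per non-empty line from every file in the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if name.endswith("/"):  # skip directory entries
                continue
            with archive.open(name) as raw:
                for line in io.TextIOWrapper(raw, encoding="utf-8"):
                    line = line.strip()
                    if line:
                        yield json.loads(line)


def to_example(record: dict) -> Optional[Dict[str, object]]:
    """Keep only papers that have a title, an abstract, and at least one keyword.

    The field names below are assumed, not confirmed by the thread.
    """
    title = (record.get("title") or "").strip()
    abstract = (record.get("abstract") or "").strip()
    keywords = [
        k.strip()
        for k in (record.get("keywords") or [])
        if isinstance(k, str) and k.strip()
    ]
    if not (title and abstract and keywords):
        return None
    return {"title": title, "abstract": abstract, "keywords": keywords}


def convert(zip_path: str, out_path: str) -> int:
    """Write the usable papers as JSON lines and return how many were kept."""
    kept = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for record in iter_records(zip_path):
            example = to_example(record)
            if example is not None:
                out.write(json.dumps(example, ensure_ascii=False) + "\n")
                kept += 1
    return kept
```

Calling `convert("mag_papers_0.zip", "magkp_0.jsonl")` would stream the archive without unpacking it to disk. Any further filtering or tokenization needed by the training pipeline is left out of this sketch.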

ahadda5 commented 2 years ago

Wow, very kind of you! Thanks, Rui.

ahadda5 commented 2 years ago

Also, a quick question: on HuggingFace I see individual models (.pt), each trained on a single dataset (kp20, openkp, etc.), but none trained on all of them together. Is there one, in line with your finding that "large and noisy datasets can benefit KPG"?

Thank you

memray commented 2 years ago

No, I didn't try training them all together. Maybe you can give it a try :D