memray / OpenNMT-kpg-release

Keyphrase Generation
MIT License

Is this project still working? #42

Closed: haseeb33 closed this issue 2 years ago.

haseeb33 commented 2 years ago

I am running into several issues:

  1. In newer torchtext releases, torchtext.data has moved to torchtext.legacy.data, so the existing imports break.
  2. All the config files point to a data folder like data/keyphrase/------, but the original data uploaded to the drive has a different directory layout.
  3. With so many config files, it's very confusing to know where to start.
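The import breakage in point 1 is commonly worked around with a fallback import. Below is a minimal sketch, not part of the repository: import_first is a hypothetical helper that tries module names in order and can optionally require an attribute such as Field. The attribute check matters because in torchtext releases where the legacy package was removed entirely, torchtext.data still imports but no longer provides the old classes, so pinning an older torchtext may still be necessary.

```python
import importlib
from types import ModuleType
from typing import Iterable, List, Optional


def import_first(candidates: Iterable[str], required_attr: Optional[str] = None) -> ModuleType:
    """Return the first module in `candidates` that imports and has `required_attr`.

    Hypothetical helper for illustration; raises ImportError if none qualifies.
    """
    errors: List[str] = []
    for name in candidates:
        try:
            module = importlib.import_module(name)
        except ImportError as exc:  # ModuleNotFoundError is a subclass of ImportError
            errors.append(f"{name}: {exc}")
            continue
        if required_attr is not None and not hasattr(module, required_attr):
            # The module exists but lacks the API we need (e.g. no Field class).
            errors.append(f"{name}: has no attribute {required_attr!r}")
            continue
        return module
    raise ImportError("no usable module found:\n" + "\n".join(errors))


# Assumed usage: prefer the legacy location, fall back to the old one.
# data = import_first(["torchtext.legacy.data", "torchtext.data"], required_attr="Field")
```

Requiring an attribute rather than just a successful import turns a confusing AttributeError deep in the training code into an ImportError at startup that lists every location that was tried.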

Any comments from the developers would be highly appreciated.

memray commented 2 years ago

Hi @haseeb33,

Sorry for the late reply. I'm still maintaining the project, but I may not always respond promptly. I have updated the code, mainly to integrate the new preprocessing pipeline from OpenNMT v2, so you can now load JSON data directly from disk without first converting it into tensor files. You can find config examples here using word tokenization (vocab download) and here using RoBERTa subtokens, for both RNN and Transformer models. Hope this helps.
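Reading JSON examples straight from disk, instead of preprocessing them into tensor files, can be pictured as a lazy JSON-lines reader. This is only an illustrative sketch, not the repository's loader: it assumes one JSON object per line with hypothetical title, abstract and keywords fields, where keywords is either a list or a single string with keyphrases separated by ";". Check the actual data files for the real field names.

```python
import json
from typing import Dict, Iterable, Iterator, List, Union


def iter_keyphrase_examples(
    lines: Iterable[str], kp_sep: str = ";"
) -> Iterator[Dict[str, Union[int, str, List[str]]]]:
    """Yield one parsed example per non-empty JSON line.

    Field names (title, abstract, keywords) are assumptions for illustration.
    """
    for line_no, raw in enumerate(lines, start=1):
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines
        record = json.loads(raw)
        title = str(record.get("title", "")).strip()
        abstract = str(record.get("abstract", "")).strip()
        # Source text: title and abstract joined with a space, empty parts dropped.
        src = " ".join(part for part in (title, abstract) if part)
        keywords = record.get("keywords", [])
        if isinstance(keywords, str):
            parts = keywords.split(kp_sep)  # "kp1;kp2" style string
        else:
            parts = [str(kp) for kp in keywords]  # already a list
        phrases = [p.strip() for p in parts if p.strip()]
        yield {"line": line_no, "src": src, "keyphrases": phrases}


# Assumed usage with a hypothetical file name; a file object is iterated lazily:
# with open("train.json", encoding="utf-8") as f:
#     for example in iter_keyphrase_examples(f):
#         ...
```

Because the reader is a generator over any iterable of lines, only one example is parsed at a time, and the same function can be exercised on an in-memory list of strings.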

Thanks, Rui

haseeb33 commented 2 years ago

The shared vocab is in .pt format, whereas the new implementation requires the vocab in JSON format.

Could you please guide me on this? Thanks!

memray commented 2 years ago

You are right. Check out the JSON file at this Google Drive link.