hiroki13 / span-based-srl

Training on OntoNotes 5.0 with ELMO embeddings #7

Open drussellmrichie opened 2 years ago

drussellmrichie commented 2 years ago

Has anyone managed to train this recently (or does anyone have access to a pre-trained model)? I have ontonotes-release-5.0_LDC2013T19.tgz from the LDC. When I unzip it, the contents look basically like what Step 1 here describes. But I'm lost at Step 2: where are the training, development and test archives? Further, the repo README suggests that I will need a training set file like path/to/conll2005.train.txt, but the instructions here don't seem to involve creating a file like that.

hiroki13 commented 2 years ago

In Step 2, conll-formatted-ontonotes-5.0 (downloadable at https://github.com/ontonotes/conll-formatted-ontonotes-5.0/releases/tag/v12 ) should be used. The file contains train, dev and test archives.

drussellmrichie commented 2 years ago

Thanks for this. Okay, I've got the files from there now. Step 3 says "Download the scripts from the following location", but it doesn't provide a link. Or are those scripts supposed to be in the "conll-formatted-ontonotes-5.0" archive that I downloaded? If so, I don't see them there:

(base) [richier@reslnapollo01 ~]$ tree -L 5 conll-formatted-ontonotes-5.0-12/
conll-formatted-ontonotes-5.0-12/
├── conll-formatted-ontonotes-5.0
│   └── data
│       ├── conll-2012-test
│       │   └── data
│       │       └── english
│       ├── development
│       │   └── data
│       │       └── english
│       ├── test
│       │   └── data
│       │       └── english
│       └── train
│           └── data
│               └── english
└── README.md

14 directories, 1 file

Also, is the code you provide in your readme for training the ensemble models?

Thanks again for your help. Really appreciate it!!!

drussellmrichie commented 2 years ago

Okay, I think I managed to find the scripts here, which may be helpful for others trying to train this model:

https://github.com/yuchenlin/OntoNotes-5.0-NER-BIO/tree/master/conll-formatted-ontonotes-5.0/scripts

I am currently running these and they do seem to be working as intended: they are producing .gold_conll files in the train, dev, and test folders (or at least in train right now, since it's still running).
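For anyone who wants to check how far the scripts have gotten, here is a minimal sketch that counts the generated .gold_conll files per split. It assumes the directory layout from the tree output above, with the splits sitting directly under the data folder; the function name `count_gold_conll` is my own and not part of any of the scripts.

```python
from collections import Counter
from pathlib import Path


def count_gold_conll(root):
    """Count *.gold_conll files under each top-level split directory of root.

    `root` is assumed to be the `data` folder that contains `train`,
    `development`, `test`, and `conll-2012-test`.
    """
    root = Path(root)
    counts = Counter()
    for path in root.rglob("*.gold_conll"):
        # The first path component below root is the split name.
        split = path.relative_to(root).parts[0]
        counts[split] += 1
    return dict(counts)


if __name__ == "__main__":
    # Hypothetical location; adjust to wherever the archive was unpacked.
    root = "conll-formatted-ontonotes-5.0-12/conll-formatted-ontonotes-5.0/data"
    for split, n in sorted(count_gold_conll(root).items()):
        print(f"{split}: {n} .gold_conll files")
```

A split that is missing from the output has no .gold_conll files yet.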

Still not sure how to make conll2005.train.txt, though. It's not just a text file concatenating all of the *.gold_conll files in train, is it? (And something analogous for dev and test.)

Also, I did install AllenNLP, but where would I get elmo.conll2005.train.hdf5?
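In case it helps, ELMo embedding caches are usually computed from plain tokenized sentences, so a first step may be pulling the tokens out of the .gold_conll files. Below is a rough sketch of that step only. It assumes the standard CoNLL-2012 column layout, where the fourth whitespace-separated column is the word and sentences are separated by blank lines. Whether this repo expects its input in this form is an assumption on my part, and `conll_to_sentences` is a made-up helper name.

```python
def conll_to_sentences(lines):
    """Turn CoNLL-2012 style lines into one space-joined sentence per entry.

    Assumes the word is the 4th whitespace-separated column and that
    sentences are separated by blank lines.
    """
    sentences, tokens = [], []
    for line in lines:
        line = line.strip()
        is_marker = line.startswith(("#begin document", "#end document"))
        if not line or is_marker:
            # A blank line or document marker closes the current sentence.
            if tokens:
                sentences.append(" ".join(tokens))
                tokens = []
            continue
        tokens.append(line.split()[3])
    if tokens:  # flush a final sentence that has no trailing blank line
        sentences.append(" ".join(tokens))
    return sentences


if __name__ == "__main__":
    import sys

    # Usage: python extract_sentences.py some_file.gold_conll > sentences.txt
    with open(sys.argv[1], encoding="utf-8") as f:
        for sentence in conll_to_sentences(f):
            print(sentence)
```

This only produces the sentence text. It does not create the .hdf5 file itself, and it drops the predicate and argument columns that an SRL training file would need.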

MrSingh-bytes commented 2 years ago

Hi, did you get this working? Did you train it and get results? If so, I would appreciate your help. Thanks!!!

drussellmrichie commented 2 years ago

@mrsingh007143 No I don't think I ever did. I think I gave up and tried using other SRL models. :-(

MrSingh-bytes commented 2 years ago

Thank you!!